Getting more entropy for Subversion on FreeBSD
I was trying to create a new Subversion repository today and noticed
to my dismay that it would hang during create. Totally
high-centered. My only recourse was kill -9. It was bad.
“Why is this?” I thought. I googled a little and found this entry in the Subversion FAQ. In a nutshell, my system had too little entropy (sources of randomness), and I should configure it to gather more entropy from interrupts.
It recommended checking into rndcontrol about this. The rndcontrol
manpage was easy enough:
SYNOPSIS
rndcontrol [-q] [-s irq_no] [-c irq_no]
DESCRIPTION
The rndcontrol command is used to set which interrupts are used to
help randomise the ``pool of entropy'' maintained by the kernel.
The /dev/random and /dev/urandom devices are the user interface
to this source of randomness. Any changes take effect
immediately.
The following command line options are supported:
-q Turn off all output except errors.
-s n Allow IRQ n to be used as a source of randomness. This
option may be repeated for more than one IRQ.
-c n Stop IRQ n from being used as a source of randomness.
This option may be repeated for more than one IRQ.
The default is to have no IRQ's being used.
Ok. So I need some IRQs I can get some entropy from:
# /sbin/dmesg | grep -i irq
IOAPIC #0 intpin 2 -> irq 0
IOAPIC #0 intpin 19 -> irq 2
IOAPIC #0 intpin 21 -> irq 5
IOAPIC #0 intpin 20 -> irq 9
asr0: <Adaptec Caching SCSI RAID> mem 0xfc000000-0xfdffffff irq 9
at device 4.1 on pci2
ahc0: <Adaptec aic7896/97 Ultra2 SCSI adapter> port 0x2000-0x20ff
mem 0xf4100000-0xf4100fff irq 2 at device 12.0 on pci0
ahc1: <Adaptec aic7896/97 Ultra2 SCSI adapter> port 0x2400-0x24ff
mem 0xf4101000-0xf4101fff irq 2 at device 12.1 on pci0
fxp0: <Intel Pro 10/100B/100+ Ethernet> port 0x2800-0x283f mem
0xf4000000-0xf40fffff,0xf4102000-0xf4102fff irq 5 at device 14.0
on pci0
pci0: <Intel 82371AB/EB (PIIX4) USB controller> at 18.2 irq 5
sio0 at port 0x3f8-0x3ff irq 4 flags 0x30 on isa0
sio1 at port 0x2f8-0x2ff irq 3 on isa0
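With a longer dmesg it can help to tally how often each IRQ shows up before picking candidates. Here is a rough sketch; the irq_tally name is made up for illustration, and it assumes a grep that supports the -o extension (GNU and BSD grep both do):

```shell
# irq_tally: read boot messages on stdin and print "irq count" pairs,
# one per line, sorted by IRQ number. (Hypothetical helper.)
irq_tally() {
    # -i: ignore case, -o: print only the matching "irq N" fragments
    grep -io 'irq [0-9][0-9]*' |
        awk '{ n[$2]++ } END { for (i in n) print i, n[i] }' |
        sort -n
}
```

Used as `/sbin/dmesg | irq_tally`. IRQs shared by several busy devices show up with higher counts.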
Ah, IRQs 9, 2, and 5 look great: hard drives, RAID adapters, and network interfaces are all great entropy sources. Let’s change our random device now:
# rndcontrol -s 9 -s 2 -s 5
rndcontrol: setting irq 9
rndcontrol: setting irq 2
rndcontrol: setting irq 5
rndcontrol: interrupts in use: 2 5 9
(Meanwhile, in another terminal):
$ svnadmin create SVN_REPO
Wow, that was fast… er, I guess that’s what it’s like under normal conditions. Oh well. Better put things back the way they were:
# rndcontrol -c 9 -c 2 -c 5
rndcontrol: clearing irq 9
rndcontrol: clearing irq 2
rndcontrol: clearing irq 5
rndcontrol: interrupts in use:
That’s it.
subtract - delete matching lines from text files
Ever write code that you’re certain has been done a million times but you just don’t know what phrases to type into google to find it? Well, here is a little text utility that fits that description.
I call it subtract. It subtracts matching lines from text files (as
seen in title!). It uses grep output format, so I use it like this:
egrep -Hf patterns.txt *.txt | subtract
or like this:
grep -H 'line I want to remove' some_file.txt | subtract
grep output looks like this:
filename:line
filename:line
filename:line
So subtract takes these kinds of lines, opens the file filename
and looks for lines in it matching line, then deletes those lines
from the file. Pretty basic.
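To make the idea concrete, here is a minimal sketch of the same behavior as a shell function. It is not the original utility, just an illustration under a few assumptions: filenames contain no colons, and a line is deleted only when it matches the grep output exactly.

```shell
# subtract (sketch): read "filename:line" records on stdin and delete
# every line in filename that is exactly equal to line.
subtract() {
    while IFS= read -r record; do
        file=${record%%:*}      # everything before the first colon
        line=${record#*:}       # everything after the first colon
        [ -f "$file" ] || continue
        tmp=$(mktemp) || return 1
        status=0
        # -F: fixed string, -x: whole-line match, -v: keep the other lines
        grep -vxF -e "$line" -- "$file" > "$tmp" || status=$?
        # grep exits 1 when nothing is left to keep; 2 or more is a real error
        if [ "$status" -le 1 ]; then
            cat "$tmp" > "$file"    # rewrite in place, keeping permissions
        fi
        rm -f "$tmp"
    done
}
```

Because it rewrites files in place, it is worth trying on a copy first.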
Updated: Tue Jul 29 11:38:02 MDT 2008
The mysterious Unix find command has some interesting quirks. For
one, the order of its command-line arguments is important: earlier
expressions affect later ones. The ramifications of this finally
became clear to me after using find for 12 years. So here is that
lesson, plus a few other things I’ve learned over the years to make
find go faster:
find has command-line arguments which are called “expressions” or
“primaries”. These expressions are tests or rules that are applied to
each file as find crawls the file system, and the result of each
test determines whether find goes on to the next expression for that
file.
“Files” (and we really mean any file system entry: directories, devices, links, etc.) not matching the criteria are skipped.
find . -type f -name "joe*.jpg" -print
This example will find all regular files (not directories) with the name “joe(something).jpg”:
joe.jpg
joeschmoe.jpg
joe is awesome.jpg
find . -type d -path "*/lib/*" -print
This will find all directories that have /lib/ as part of their
pathname (but not lib itself).
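The two searches above can be tried safely on a scratch tree. This is only a sketch: the directory layout and the extra file names (moe.jpg, joedir.jpg, lib/perl) are invented for the demonstration.

```shell
# Build a throwaway tree to search
d=$(mktemp -d)
mkdir -p "$d/lib/perl" "$d/joedir.jpg"
touch "$d/joe.jpg" "$d/joeschmoe.jpg" "$d/joe is awesome.jpg" "$d/moe.jpg"

# Regular files named joe*.jpg; the directory joedir.jpg fails -type f
files=$(cd "$d" && find . -type f -name "joe*.jpg" -print | LC_ALL=C sort)

# Directories below a lib directory; ./lib itself does not match */lib/*
dirs=$(cd "$d" && find . -type d -path "*/lib/*" -print | LC_ALL=C sort)

printf '%s\n' "$files" "$dirs"
rm -rf "$d"
```

The first search lists the three joe files, and the second lists only ./lib/perl.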
find has logical operators (“and” and “or”) and group operators
(parentheses) which allow you to group expressions.
Group expressions with parentheses, and use logical operators to include or exclude files.
find . ( -name "*.php" -or -name "*.phtml" ) -type f -print
This will find all regular files that end in “.php” or “.phtml”. The
parentheses group the two -name expressions, so that if either one
matches, the entire expression (between the parentheses) is true (and
find then continues on to evaluate the next expression).
Depending on the shell you are using, you may need to escape the parentheses so that the shell doesn’t see them first:
bash $ find . \( -name "*.php" -or -name "*.phtml" \) -type f -print
find . -name "*.[Pp][Hh][Pp]" -type f -print
You can use character classes to match filenames that may have varying case:
foo.PHP
foo.php
foo.Php
foo.pHp
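Here is a small sketch of why the parentheses matter, again on an invented scratch tree. Without grouping, the implicit ‘and’ binds tighter than ‘or’, so -type f and -print attach only to the second -name test.

```shell
d=$(mktemp -d)
mkdir "$d/dir.php"
touch "$d/index.php" "$d/view.phtml" "$d/OLD.PHP" "$d/mixed.pHp" "$d/notes.txt"

# Grouped: either name may match, then -type f and -print apply to both
grouped=$(cd "$d" && find . \( -name "*.php" -o -name "*.phtml" \) -type f -print | LC_ALL=C sort)

# Ungrouped: parsed as  -name "*.php"  -o  ( -name "*.phtml" -type f -print )
ungrouped=$(cd "$d" && find . -name "*.php" -o -name "*.phtml" -type f -print | LC_ALL=C sort)

# Character classes match any mix of upper and lower case
anycase=$(cd "$d" && find . -name "*.[Pp][Hh][Pp]" -type f -print | LC_ALL=C sort)

printf 'grouped:\n%s\nungrouped:\n%s\nanycase:\n%s\n' "$grouped" "$ungrouped" "$anycase"
rm -rf "$d"
```

The grouped search prints both index.php and view.phtml, while the ungrouped one prints only view.phtml.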
find evaluates its expressions in the order they appear. This has
important ramifications. When find runs, it works like this:
find . -type f -name "*.jpg" -print
1. Is the entry a regular file (-type f)? If no, skip this file
2. Does its name match *.jpg? If no, skip this file
3. Print the pathname (-print)
This repeats for all files rooted in the current directory
(.). You’ll notice that find operates on its expressions with an
implicit ‘and’ operator between them, and uses short-circuit logic. In
other words, for ‘and’ conditions, the second expression will not be
evaluated if the first one is false. For ‘or’ conditions, subsequent
expressions will not be evaluated if the first one is found true.
Skip as much as possible as early as possible by putting the ‘and’ expressions most likely to fail (or ‘or’ expressions most likely to succeed) near the beginning of the expression list.
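The short-circuit behavior is easy to observe by using -print as a visible side effect. A sketch, with made-up file names:

```shell
d=$(mktemp -d)
mkdir "$d/sub"
touch "$d/a.jpg" "$d/b.txt"

# 'and': -print is reached only when both earlier tests are true
and_result=$(cd "$d" && find . -type f -name "*.jpg" -print | LC_ALL=C sort)

# 'or': when -name is true the right-hand side is never evaluated,
# so -print runs only for the entries that did NOT match
or_result=$(cd "$d" && find . -name "*.jpg" -o -print | LC_ALL=C sort)

printf 'and:\n%s\nor:\n%s\n' "$and_result" "$or_result"
rm -rf "$d"
```

The ‘and’ form prints only ./a.jpg; the ‘or’ form prints everything except ./a.jpg.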
Avoid entire directory hierarchies with -prune:
find share \( -path "share/doc" -prune \) -o -type f -print
This will significantly reduce the number of comparisons we have to
make while find is traversing the file tree.
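As a sanity check, the sketch below (on an invented share tree) shows that pruning gives the same file list as filtering with ! -path. The difference is that the pruned search never descends into share/doc at all.

```shell
d=$(mktemp -d)
mkdir -p "$d/share/doc/pkg" "$d/share/man"
touch "$d/share/doc/pkg/README" "$d/share/man/find.1" "$d/share/top.txt"

# Prune: share/doc is matched once and its contents are never visited
pruned=$(cd "$d" && find share \( -path "share/doc" -prune \) -o -type f -print | LC_ALL=C sort)

# Filter: every entry under share/doc is still visited, then rejected
filtered=$(cd "$d" && find share -type f ! -path "share/doc/*" -print | LC_ALL=C sort)

printf 'pruned:\n%s\nfiltered:\n%s\n' "$pruned" "$filtered"
rm -rf "$d"
```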
With filesystem read caching, it is difficult to benchmark disk I/O reliably. The best way (that I’ve heard of) is to unmount and remount the partition, which clears the read cache (but setting up that test environment takes more time than I have).
Instead we’ll count system calls, which should give us a rough idea of
speed, especially when comparing two different find calls to achieve
the same thing.
What to look for:
Look at the lstat64 syscall, as well as the calls to chdir,
open, close, and fstat64.
First we’ll look at this call to find. Notice that the -type f
test occurs before the -path test:
find share -type f ! -path "share/doc/*" -print
An strace helps us see what’s happening:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 75.38    0.115423           2     53671           lstat64
  9.56    0.014634           3      4396           getdents64
  4.83    0.007399           2      4115           chdir
  4.22    0.006467           3      2064         1 open
  1.83    0.002797           9       324           write
  1.71    0.002614           1      2062           close
  1.21    0.001850           1      2063           fstat64
  1.12    0.001716           1      2058           fcntl64
------ ----------- ----------- --------- --------- ----------------
100.00    0.153119                 70776         1 total
(I’ve omitted all of the calls that were called only once and took
less than 1% of the time.) We’ve got 53671 lstat64 calls and 2k to 4k
other common syscalls.
Next we’ll look at putting the -type f test after we’ve checked
the path:
find share ! -path "share/doc/*" -type f -print
Here’s the strace output:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 72.91    0.100304           2     43210           lstat64
 10.88    0.014970           3      4396           getdents64
  5.39    0.007410           2      4115           chdir
  3.41    0.004694           2      2064         1 open
  2.24    0.003076           9       324           write
  2.05    0.002817           1      2062           close
  1.59    0.002184           1      2063           fstat64
  1.40    0.001922           1      2058           fcntl64
------ ----------- ----------- --------- --------- ----------------
100.00    0.137577                 60315         1 total
Notice that we’re making 10k fewer calls to lstat64 than before, and
everything else is roughly equal. You begin to see the effect of
ordering your expressions!
Finally, we’ll use find’s -prune expression to simply avoid an
entire hierarchy:
find share \( -path "share/doc" -prune \) -o -type f -print
And the strace output:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.35    0.092499           2     39040           lstat64
  8.04    0.009487           4      2602           getdents64
  3.50    0.004130           2      2361           chdir
  3.01    0.003558           3      1181           fcntl64
  2.55    0.003011           9       324           write
  2.24    0.002642           2      1187         1 open
  1.23    0.001454           1      1185           close
  0.91    0.001074           1      1186           fstat64
------ ----------- ----------- --------- --------- ----------------
100.00    0.118052                 49089         1 total
Now we’re making about 4k fewer calls to lstat64 than before, and
a bit more than half as many calls to getdents64, chdir, open,
fcntl64, close, and fstat64, for a total of over 20k fewer syscalls
than the first run.
You can see the effect that -prune, as well as the ordering of
expressions, has on execution speed.
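For the record, here is the arithmetic behind those comparisons, using the lstat64 and total call counts copied from the three strace summaries (the variable names are just labels for this sketch):

```shell
# Call counts from the three runs above
lstat_type_first=53671; total_type_first=70776   # -type f before ! -path
lstat_path_first=43210; total_path_first=60315   # ! -path before -type f
lstat_prune=39040;      total_prune=49089        # -prune

saved_by_reorder=$((lstat_type_first - lstat_path_first))
saved_by_prune=$((lstat_path_first - lstat_prune))
saved_total=$((total_type_first - total_prune))
saved_pct=$((saved_total * 100 / total_type_first))   # integer division

echo "lstat64 saved by reordering: $saved_by_reorder"
echo "lstat64 saved by pruning:    $saved_by_prune"
echo "total syscalls saved:        $saved_total (${saved_pct}% of the first run)"
```

That works out to 10461 fewer lstat64 calls from reordering, another 4170 from pruning, and 21687 fewer syscalls overall, or about 30% of the first run.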
find examines each entry in a directory hierarchy; if the entry (be
it a regular file, symlink, directory, device, or whatever) matches
the given criteria, it will be printed. If no criteria are given, it’s
an automatic match (and will be printed).
find evaluates its expressions in the order you specify them; by
carefully choosing the order of the expressions, you can shave tons of
time off the cost of the search and get your wanted results quicker.
As is true with most programming endeavours, a little investment up
front in crafting the expressions will yield better results later. If
you’re only running a find operation once, maybe you don’t want to
take too much time fussing over saving a few syscalls, but if you will
be running find frequently (e.g., as part of a cron job, or some other
regular occurrence), do your disk a favor and let find skip as much
as possible.