[ 1 ]

Getting more entropy for Subversion on FreeBSD

I was trying to create a new Subversion repository today and noticed to my dismay that it would hang during create. Totally high-centered. My only recourse was kill -9. It was bad.

“Why is this?” I thought. I googled a little and found this entry in the Subversion FAQ. In a nutshell I have too little entropy (sources of randomness) and that I should configure the system to get more entropy from interrupts.

It recommended checking into rndcontrol about this. The rndcontrol manpage was easy enough:

 SYNOPSIS

 rndcontrol [-q] [-s irq_no] [-c irq_no]

 DESCRIPTION

 The rndcontrol command is used to set which interrupts are used to
 help randomise the ``pool of entropy'' maintained by the kernel.
 The /dev/random and /dev/urandom devices are the user interface
 to this source of randomness.  Any changes take effect
 immediately.

 The following command line options are supported:

 -q      Turn off all output except errors.

 -s n    Allow IRQ n to be used as a source of randomness.  This
         option may be repeated for more than one IRQ.

 -c n    Stop IRQ n from being used as a source of randomness.
         This option may be repeated for more than one IRQ.

 The default is to have no IRQ's being used.

Ok. So I need some IRQs I can get some entropy from:

# /sbin/dmesg | grep -i irq
IOAPIC #0 intpin 2 -> irq 0
IOAPIC #0 intpin 19 -> irq 2
IOAPIC #0 intpin 21 -> irq 5
IOAPIC #0 intpin 20 -> irq 9
asr0: <Adaptec Caching SCSI RAID> mem 0xfc000000-0xfdffffff irq 9
  at device 4.1 on pci2
ahc0: <Adaptec aic7896/97 Ultra2 SCSI adapter> port 0x2000-0x20ff
  mem 0xf4100000-0xf4100fff irq 2 at device 12.0 on pci0
ahc1: <Adaptec aic7896/97 Ultra2 SCSI adapter> port 0x2400-0x24ff
  mem 0xf4101000-0xf4101fff irq 2 at device 12.1 on pci0
fxp0: <Intel Pro 10/100B/100+ Ethernet> port 0x2800-0x283f mem
  0xf4000000-0xf40fffff,0xf4102000-0xf4102fff irq 5 at device 14.0
  on pci0
pci0: <Intel 82371AB/EB (PIIX4) USB controller> at 18.2 irq 5
sio0 at port 0x3f8-0x3ff irq 4 flags 0x30 on isa0
sio1 at port 0x2f8-0x2ff irq 3 on isa0

Ah, IRQ 9, 2, and 5 look great: hard drives, RAID adapters, network interfaces are all great entropy sources. Let’s change our random device now:

# rndcontrol -s 9 -s 2 -s 5
rndcontrol: setting irq 9
rndcontrol: setting irq 2
rndcontrol: setting irq 5
rndcontrol: interrupts in use: 2 5 9

(Meanwhile, in another terminal):

$ svnadmin create SVN_REPO

Wow, that was fast… er, I guess that’s what it’s like under normal conditions. Oh well. Better put things back the way they were:

# rndcontrol -c 9 -c 2 -c 5
rndcontrol: clearing irq 9
rndcontrol: clearing irq 2
rndcontrol: clearing irq 5
rndcontrol: interrupts in use:

That’s it.

categories: /smarts, /work, /tech/freebsd, /tech/sysadmin
posted on Wed, 10 Jun 2009 at 14:09 | permanent link | view comments

subtract - delete matching lines from text files

Ever write code that you’re certain has been done a million times but you just don’t know what phrases to type into google to find it? Well, here is a little text utility that fits that criteria.

I call it subtract. It subtracts matching lines from text files (as seen in title!). It uses grep output format, so I use it like this:

egrep -Hf patterns.txt *.txt | subtract

or like this:

grep -H 'line I want to remove' some_file.txt | subtract

grep output looks like this:

filename:line
filename:line
filename:line

So subtract takes these kinds of lines, opens the file filename and looks for lines in it matching line, then deletes those lines from the file. Pretty basic.

download

Updated: Tue Jul 29 11:38:02 MDT 2008

categories: /tech/sysadmin
posted on Tue, 29 Jul 2008 at 11:38 | permanent link | view comments

Making ‘find’ go faster

The mysterious Unix find command has some interesting things about it. For one, the order of the command-line parameters is important—earlier commands affect later commands. The ramifications of this finally became clear to me after using find for 12 years.

So here is this and a few things I’ve learned over the years to make find go faster:

Idea Number 1

find has command-line arguments which are called “expressions” or “primaries”. These expressions are tests or rules that are applied to each file as find crawls the file system, and the results of each test are used to determine if find should proceed.

“Files” (and we really mean any file system entry: directories, devices, links, etc.) not matching the criteria are skipped.

Example 1a

find . -type f -name "joe*.jpg" -print

This example will find all regular files (not directories) with the name “joe(something).jpg”:

joe.jpg
joeschmoe.jpg
joe is awesome.jpg

Example 1b

find . -type d -path "*/lib/*" -print

This will find all directories that have /lib/ as part of their pathname (but not lib itself).


Idea Number 2

find has logical operators (“and” and “or”) and group operators (parentheses) which allow you to group expressions.

Corollary 2

Group expressions with parentheses, and use logical operators to include or exclude files.

Example 2a

find . ( -name "*.php" -or -name "*.phtml" ) -type f -print

This will find all regular files that end in “.php” or “.phtml”. The parentheses group the two -name expressions, so that if either one matches, the entire expression (between the parentheses) is true (and find then continues on to evaluate the next expression).

Note 2

Depending on the shell you are using, you may need to escape the parentheses so that the shell doesn’t see them first:

bash $ find . \( -name "*.php" -or -name "*.phtml" \) -type f -print

Example 2b

find . -name "*.[Pp][Hh][Pp]" -type f -print

You can use character classes to match filenames that may have varying case:

foo.PHP
foo.php
foo.Php
foo.pHp

Idea Number 3

find evaluates its expressions in the order they appear. This has important ramifications. When find runs, it works like this:

find . -type f -name "*.jpg" -print

This repeats for all files rooted in the current directory (.). You’ll notice that find operates on its expressions with an implicit ‘and’ operator between them, and uses shortcut logic. In other words, for ‘and’ conditions, the second expression will not be evaluated if the first one is false. For ‘or’ conditions, subsequent expressions will not be evalutated if the first one is found true.

Corollary 3

Skip as much as possible as early as possible by putting the ‘and’ expressions most likely to fail (or ‘or’ expressions most likely to succeed) near the beginning of the expression list.

Example 3a

Avoid entire directory hierarchies with -prune:

find share ( -path "share/doc" -prune ) -o -type f -print

This will significantly reduce the number of comparisons we have to make while find is traversing the file tree.


Benchmarks

With filesystem read caching, it becomes very difficult to benchmark disk I/O easily. The best way (that I’ve heard of) is to unmount and remount the partition, which clears the read cache (but setting up that test environment takes more time than I have).

Instead we’ll count system calls, which should give us a rough idea of speed, especially when comparing two different find calls to achieve the same thing.

What to look for:

Look at the lstat64 syscall, as well as the calls to chdir, open, close, and fstat64.

First we’ll look at this call to find. Notice that the -type f test occurs before the -path test:

find share -type f ! -path "share/doc/*" -print

An strace helps us see what’s happening:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 75.38    0.115423           2     53671           lstat64
  9.56    0.014634           3      4396           getdents64
  4.83    0.007399           2      4115           chdir
  4.22    0.006467           3      2064         1 open
  1.83    0.002797           9       324           write
  1.71    0.002614           1      2062           close
  1.21    0.001850           1      2063           fstat64
  1.12    0.001716           1      2058           fcntl64
------ ----------- ----------- --------- --------- ----------------
100.00    0.153119                 70776         1 total

(I’ve omitted all of the calls that were called only once and took less than 1% of time time). We’ve got 53671 lstat calls and 2k to 4k other common syscalls.

Next we’ll look at putting the -type f test after we’ve checked the path:

find share ! -path "share/doc/*" -type f -print

Here’s the strace output:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 72.91    0.100304           2     43210           lstat64
 10.88    0.014970           3      4396           getdents64
  5.39    0.007410           2      4115           chdir
  3.41    0.004694           2      2064         1 open
  2.24    0.003076           9       324           write
  2.05    0.002817           1      2062           close
  1.59    0.002184           1      2063           fstat64
  1.40    0.001922           1      2058           fcntl64
------ ----------- ----------- --------- --------- ----------------
100.00    0.137577                 60315         1 total

Notice that we’re making 10k fewer calls to lstat64 than before, and everything else is roughly equal. You begin to see the effect of ordering your expressions!

Finally, we’ll use find’s -prune expression to simply avoid an entire hierarchy:

find share ( -path "share/doc" -prune ) -o -type f -print

And the strace output:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.35    0.092499           2     39040           lstat64
  8.04    0.009487           4      2602           getdents64
  3.50    0.004130           2      2361           chdir
  3.01    0.003558           3      1181           fcntl64
  2.55    0.003011           9       324           write
  2.24    0.002642           2      1187         1 open
  1.23    0.001454           1      1185           close
  0.91    0.001074           1      1186           fstat64
------ ----------- ----------- --------- --------- ----------------
100.00    0.118052                 49089         1 total

Now we’re a few thousand calls to lstat64 less than before, and half as many calls to chdir, open, fcntl, close, and fstat64, for a total of over 20k fewer syscalls.

You can see the effect -prune as well as the ordering of expressions has on execution speed.


Summary

find examines each entry in a directory hierarchy; if the entry (be it a regular file, symlink, directory, device, or whatever) matches the given criteria, it will be printed. If no criteria are given, it’s an automatic match (and will be printed).

find evaluates its expressions in the order you specify them; by carefully choosing the order of the expressions, you can shave tons of time off the cost of the search and get your wanted results quicker.

As is true with most programming endeavours, a little investment up front in crafting the expressions will yield better results later. If you’re only running a find operation once, maybe you don’t want to take too much time fussing over saving a few syscalls, but if you will be running find frequently (e.g., as part of a cron, or some other regular occurance), do your disk a favor and let find skip as much as possible.

categories: /tech/sysadmin
posted on Wed, 24 Oct 2007 at 14:51 | permanent link | view comments

 
[ 1 ]