From: Sam Vilain (sam_at_vilain.net)
Date: Wed 14 Aug 2002 - 11:49:19 BST
Paul Sladen <vserver_at_paul.sladen.org> wrote:
> > One cannot just unify all files found to be the same?
> It's not the unifying that's the problem, it's the finding which files are
> the same that is the hard part.
Not too hard, and there are lots of tricks you can use to avoid having
to stat every file and directory on the filesystem.
For instance, you can restrict the search so that only files in the
same place are compared for unification, e.g. /vservers/*/bin/ls. You
can make sure that you're checking the inode number for each directory
entry as you readdir() to avoid stat()'ing every inode every time.
You can compare other file stat information before actually comparing
the files. And you can use the tricks that `find' uses to avoid having
to stat() every file in a directory just to see whether it has any
sub-directories.
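
To make the first three tricks concrete, here is a rough Python sketch. It is
not the unify-dirs code linked in this message; the function names and the
".unify-tmp" suffix are made up for illustration, and it assumes both
directories live on the same filesystem (hard links need that anyway). It
only pairs files with the same name in two corresponding directories, uses
the inode number from the directory entry to skip files that are already
linked, compares stat information next, and reads file contents only as a
last resort. It does not attempt the `find'-style sub-directory shortcut.

```python
import filecmp
import os


def same_metadata(a: os.stat_result, b: os.stat_result) -> bool:
    """Cheap pre-check: only files agreeing on these fields are worth reading."""
    return (a.st_size, a.st_mode, a.st_uid, a.st_gid, int(a.st_mtime)) == (
        b.st_size, b.st_mode, b.st_uid, b.st_gid, int(b.st_mtime))


def unify_dir(reference_dir: str, other_dir: str) -> int:
    """Hard-link files in other_dir to identical same-named files in reference_dir.

    Hypothetical helper; returns the number of files replaced by links.
    """
    ref_entries = {e.name: e for e in os.scandir(reference_dir)}
    linked = 0
    for entry in os.scandir(other_dir):
        ref = ref_entries.get(entry.name)
        if ref is None:
            continue  # nothing at the same place to compare against
        if not (entry.is_file(follow_symlinks=False)
                and ref.is_file(follow_symlinks=False)):
            continue  # only regular files are candidates
        # The inode number comes with the directory entry, so files that
        # are already unified are skipped without a stat() call.
        if entry.inode() == ref.inode():
            continue
        # Compare stat information before touching file contents.
        if not same_metadata(entry.stat(follow_symlinks=False),
                             ref.stat(follow_symlinks=False)):
            continue
        # Last resort: a full byte-for-byte comparison.
        if not filecmp.cmp(ref.path, entry.path, shallow=False):
            continue
        # Link under a temporary name, then rename over the original so
        # the path never disappears part-way through.
        tmp = entry.path + ".unify-tmp"
        os.link(ref.path, tmp)
        os.replace(tmp, entry.path)
        linked += 1
    return linked
```

Something like unify_dir("/vservers/a/bin", "/vservers/b/bin") would then
handle one pair of directories; walking whole trees and choosing which copy
to treat as the reference are left out of the sketch.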
I've implemented an algorithm which is fairly efficient, though the
code probably still leaks :-)
http://sam.vilain.net/src/unify-dirs
You will also need a couple of the modules (ReadDir, Pod::Constants)
at http://sam.vilain.net/pm/ to use it.
Takes less than 5 minutes to scan and unify about 30 systems' /usr
directories on my test system.
> Disk space was the issue there. Disk space is cheap now.
This is a myth. The same rule applies that has always applied: Cheap,
Fast, Reliable: pick any two.
--
   Sam Vilain, sam_at_vilain.net     WWW: http://sam.vilain.net/
    7D74 2A09 B2D3 C30F F78E         GPG: http://sam.vilain.net/sam.asc
    278A A425 30A9 05B5 2F13

A politician's most important ability is to foretell what will happen
tomorrow and next month and next year - and to explain afterwards why
it didn't happen.
 - anon.