App::Phlo: Reclaim space used by duplicate files
Once again I have to handle the same situation: no more space, and directories full of duplicates.
This isn't the first time I've had to deal with it
(last time it was to remove all the duplicate .mp3 files with different names when I merged my various 'Music' directories from different boxes/disks), so I already have a small script waiting in my repos.
But this time I chose to explore another way: instead of removing duplicate files, I tried hardlink substitution.
Hardlinks are not always applicable, but my perl5/perlbrew directory seemed a good candidate (read-only duplicate data...).
And it was: after running my script on it, the size went from 765M to 670M, and the test suites of the modules I tested all passed with every Perl version.
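To make the idea concrete, here is a minimal Perl sketch of hardlink substitution (an illustration under my own assumptions, not the actual App::Phlo code; the script structure and the choice of Digest::SHA are mine):

    #!/usr/bin/env perl
    # Sketch: group files by size, confirm duplicates with a SHA-256 digest,
    # then replace each duplicate with a hardlink to one kept copy.
    use strict;
    use warnings;
    use File::Find;
    use Digest::SHA;

    my %by_size;
    find(sub { -f and push @{ $by_size{ -s _ } }, $File::Find::name }, @ARGV);

    for my $group (grep { @$_ > 1 } values %by_size) {
        my %by_digest;
        push @{ $by_digest{ Digest::SHA->new(256)->addfile($_)->hexdigest } }, $_
            for @$group;
        for my $dups (grep { @$_ > 1 } values %by_digest) {
            my ($keep, @rest) = @$dups;
            for my $dup (@rest) {
                # A robust tool would link to a temporary name and rename it
                # over the duplicate, instead of unlinking first.
                unlink $dup or next;
                link $keep, $dup or warn "link $dup: $!";
            }
        }
    }

Run it with a directory argument; a real tool also needs checks this sketch skips, such as staying on a single filesystem and leaving files with differing ownership or permissions alone.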
I first thought of releasing the script as a patch for perlbrew, but thinking about it some more, I realized that a need probably exists for a more generic tool.
That's why App::Phlo was created :-)
Not a killer module, but one that suits my needs, and that will let me test some ideas (multiple digest algorithms, use with unionfs-like filesystems, Perl directory optimization...).
If you want to experiment with me, don't hesitate: ideas, patches, and comments are always welcome...
Comments
Congratulations, 5 hours of programming saved you 5 minutes of research on the Web for prior art.
(Currently there seems to be an undeclared dependency, or I goofed somewhere.)
@Anonymous: It seems I wasn't clear enough about my objectives:
I did *not* want to write a tool to handle (read: delete) duplicates.
I wanted to:
A) Explore a new way of optimizing space through hardlinking (especially against my perlbrew dirs)
B) Experiment with various things (automatic use of whichever digest algorithm is available; see the sketch after this list)
C) Use Perl
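To illustrate B), here is a hypothetical sketch of auto-selecting a digest algorithm at runtime (the best_digest helper and the preference list are my own, not App::Phlo's API):

    use strict;
    use warnings;

    # Return a digest object from the first module in the preference
    # list that loads; extend the list with any Digest::* you like.
    sub best_digest {
        for my $try ([ 'Digest::SHA', 256 ], [ 'Digest::MD5' ]) {
            my ($module, @args) = @$try;
            return $module->new(@args) if eval "require $module; 1";
        }
        die "no usable Digest module found";
    }

    my $ctx = best_digest();
    open my $fh, '<:raw', $ARGV[0] or die "$ARGV[0]: $!";
    $ctx->addfile($fh);    # both modules accept a filehandle
    print $ctx->hexdigest, "\n";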
None of the tools I found provided what I wanted.
I admit I considered using File::Find::Duplicates as a skeleton, but as I already had the File::Path recursion code from a previous experiment, I was quite reluctant to pay the dependency toll for a simple prototype.
(That, and the fact that File::Find::Duplicates uses file size and MD5 only.)
I might use it in the future, but allow me to evaluate the cost first.
And let me reassure you: it didn't take me hours to add the hardlinking code and option handling to existing recursive traversal code.
But if you prefer, I can also invoke my Hubris and my Impatience as an excuse for my Laziness (I hadn't searched long enough)
;-)
http://www.file-utilities.com
:-)
Not free, closed source, and Windows-only (!!), so unlikely to be written in Perl...
Definitely not what I want for my *coding* experiments.