A girl walks into a blog...: My first Ruby script: finding duplicate photo files

07 July 2008

I've started on the large task of organizing my digital photo files. My disorganization stems from a couple of things. First, when I began using a digital camera, I used iPhoto. I don't really use it now, but I've noticed that iPhoto makes some duplicate files. Second, I think I have duplicate files from backups and from migrating from old machines to new ones. Add the fact that most of the file names begin with DSCN or DCP, and it is a mess.

So, to help me get started, I thought I would write a script that creates a report of any duplicate files and their locations. Also, since I'm about one third of the way through Everyday Scripting with Ruby by Brian Marick, I know just enough to be dangerous. After a couple of evenings of blundering my way through this, I finally have my first Ruby script.

To run this script, pass in a list of directories. If you don't pass any, it defaults to the current directory.
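Here is a minimal Ruby sketch of the kind of report described above. It is an illustration, not the script from this post. It assumes a "duplicate" means two or more files with the same base name (for example DSCN0001.JPG), so it groups paths by File.basename and never compares file contents. The method names find_duplicate_names and print_report are hypothetical, and the recursive walk uses Ruby's standard Find library (written for a current Ruby, not the 2008-era one).

```ruby
require 'find'

# Hypothetical sketch, NOT the original script from this post.
# Assumption: a "duplicate" is any file whose base name appears in more than
# one location; file contents are never compared.

# Walks each directory recursively and groups file paths by base name,
# keeping only the names that occur more than once.
def find_duplicate_names(dirs)
  paths_by_name = Hash.new { |hash, name| hash[name] = [] }
  dirs.each do |dir|
    Find.find(dir) do |path|
      paths_by_name[File.basename(path)] << path if File.file?(path)
    end
  end
  paths_by_name.select { |_name, paths| paths.size > 1 }
end

# Prints one block per duplicated name: the name, then each location indented.
def print_report(duplicates, out = $stdout)
  duplicates.sort.each do |name, paths|
    out.puts name
    paths.each { |path| out.puts "  #{path}" }
  end
end

if __FILE__ == $PROGRAM_NAME
  # Default to the current directory when no directories are given.
  dirs = ARGV.empty? ? ['.'] : ARGV
  print_report(find_duplicate_names(dirs))
end
```

Saved as, say, find_dups.rb (a hypothetical name), it can be run as ruby find_dups.rb ~/Pictures /Volumes/Backup, or with no arguments to scan the current directory. Names are compared exactly, so DSCN0001.JPG and dscn0001.jpg count as different files; using File.basename(path).downcase as the key would change that.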

2 comments:

Anonymous said...: This little duplicate-finding script is pretty neat. I copied the script from your blog, saved it on my Windows XP machine as findDupFiles.rb, and ran it against my C: drive. I also redirected the output to a file instead of letting it scroll through my command window.
Example:
prompt>ruby findDupFiles.rb c:\ > duplicates.txt

I was surprised at just how many duplicated image names there were on my system. Just as a guide, the results file it generated had 8,740 lines! Yikes, that is a lot of duplicate filenames.

Tip: a word of caution for others who may find this tool helpful. A duplicated filename does not necessarily mean that the files in separate locations are the same file. I suggest inspecting them further before removing or deleting any of your duplicates.

Keep up the good work!; 07 July, 2008 22:24
Cynthia Sadler said...: When I ran it, my output file was 15000+ lines long. I don't know if this will really help me organize my files, but it was interesting nonetheless. I have a lot of work to do!; 08 July, 2008 08:12
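The caution in the first comment, that a shared filename does not prove two files are identical, can be checked mechanically by comparing the files' contents. Below is a minimal sketch of one way to do that, a technique swapped in here rather than anything from the post: it hashes each file with SHA-256 via Ruby's standard Digest library and groups paths with matching hashes. The method name group_by_content is hypothetical.

```ruby
require 'digest'

# Hypothetical helper (not from the original post): given paths that share a
# file name, group them by a SHA-256 hash of their contents. Paths in the same
# group have identical bytes (barring a hash collision); a group of one means
# that copy differs from every other path in the list.
def group_by_content(paths)
  paths.group_by { |path| Digest::SHA256.file(path).hexdigest }.values
end

if __FILE__ == $PROGRAM_NAME
  # Usage: pass the same-named paths from a duplicate report as arguments.
  group_by_content(ARGV).each_with_index do |group, i|
    puts "Group #{i + 1}:"
    group.each { |path| puts "  #{path}" }
  end
end
```

Only files that land in the same group are true byte-for-byte copies; hashing reads every file in full, so it is slower than comparing names, which is why it makes sense to run it only on the lists of same-named paths that a duplicate-name report turns up.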