require 'find'
# search the directories given on the command line, or the current one
argv = ARGV.empty? ? %w{.} : ARGV
file_counts = Hash.new(0)
files = []
Find.find(*argv) do |fullname|
  # looking for image and movie files only
  next unless fullname =~ /\.(JPEG|JPG|GIF|MOV)$/i
  # downcase so that IMG.JPG and img.jpg count as the same name
  file = File.basename(fullname).downcase
  dir = File.dirname(fullname)
  files.push([file, dir])
  file_counts[file] += 1
end
file_counts.each { |file, count|
  if count > 1 then
    # print the number of occurrences and the file name
    printf("%5d %s\n", count, file)
    # since assoc only returns the first occurrence, we print it
    # then delete it so we can find the next occurrence
    while (a = files.assoc(file))
      # print each directory name
      printf("\t %s\n", a[1])
      files.delete(a)
    end
  end
}
To run this script, pass in one or more directories as arguments; with no arguments, it defaults to the current directory.
2 comments:
This little duplicate-finding script is pretty neat. I ran it on a Windows XP machine from my C: drive. I copied the script from your blog, saved it on my machine as findDupFiles.rb, and ran it. I also redirected the output to a file instead of letting it scroll through my command window.
Example:
prompt>ruby findDupFiles.rb c:\ > duplicates.txt
I was surprised at just how many duplicated image names there were on my system. Just as a guide, there were 8,740 lines in the text file that was generated with the results!! Yikes, that is a lot of duplicate filenames.
Tip: One word of caution to others who may find this tool helpful. Although a filename may be duplicated, that does not necessarily mean the files in the separate locations are the same file. I suggest inspecting the files themselves before removing or deleting any of your duplicates.
Keep up the good work!
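The caution above, that a matching name does not mean matching content, can be checked mechanically. The following is a minimal sketch, not part of the original script: `identical_groups` is a hypothetical helper that takes the paths reported for one duplicated name and groups them by SHA-256 digest, so only files with identical contents end up together. It assumes the files are small enough to hash in a reasonable time.

```ruby
require 'digest'

# Hypothetical helper (an assumption, not from the original post):
# given a list of file paths, return the groups of paths whose
# contents have the same SHA-256 digest.
def identical_groups(paths)
  paths
    .select { |path| File.file?(path) }                      # skip missing paths and directories
    .group_by { |path| Digest::SHA256.file(path).hexdigest } # same digest => same content
    .values
    .select { |group| group.size > 1 }                       # keep only real duplicates
end
```

For example, `identical_groups(["a/img.jpg", "b/img.jpg"])` returns `[["a/img.jpg", "b/img.jpg"]]` when both files hold the same bytes, and `[]` otherwise. The paths here are made up for illustration.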
When I ran it, my output file was 15000+ lines long. I don't know if this will really help me organize my files, but it was interesting nonetheless. I have a lot of work to do!