Nothing is more disheartening than having to do the same tedious process more than once for no good reason. If you have years or even decades of data to wade through, you might find yourself in just that situation. “Wait, didn’t I just organize these? Oh, right. Those copies were on the other hard drive.” You could just toss out the second copy, but are you really sure the second set is a perfect duplicate? What if it has just a few updated files, like the latest versions of a couple of chapters of that book you’re writing?
If you’re one of the many readers currently sitting on a ton of data to wade through, fear not! One of the things computers happen to do extremely well is to identify files that are exactly the same. With modern advances in various machine learning techniques, they can even identify things that are almost the same—a wonderful capability for those of us with tons of photos.
No matter how much digital stuff you have to organize, you can streamline the process by explicitly removing duplicates first, during the filtering step of the Identify-Filter-Organize (IFO) process. And fortunately, there are tools for both Windows and Mac platforms to make this process fast and automatic, leaving you with only one copy of anything. Below, we’ll look briefly at three good options:
- Duplicate Cleaner Pro (Windows only)
- Duplicate File Finder (Mac only)
- PhotoSweeper (Mac only, specifically for photo libraries)
Depending on your platform and needs, at least one of these will likely serve you well.
But Aren’t Duplicates a Good Thing?
Sort of. Backups are intentional duplicates, which you keep as insurance in the event of unplanned loss of important data on your main storage device. As previously discussed, you definitely want at least one, and preferably multiple, reliable backup copies of your data.
But most of us have multiple copies of files that we didn’t intentionally create. The duplicate copies we want to get rid of in this case are the files, photos, or other items that are all within the main copy of your data. Taking examples from my personal data collection, these types of duplicates include:
- The photos that I synchronized to three different locations, resulting in different filenames that represent the same images
- The dozen hard drive file collections that I pulled together into a single triage location, many of which have partial sets of the same old backups
- Files that I’ve copied to sharing services like Dropbox that still exist in their original locations
With properly configured duplicate detection software, you can quickly and confidently delete unnecessary copies within your main data collection. The best part is that this works whether or not that data is already organized!
Exact Duplicates vs. Similar Matches
Duplicate detection software solutions fall broadly into two categories:
- Tools that only find exact duplicates
- Tools that can also identify similar matches
Exact duplicate finders work great for a first pass, and maybe the only pass, depending on your data. They provide a high-confidence method to find identical files, generally based on a combination of the file size and file contents.
Most exact duplicate search tools use a neat trick called hashing. This involves applying a special algorithm that summarizes a file’s contents as a short value, called a hash, which is statistically all but certain to differ from the hash of any file with different contents. The tool then compares these short values across thousands or even millions of files on your devices very quickly, which eliminates the need to directly compare the entire contents of all possible matches. Files that have the same size and hash are, for practical purposes, guaranteed to be identical.
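For the curious, here is a minimal Python sketch of the size-then-hash idea described above. It is not how any of the tools in this article are actually implemented. The function names (file_hash, find_exact_duplicates) and the choice of SHA-256 are my own assumptions for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root that share both a size and a content hash."""
    # Step 1: bucket files by size, which is cheap to look up.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Step 2: hash only the files whose size matches at least one other file.
    groups: list[list[Path]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have an exact duplicate
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for path in same_size:
            by_hash[file_hash(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling find_exact_duplicates(Path("some/folder")) on a folder of your choosing returns a list of groups, where each group holds the paths of files with identical contents. Checking sizes first means most files never need to be read at all, which is a big part of why this approach is fast. The sketch only reports matches; it deliberately deletes nothing.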
However, when dealing with photos and videos, you’ll often have similar matches that are not entirely identical. Maybe the photos contain the same subjects in only slightly different poses, such as a rapid series of images in a photoshoot. In this case, you can benefit from tools that can identify similar matches in addition to exact ones.
Similar match identification requires more sophisticated behavior. Judging whether two things are merely alike depends on how strict you want to be. Are two text files the same if they have only one changed letter? One changed word? Or one sentence, or a paragraph? And what about photos—how similar does the subject matter of two pictures need to be?
Files that get flagged as similar (but not identical) almost always need manual confirmation before you choose to delete any duplicates. But although it requires this extra step, the search tool saves time by finding possible duplicates for you. Otherwise, you’d have to find them by hand!
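To make the strictness question concrete for text, here is a small Python sketch using the standard library's difflib module. It illustrates the general idea of a similarity threshold only. The helper names (similarity, is_similar) and the threshold values are assumptions of mine, and real photo-matching tools rely on very different, image-specific techniques.

```python
from difflib import SequenceMatcher


def similarity(text_a: str, text_b: str) -> float:
    """Return a score from 0.0 (nothing shared) to 1.0 (identical)."""
    return SequenceMatcher(None, text_a, text_b).ratio()


def is_similar(text_a: str, text_b: str, threshold: float = 0.9) -> bool:
    """Treat two texts as a similar match when their score meets the threshold."""
    return similarity(text_a, text_b) >= threshold


# Two drafts of the same sentence that differ by a single word.
draft_v1 = "The quick brown fox jumps over the lazy dog."
draft_v2 = "The quick brown fox leaps over the lazy dog."

print(is_similar(draft_v1, draft_v2, threshold=0.8))   # lenient setting
print(is_similar(draft_v1, draft_v2, threshold=0.99))  # strict setting
```

The same pair of drafts counts as a match under the lenient threshold and as two different texts under the strict one. That is exactly why similar matches deserve a manual look before anything gets deleted.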
Duplicate Cleaner Pro from Digital Volcano
Duplicate Cleaner Pro from Digital Volcano is an application for Windows that helps you find and remove duplicate files, photos, videos, and even music. I have personally used it to great effect on my own gigantic collection of well over a million files, and it saves time and effort each time I open it.
Duplicate Cleaner Pro has many features but manages to present them in a way that doesn’t immediately overwhelm the user. More advanced functionality is ready to use just under the surface if you need it. It can identify both exact duplicates and similar matches, making it a comprehensive solution for all your duplicate filtering needs. The software also offers a preview feature, allowing you to review the files before deleting them.
The most useful and powerful functions of Duplicate Cleaner Pro require the paid version of the software. However, it’s available for a nominal one-time fee and occasionally goes on sale. Further, there is a free trial available for you to test how well everything works first.
Duplicate File Finder from Nektony
Duplicate File Finder from Nektony is a highly reviewed tool for Macs. It’s similar in most of the important ways to Duplicate Cleaner Pro, except that it’s built for macOS. It has the same basic feature set, including the ability to find similar files, photos, and music. And, like Duplicate Cleaner Pro, it has a free version and a Pro version with improved (and valuable) functionality. The upgrade to the Pro version is available for a recurring fee or a single “lifetime” payment, but even the lifetime cost is low: only half of Duplicate Cleaner Pro’s regular price.
If your main data collection lives on a Mac, I highly recommend this tool for duplicate identification and removal.
PhotoSweeper from Overmacs
PhotoSweeper is a fantastic application that is highly optimized for photo library clean-up. While it doesn’t handle generic files the way the other two tools do, it more than makes up for this with specialized functions that make media collection organization a breeze. Not only will it find similar photos, but it can also specifically identify photos that have been edited in popular software, as well as photos taken in rapid succession, which it can treat as a grouped sequence.
Even though I work primarily in a Windows environment, I have a Mac at my disposal as well, and I actually purchased this app (only $10 at the time of writing!) specifically for photo collection clean-up. It’s a truly valuable addition to my data organization workflow.
Do One Thing
If you know or suspect that you have unnecessary duplicates in your data collection, download one of the tools above and use it for 15 minutes or so on a trial basis. See if it confirms your suspicions and provides a shortcut for deleting a sizeable chunk of what you don’t need.
Duplicate removal is only part of the digital clean-up process for most of us, but it’s one of my favorite steps. With tools like the ones we studied here, visible progress happens so easily that it almost feels like you’re cheating.