Deduplication is a hard problem: what constitutes a "dupe" is somewhat arguable.
My first approach just used metadata, but that doesn't aggregate files that have been stripped of their original metadata, like what you get from a Google Takeout, so I had to add image hashing as well. I actually generate three mean hashes in L*a*b color space for PhotoStructure (many hashes ignore color). I've also found that metadata needs to be normalized, including captured-at time, and even exposure metadata. It's a lot of whack-a-mole, especially as new cameras and image formats are released every year.
I described more about what I've written for PhotoStructure (which does deduplication for both videos and images) here: https://photostructure.com/faq/what-do-you-mean-by-deduplica... -- it might help you avoid some of the pitfalls I've had to overcome.
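To make the "three mean hashes in L*a*b" idea above concrete, here is a minimal sketch in Python. It is not PhotoStructure's actual code: the function names (`srgb_to_lab`, `mean_hash`, `lab_mean_hashes`, `hamming`) are hypothetical, and it assumes the image has already been decoded and downscaled to a small grid (say 8x8) of 8-bit sRGB pixels. Each channel (L, a, b) gets its own mean hash, so two images that differ mainly in color still produce different hashes.

```python
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]


def srgb_to_lab(rgb: RGB) -> Tuple[float, float, float]:
    """Convert one 8-bit sRGB pixel to CIE L*a*b* (D65 white point)."""
    def to_linear(c: int) -> float:
        v = c / 255.0
        return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

    r, g, b = (to_linear(c) for c in rgb)
    # Linear sRGB -> XYZ, each component divided by the D65 reference white.
    x = (0.4124564 * r + 0.3575761 * g + 0.1804375 * b) / 0.95047
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = (0.0193339 * r + 0.1191920 * g + 0.9503041 * b) / 1.08883

    def f(t: float) -> float:
        if t > 216 / 24389:
            return t ** (1 / 3)
        return (24389 / 27 * t + 16) / 116

    fx, fy, fz = f(x), f(y), f(z)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def mean_hash(values: Sequence[float]) -> int:
    """One bit per value: 1 if the value is above the mean, else 0."""
    mean = sum(values) / len(values)
    bits = 0
    for v in values:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def lab_mean_hashes(pixels: Sequence[RGB]) -> Tuple[int, int, int]:
    """Return (L hash, a hash, b hash) for an already-downscaled image."""
    lab = [srgb_to_lab(p) for p in pixels]
    return (
        mean_hash([px[0] for px in lab]),
        mean_hash([px[1] for px in lab]),
        mean_hash([px[2] for px in lab]),
    )


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Two images would then be compared by the Hamming distance of each channel's hash. Any threshold for "close enough" is a tuning decision that this sketch does not attempt to make.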
Thank you. It looks interesting. I was heading to much the same place on the order of precedence for matches, silently wondering if there was a class of bad edit which made the post-modified file bigger, not smaller. Seems unlikely but not impossible.
A lot of my dupes are Google dupes, but across about 4 cameras with a mixture of original/compressed size.
A lot of my local copies had jhead run on them to "fix" time, so they have modified EXIF.
A small number have me playing with IPTC to try and auto-name things for tag matching.
Your program looks to be the one which understands the corner cases.