Under The Hood: Duplicate Detection
While Zedge has an incredibly talented development team building all our apps, a lot of that team’s most impressive work is done behind the scenes, on various internal tools we use to maintain those apps and run our business. Under The Hood will be a recurring feature devoted to highlighting impressive technology you can’t necessarily see.
Zedge’s Duplicate Detector is just such a tool: we needed a way to ensure that nobody could resell the work of another Zedge Premium artist. It’s relatively straightforward to identify and reject exact duplicates of the same image, but it’s a lot trickier to recognize a copy that has been zoomed in a bit, had its color palette changed, or had a few pixels altered here and there.
So we asked our data scientists if they could help us solve this problem, and they came back with a very sophisticated model that gives any two images a similarity score. Items that score 100% are rejected automatically upon upload, while items scoring 96% to 99% get sent to human moderators for review.
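Conceptually, the routing on top of that score is simple. Here’s a rough sketch in Python – the function name and the "accept" fallback are illustrative assumptions, not our production code; only the cutoffs come from the description above:

```python
# Illustrative sketch of the routing described above; only the cutoffs
# (100% auto-reject, 96%-99% human review) come from the post itself.
def route_upload(similarity_score: float) -> str:
    """Decide what happens to a new upload, given its highest similarity
    score against items already in the catalog (1.0 == 100%)."""
    if similarity_score >= 1.0:
        return "reject automatically"       # exact duplicate
    elif similarity_score >= 0.96:
        return "send to human moderators"   # too close to call
    else:
        return "accept"                     # no duplicate concern (assumed fallback)
```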
99% scores (almost) always turn out to be duplicates we need to reject:
But a score of 96% sometimes just means two different people took a picture of the same type of dog:
Images with text can also generate false dupes with a score of 96%:
But a score of 96% can also mean somebody zoomed in on an image created by someone else:
We asked Georgina Armstrong and Emilio Capo, two of the data scientists who built this tool, to explain exactly how it works. Here’s their answer:
Our Duplicate Detection model is based on a neural network architecture called ResNet. ResNet is optimized for extracting high-level image features, which it then uses to classify the image into one of up to 1,000 classes: one image goes in, one class label comes out.
But the classification itself is not the point – we discard the final classification step and keep something that gets created along the way: embeddings. You see, during the classification process, the algorithm calculates "features" that help it sort images into groups (there’s a rough code sketch of this step after the list below). You would do the same if you had to sort a basket of fresh produce into fruits and vegetables:
- Must be cooked before eating? Probably a vegetable.
- Full of fructose? Probably a fruit.
- Has green leaves? Probably a vegetable.
- Bright red color? Probably a fruit.
- My kid won't eat it? Probably a vegetable.
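In code, that "keep the features, drop the label" idea looks roughly like the sketch below. It assumes an off-the-shelf torchvision ResNet-50 with standard ImageNet preprocessing; our production model differs in its details, but the shape of the step is the same.

```python
# Rough sketch of embedding extraction with a stock torchvision ResNet-50.
# The backbone, weights, and preprocessing here are illustrative assumptions.
import torch
from PIL import Image
from torchvision import models, transforms

# Load a ResNet pretrained on ImageNet and replace its 1000-class
# classification head with an identity, so the network outputs the
# pooled feature vector (the "embedding") instead of a class label.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval()

# Standard ImageNet preprocessing.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def embed(path: str) -> torch.Tensor:
    """Map one image file to a 2048-dimensional embedding vector."""
    image = Image.open(path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)   # shape: (1, 3, 224, 224)
    with torch.no_grad():
        return backbone(batch).squeeze(0)    # shape: (2048,)
```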
We extract these features for each item from the classification model in the form of a vector of numerical values. Once each wallpaper is mapped to its numerical vector, we can measure how similar any two wallpapers are by computing the distance between their feature vectors, which we refer to as embeddings. If two embeddings are too similar to each other – that is, their similarity exceeds the threshold we have defined – the pair is flagged as a potential duplicate for human review.
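Here’s what that comparison might look like, assuming cosine similarity as the distance measure and an illustrative 0.96 cutoff; the actual metric and threshold we use are tuning decisions not shown here. The hypothetical `embed()` helper is the one sketched above.

```python
# Rough sketch of the pairwise comparison; cosine similarity and the 0.96
# cutoff are illustrative assumptions, not the production configuration.
import torch
import torch.nn.functional as F

REVIEW_THRESHOLD = 0.96

def similarity(a: torch.Tensor, b: torch.Tensor) -> float:
    """Cosine similarity between two embeddings: 1.0 means identically
    oriented feature vectors, lower values mean less similar images."""
    return F.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()

def is_potential_duplicate(a: torch.Tensor, b: torch.Tensor) -> bool:
    """Flag a pair for human review when the embeddings are too close."""
    return similarity(a, b) >= REVIEW_THRESHOLD
```

In this sketch, a call like `is_potential_duplicate(embed("original.jpg"), embed("upload.jpg"))` is the whole test: if it comes back true, the pair goes into the moderation queue.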
To go back to the produce example above: if I pull two items out of my cart, and both of them are sweet, round, red, come in a plastic packet of 12, cost about $2.50, and my kid will eat them... it's plausible that both objects are red apples, so I recommend that a human look at them side by side to confirm they are the same.
This kind of modeling has been particularly successful for us because it catches some of the most annoying kinds of digital art theft: zooming in slightly on an image, cropping a couple of pixels off the side, or shifting the color temperature by a couple of degrees.