About WWW2003: Detecting Near-replicas on the Web by Content and Hyperlink Analysis
Paper by Ernesto Di Iorio et al. proposing a technique for finding lists of similar documents, based on a pair of signatures that take into account both the document contents and the hyperlink structure.
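To illustrate the general idea of pairing a content signature with a hyperlink signature, here is a minimal Python sketch. It is not the paper's method. The signatures are assumptions chosen for simplicity: word 3-shingles for content and the set of normalized outgoing links for hyperlink structure, each compared with Jaccard similarity. The function names (`content_signature`, `link_signature`, `near_replica_lists`) and the 0.8 thresholds are hypothetical. Two pages count as near-replicas only when both similarities pass their thresholds, and each page gets a list of its near-replicas.

```python
from itertools import combinations


def content_signature(text: str, k: int = 3) -> frozenset:
    """Hypothetical content signature: the set of word k-shingles."""
    words = text.lower().split()
    if len(words) < k:
        # Very short documents: use the whole word sequence as one shingle.
        return frozenset([tuple(words)]) if words else frozenset()
    return frozenset(tuple(words[i:i + k]) for i in range(len(words) - k + 1))


def link_signature(outlinks: list[str]) -> frozenset:
    """Hypothetical hyperlink signature: normalized outgoing link targets."""
    return frozenset(u.strip().lower().rstrip("/") for u in outlinks)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_replica_lists(pages: dict, content_threshold: float = 0.8,
                       link_threshold: float = 0.8) -> dict:
    """pages maps url -> (text, outlinks); returns url -> sorted list of near-replicas."""
    sigs = {url: (content_signature(text), link_signature(links))
            for url, (text, links) in pages.items()}
    similar = {url: [] for url in pages}
    # Compare every unordered pair once; both signatures must agree.
    for u, v in combinations(sorted(pages), 2):
        cu, lu = sigs[u]
        cv, lv = sigs[v]
        if jaccard(cu, cv) >= content_threshold and jaccard(lu, lv) >= link_threshold:
            similar[u].append(v)
            similar[v].append(u)
    return {url: sorted(vs) for url, vs in similar.items()}


if __name__ == "__main__":
    pages = {
        "http://a.example/doc": ("the quick brown fox jumps over the lazy dog",
                                 ["http://x.example/", "http://y.example"]),
        "http://mirror.example/doc": ("the quick brown fox jumps over the lazy dog",
                                      ["http://x.example", "http://y.example/"]),
        "http://other.example/page": ("completely different text about web crawling",
                                      ["http://z.example"]),
    }
    for url, replicas in near_replica_lists(pages).items():
        print(url, replicas)
```

Requiring agreement on both signatures is what this sketch uses to separate true mirrors from pages that merely share boilerplate text or a common navigation bar. The all-pairs loop is quadratic and only suits small collections; a real crawler-scale system would need an indexing step to avoid comparing every pair.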