similar: get rid of quadratic addedfiles.remove()
authorYuya Nishihara <yuya@tcha.org>
Thu, 23 Mar 2017 20:50:33 +0900
changeset 31580 d3e2af4e0128
parent 31579 3a383caa97f4
child 31581 b1528d195a13
similar: get rid of quadratic addedfiles.remove() Instead, build a set of files to be removed and recreate addedfiles only if necessary. Benchmark with 50k added/removed files, on tmpfs: $ hg addremove --dry-run --time -q original: real 16.550 secs (user 15.000+0.000 sys 1.540+0.000) previous: real 16.730 secs (user 15.280+0.000 sys 1.440+0.000) this patch: real 16.070 secs (user 14.470+0.000 sys 1.580+0.000)
mercurial/similar.py
--- a/mercurial/similar.py	Sun Mar 15 18:58:56 2015 +0900
+++ b/mercurial/similar.py	Thu Mar 23 20:50:33 2017 +0900
@@ -107,12 +107,14 @@
                     if fp in parentctx and parentctx[fp].size() > 0]
 
     # Find exact matches.
-    for (a, b) in _findexactmatches(repo, addedfiles[:], removedfiles):
-        addedfiles.remove(b)
+    matchedfiles = set()
+    for (a, b) in _findexactmatches(repo, addedfiles, removedfiles):
+        matchedfiles.add(b)
         yield (a.path(), b.path(), 1.0)
 
     # If the user requested similar files to be matched, search for them also.
     if threshold < 1.0:
+        addedfiles = [x for x in addedfiles if x not in matchedfiles]
         for (a, b, score) in _findsimilarmatches(repo, addedfiles,
                                                  removedfiles, threshold):
             yield (a.path(), b.path(), score)