dirstate: speed up repeated missing directory checks
authorMartin von Zweigbergk <martinvonz@google.com>
Wed, 19 Nov 2014 23:15:07 -0800
changeset 23375 a179db3db9b9
parent 23374 aa0a430d9c75
child 23378 47091002ae62
dirstate: speed up repeated missing directory checks In a mozilla repo with tip at bb3ff09f52fe, hg update tip~1000 && time hg revert -nq -r tip . displays ~4:20 minutes. With tip~100, it runs in ~11 s. With revision 100000, it did not finish in 12 minutes. Revert calls dirstate.status() with a matcher that matches each file in the target revision. The main problem [1] lies in dirstate._walkexplicit(), which looks for matching deleted directories by checking whether each path is prefix of any path in the dirstate. With m files in the dirstate and n files in the target revision that are not in the dirstate, this is clearly O(m*n). Let's improve by keeping a lazily initialized set of all the directories in the dirstate, so the time becomes O(m+n). After this patch, the 4:20 minutes become 5.5 s, while for a single missing path, it slows down from 1.092 s to 1.150 s (best of 4). The >12 min case becomes 5.8 s. [1] A narrower optimization would be to make revert take the fast path for '.' and '--all'.
mercurial/dirstate.py
--- a/mercurial/dirstate.py	Wed Nov 19 17:07:27 2014 -0800
+++ b/mercurial/dirstate.py	Wed Nov 19 23:15:07 2014 -0800
@@ -629,6 +629,7 @@
         results = dict.fromkeys(subrepos)
         results['.hg'] = None
 
+        alldirs = None
         for ff in files:
             if normalize:
                 nf = normalize(normpath(ff), False, True)
@@ -657,13 +658,12 @@
                 if nf in dmap: # does it exactly match a missing file?
                     results[nf] = None
                 else: # does it match a missing directory?
-                    prefix = nf + "/"
-                    for fn in dmap:
-                        if fn.startswith(prefix):
-                            if matchedir:
-                                matchedir(nf)
-                            notfoundadd(nf)
-                            break
+                    if alldirs is None:
+                        alldirs = scmutil.dirs(dmap)
+                    if nf in alldirs:
+                        if matchedir:
+                            matchedir(nf)
+                        notfoundadd(nf)
                     else:
                         badfn(ff, inst.strerror)