memory-usage: fix `hg log --follow --rev R F` space complexity
author Pierre-Yves David <pierre-yves.david@octobus.net>
Sat, 19 Nov 2022 01:35:01 +0100
branch stable
changeset 49622 dcb2581e33be
parent 49621 55c6ebd11cb9
child 49623 c890d8b8bc59
When running `hg log --follow --rev REVS FILES`, the log code walks the history of all FILES, starting from the file revisions that exist in each of REVS.

Before doing so, it checks whether the files actually exist in the target revisions. To do so, it opens the manifest of each revision in REVS and looks up the entries in FILES.

Before this changeset, this was done in a way that created a changectx for each target revision and kept them all in memory while looking into each file. If the set of REVS is large, this means keeping the manifest of every entry in REVS in memory. That can be large… if REVS is of the form `::X`, this can quickly become huge and saturate the memory. We have seen usage allocating 2GB per second until memory runs out.

So this changeset inverts the two loops so that only one revision is kept in memory during the operation. This solves the memory explosion issue.
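The loop inversion described above can be illustrated with a minimal, standalone sketch. It is not the Mercurial code: `load_ctx`, `missing_eager`, and `missing_streaming` are hypothetical names, a plain `set` of paths stands in for a changectx and its manifest, and the directory (`hasdir`) case is left out.

```python
def missing_eager(load_ctx, revs, files):
    """Old shape: load every context first, then loop over files.

    All contexts (and their manifests) stay alive at the same time,
    so memory grows with len(revs).
    """
    ctxs = [load_ctx(r) for r in revs]
    return {f for f in files if not any(f in c for c in ctxs)}


def missing_streaming(load_ctx, revs, files):
    """New shape: loop over revisions, holding one context at a time."""
    missing = set(files)
    for r in revs:
        if not missing:
            # Every file has been found; no need to load more contexts.
            break
        ctx = load_ctx(r)
        # Build the set of found files first, then subtract it, so the
        # set being iterated is never mutated during iteration.
        missing -= {f for f in missing if f in ctx}
    return missing


# Hypothetical data: revision number -> set of paths in its "manifest".
manifests = {0: {"a"}, 1: {"a", "b"}, 2: {"a", "b", "c"}}
wanted = ["a", "c", "z"]

eager_result = missing_eager(manifests.__getitem__, [0, 1, 2], wanted)
streaming_result = missing_streaming(manifests.__getitem__, [0, 1, 2], wanted)
```

Both functions report the same missing files. The difference is that the second only ever references one loaded context, and it can stop early once nothing is left to look for.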
mercurial/logcmdutil.py
--- a/mercurial/logcmdutil.py	Fri Nov 18 13:47:29 2022 +0000
+++ b/mercurial/logcmdutil.py	Sat Nov 19 01:35:01 2022 +0100
@@ -817,17 +817,26 @@
             # There may be the case that a path doesn't exist in some (but
             # not all) of the specified start revisions, but let's consider
             # the path is valid. Missing files will be warned by the matcher.
-            startctxs = [repo[r] for r in revs]
-            for f in match.files():
-                found = False
-                for c in startctxs:
-                    if f in c:
-                        found = True
-                    elif c.hasdir(f):
+            all_files = list(match.files())
+            missing_files = set(all_files)
+            files = all_files
+            for r in revs:
+                if not files:
+                    # We don't have any file to check anymore.
+                    break
+                ctx = repo[r]
+                for f in files:
+                    if f in ctx:
+                        missing_files.discard(f)
+                    elif ctx.hasdir(f):
                         # If a directory exists in any of the start revisions,
                         # take the slow path.
-                        found = slowpath = True
-                if not found:
+                        missing_files.discard(f)
+                        slowpath = True
+                        # we found on slow path, no need to search for more.
+                        files = missing_files
+            for f in all_files:
+                if f in missing_files:
                     raise error.StateError(
                         _(
                             b'cannot follow file not in any of the specified '