bisect: avoid copying ancestor list for non-merge commits
During a bisection, hg needs to compute a list of all ancestors for every
candidate commit. This is accomplished via a bottom-up traversal of the set of
candidates, during which each revision's ancestor list is populated using the
ancestor list of its parent(s). Previously, this involved copying the entire
list, which could be very long in if the bisection range was large.
To help improve this, we can observe that each candidate commit is visited
exactly once, at which point its ancestor list is copied into its children's
lists and then dropped. In the case of non-merge commits, a commit's ancestor
list consists exactly of its parent's list plus itself. This means that we can
trivially reuse the parent's existing list for one of its non-merge children,
which avoids copying entirely if that commit is the parent's only child. This
makes bisections over linear ranges of commits much faster.
During some informal testing in the large publicly-available `mozilla-central`
repository, this noticeably sped up bisections over large ranges of history:
Setup:
$ cd mozilla-central
$ hg bisect --reset
$ hg bisect --good 0
$ hg log -r tip -T '{rev}\n'
628417
Test:
$ time hg bisect --bad tip --noupdate
Before:
real 3m35.927s
user 3m35.553s
sys 0m0.319s
After:
real 1m41.142s
user 1m40.810s
sys 0m0.285s
A python hook for "hg fix" that prints out the number of files and revisions
that were affected, along with which fixer tools were applied. Also checks how
many times it sees a specific key generated by one of the fixer tools defined
below.
$ cat >> $TESTTMP/postfixhook.py <<EOF
> import collections
> def file(ui, repo, rev=None, path=b'', metadata=None, **kwargs):
> ui.status(b'fixed %s in revision %d using %s\n' %
> (path, rev, b', '.join(metadata.keys())))
> def summarize(ui, repo, replacements=None, wdirwritten=False,
> metadata=None, **kwargs):
> counts = collections.defaultdict(int)
> keys = 0
> for fixername, metadatalist in metadata.items():
> for metadata in metadatalist:
> if metadata is None:
> continue
> counts[fixername] += 1
> if 'key' in metadata:
> keys += 1
> ui.status(b'saw "key" %d times\n' % (keys,))
> for name, count in sorted(counts.items()):
> ui.status(b'fixed %d files with %s\n' % (count, name))
> if replacements:
> ui.status(b'fixed %d revisions\n' % (len(replacements),))
> if wdirwritten:
> ui.status(b'fixed the working copy\n')
> EOF
Some mock output for fixer tools that demonstrate what could go wrong with
expecting the metadata output format.
$ printf 'new content\n' > $TESTTMP/missing
$ printf 'not valid json\0new content\n' > $TESTTMP/invalid
$ printf '{"key": "value"}\0new content\n' > $TESTTMP/valid
Configure some fixer tools based on the output defined above, and enable the
hooks defined above. Disable parallelism to make output of the parallel file
processing phase stable.
$ cat >> $HGRCPATH <<EOF
> [extensions]
> fix =
> [fix]
> metadatafalse:command=cat $TESTTMP/missing
> metadatafalse:pattern=metadatafalse
> metadatafalse:metadata=false
> missing:command=cat $TESTTMP/missing
> missing:pattern=missing
> missing:metadata=true
> invalid:command=cat $TESTTMP/invalid
> invalid:pattern=invalid
> invalid:metadata=true
> valid:command=cat $TESTTMP/valid
> valid:pattern=valid
> valid:metadata=true
> [hooks]
> postfixfile = python:$TESTTMP/postfixhook.py:file
> postfix = python:$TESTTMP/postfixhook.py:summarize
> [worker]
> enabled=false
> EOF
See what happens when we execute each of the fixer tools. Some print warnings,
some write back to the file.
$ hg init repo
$ cd repo
$ printf "old content\n" > metadatafalse
$ printf "old content\n" > invalid
$ printf "old content\n" > missing
$ printf "old content\n" > valid
$ hg add -q
$ hg fix -w
ignored invalid output from fixer tool: invalid
fixed metadatafalse in revision 2147483647 using metadatafalse
ignored invalid output from fixer tool: missing
fixed valid in revision 2147483647 using valid
saw "key" 1 times
fixed 1 files with valid
fixed the working copy
$ cat metadatafalse
new content
$ cat missing
old content
$ cat invalid
old content
$ cat valid
new content
$ cd ..