stream-clone: avoid opening a revlog in case we do not need it
Opening an revlog has a cost, especially if it is inline as we have to scan the
file and construct an index.
To prevent the associated slowdown, we just do a minimal scan to check that an
inline file is still inline, and simply stream the file without creating a
revlog when we can.
This provides a big boost compared to the previous changeset, even if the full
generation is still penalized by the initial gathering of information.
All benchmarks are run on linux with Python 3.10.7.
# benchmark.name = hg.exchange.stream.generate
# benchmark.variants.version = v2
### Compared to the previous changesets
We get a large win all across the board!
# mercurial-2018-08-01-zstd-sparse-revlog
before: 0.250694 seconds
after: 0.105986 seconds (-57.72%)
# pypy-2018-08-01-zstd-sparse-revlog
before: 3.885657 seconds
after: 1.709748 seconds (-56.00%)
# netbeans-2018-08-01-zstd-sparse-revlog
before: 16.679371 seconds
after: 7.687469 seconds (-53.91%)
# mozilla-central-2018-08-01-zstd-sparse-revlog
before: 38.575482 seconds
after: 17.520316 seconds (-54.58%)
# mozilla-try-2019-02-18-zstd-sparse-revlog
before: 81.160994 seconds
after: 37.073753 seconds (-54.32%)
### Compared to 6.4.3
We are still significantly slower than 6.4.3, the extra time is usually twice
slower than the extra time we observe on the locked section, which is a quite
interesting information.
Except for mercurial-central that is much faster. That discrepancy is not really
explained yet.
# mercurial-2018-08-01-zstd-sparse-revlog
6.4.3: 0.072560 seconds
after: 0.105986 seconds (+46.07%) (- 0.03 seconds)
# pypy-2018-08-01-zstd-sparse-revlog
6.4.3: 1.211193 seconds
after: 1.709748 seconds (+41.16%) (-0.45 seconds)
# netbeans-2018-08-01-zstd-sparse-revlog
6.4.3: 4.932843 seconds
after: 7.687469 seconds (+55.84%) (-2.75 seconds)
# mozilla-central-2018-08-01-zstd-sparse-revlog
6.4.3: 34.012226 seconds
after: 17.520316 seconds (-48.49%) (-16.49 seconds)
# mozilla-try-2019-02-18-zstd-sparse-revlog
6.4.3: 23.850555 seconds
after: 37.073753 seconds (+55.44%) (+13.22 seconds)
#!/usr/bin/env python3
# Dump revlogs as raw data stream
# $ find .hg/store/ -name "*.i" | xargs dumprevlog > repo.dump
import sys
from mercurial.node import hex
from mercurial import (
encoding,
pycompat,
revlog,
)
from mercurial.utils import procutil
from mercurial.revlogutils import (
constants as revlog_constants,
)
for fp in (sys.stdin, sys.stdout, sys.stderr):
procutil.setbinary(fp)
def binopen(path, mode=b'rb'):
if b'b' not in mode:
mode = mode + b'b'
return open(path, pycompat.sysstr(mode))
binopen.options = {}
def printb(data, end=b'\n'):
sys.stdout.flush()
procutil.stdout.write(data + end)
for f in sys.argv[1:]:
localf = encoding.strtolocal(f)
if not localf.endswith(b'.i'):
print("file:", f, file=sys.stderr)
print(" invalid filename", file=sys.stderr)
r = revlog.revlog(
binopen,
target=(revlog_constants.KIND_OTHER, b'dump-revlog'),
radix=localf[:-2],
)
print("file:", f)
for i in r:
n = r.node(i)
p = r.parents(n)
d = r.revision(n)
printb(b"node: %s" % hex(n))
printb(b"linkrev: %d" % r.linkrev(i))
printb(b"parents: %s %s" % (hex(p[0]), hex(p[1])))
printb(b"length: %d" % len(d))
printb(b"-start-")
printb(d)
printb(b"-end-")