Sun, 14 Apr 2024 02:27:10 +0200 proxy-vfs: also proxy the `audit` attribute
Pierre-Yves David <pierre-yves.david@octobus.net> [Sun, 14 Apr 2024 02:27:10 +0200] rev 51593
proxy-vfs: also proxy the `audit` attribute In the previous changeset, we had to do a little dance to access the useful `audit` attribute. We now provide a proper accessors to it. We don't update the code in `perf.py` because it has to remain compatible with older version of Mercurial. This will just be nicer in the future.
Sat, 13 Apr 2024 23:40:28 +0200 perf: clear vfs audit_cache before each run
Pierre-Yves David <pierre-yves.david@octobus.net> [Sat, 13 Apr 2024 23:40:28 +0200] rev 51592
perf: clear vfs audit_cache before each run When generating a stream clone, we spend a large amount of time auditing path. Before this changes, the first run was warming the vfs cache for the other runs, leading to a large runtime difference and a "faulty" reported timing for the operation. We now clear this important cache between run to get a more realistic timing. Below are some example of median time change when clearing these cases. The maximum time for a run did not changed significantly. ### data-env-vars.name = mozilla-central-2018-08-01-zstd-sparse-revlog # benchmark.name = hg.perf.exchange.stream.generate # bin-env-vars.hg.flavor = default # bin-env-vars.hg.py-re2-module = default # benchmark.variants.version = latest no-clearing: 17.289905 cache-clearing: 21.587965 (+24.86%, +4.30) ## data-env-vars.name = mozilla-central-2024-03-22-zstd-sparse-revlog no-clearing: 32.670748 cache-clearing: 40.467095 (+23.86%, +7.80) ## data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog no-clearing: 37.838858 cache-clearing: 46.072749 (+21.76%, +8.23) ## data-env-vars.name = mozilla-unified-2024-03-22-zstd-sparse-revlog no-clearing: 32.969395 cache-clearing: 39.646209 (+20.25%, +6.68) In addition, this significantly reduce the timing difference between the performance command, from the perf extensions and a `real `hg bundle` call producing a stream bundle. Some significant differences remain especially on the "mozilla-try" repositories, but they are now smaller. Note that some of that difference will actually not be attributable to the stream generation (like maybe phases or branch map computation). Below are some benchmarks done on a currently draft changeset fixing some unrelated slowness in `hg bundle` (34a78972af409d1ff37c29e60f6ca811ad1a457d) ### data-env-vars.name = mozilla-central-2018-08-01-zstd-sparse-revlog # bin-env-vars.hg.flavor = default # bin-env-vars.hg.py-re2-module = default hg.perf.exchange.stream.generate: 21.587965 hg.command.bundle: 24.301799 (+12.57%, +2.71) ## data-env-vars.name = mozilla-central-2024-03-22-zstd-sparse-revlog hg.perf.exchange.stream.generate: 40.467095 hg.command.bundle: 44.831317 (+10.78%, +4.36) ## data-env-vars.name = mozilla-unified-2024-03-22-zstd-sparse-revlog hg.perf.exchange.stream.generate: 39.646209 hg.command.bundle: 45.395258 (+14.50%, +5.75) ## data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog hg.perf.exchange.stream.generate: 46.072749 hg.command.bundle: 55.882608 (+21.29%, +9.81) ## data-env-vars.name = mozilla-try-2023-03-22-zlib-general-delta hg.perf.exchange.stream.generate: 334.716708 hg.command.bundle: 377.856767 (+12.89%, +43.14) ## data-env-vars.name = mozilla-try-2023-03-22-zstd-sparse-revlog hg.perf.exchange.stream.generate: 302.972301 hg.command.bundle: 326.098755 (+7.63%, +23.13)
Sun, 14 Apr 2024 02:41:36 +0200 perf: start recording total time after warming
Pierre-Yves David <pierre-yves.david@octobus.net> [Sun, 14 Apr 2024 02:41:36 +0200] rev 51591
perf: start recording total time after warming The warming might be costly and this should not affect the "time profile" of the actual collection.
Sun, 14 Apr 2024 02:40:15 +0200 perf: run the gc before each run
Pierre-Yves David <pierre-yves.david@octobus.net> [Sun, 14 Apr 2024 02:40:15 +0200] rev 51590
perf: run the gc before each run The python garbage collector is a large source of performance troubles, we run it right before the timed section to reduce the change for the gc to add noise to the benchmark.
Sun, 14 Apr 2024 02:38:41 +0200 perf: allow profiling of more than one run
Pierre-Yves David <pierre-yves.david@octobus.net> [Sun, 14 Apr 2024 02:38:41 +0200] rev 51589
perf: allow profiling of more than one run By default, we still profile the first run only. However profiling more run help to understand side effect from one run to the other. So we add an option to be able to do so.
Sun, 14 Apr 2024 02:36:55 +0200 profiler: flush after writing the profiler output
Pierre-Yves David <pierre-yves.david@octobus.net> [Sun, 14 Apr 2024 02:36:55 +0200] rev 51588
profiler: flush after writing the profiler output Otherwise, the profiler output might only partially appears until the next flush of the buffer. Since profiling often happens for long operation, the next flush can be a long time away.
Sun, 14 Apr 2024 02:33:36 +0200 stream-clone: disable gc for the entry listing section for the v2 format
Pierre-Yves David <pierre-yves.david@octobus.net> [Sun, 14 Apr 2024 02:33:36 +0200] rev 51587
stream-clone: disable gc for the entry listing section for the v2 format This is similar to the change we did for the v3 format in 6e4c8366c5ce. The benchmark bellow show this gives us a notable gains, especially on larger repositories. ### benchmark.name = hg.perf.stream-locked-section # benchmark.name = hg.perf.stream-locked-section # bin-env-vars.hg.flavor = default # bin-env-vars.hg.py-re2-module = default # benchmark.variants.version = v2 ## data-env-vars.name = pypy-2018-08-01-zstd-sparse-revlog 5e931bf8707c: 0.503820 ~~~~~ 1106d1bf695e: 0.470078 (-6.70%, -0.03) ## data-env-vars.name = pypy-2024-03-22-zstd-sparse-revlog 5e931bf8707c: 0.535756 ~~~~~ 1106d1bf695e: 0.490249 (-8.49%, -0.05) ## data-env-vars.name = heptapod-public-2024-03-25-zstd-sparse-revlog 5e931bf8707c: 1.327041 ~~~~~ 1106d1bf695e: 1.174636 (-11.48%, -0.15) ## data-env-vars.name = netbeans-2018-08-01-zstd-sparse-revlog 5e931bf8707c: 2.439158 ~~~~~ 1106d1bf695e: 2.220515 (-8.96%, -0.22) ## data-env-vars.name = netbeans-2019-11-07-zstd-sparse-revlog 5e931bf8707c: 2.630794 ~~~~~ 1106d1bf695e: 2.261473 (-14.04%, -0.37) ## data-env-vars.name = mozilla-central-2018-08-01-zstd-sparse-revlog 5e931bf8707c: 5.769002 ~~~~~ 1106d1bf695e: 5.062000 (-12.26%, -0.71) ## data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog 5e931bf8707c: 13.351750 ~~~~~ 1106d1bf695e: 12.346655 (-7.53%, -1.01) ## data-env-vars.name = mozilla-central-2024-03-22-zstd-sparse-revlog 5e931bf8707c: 10.772939 ~~~~~ 1106d1bf695e: 9.495407 (-11.86%, -1.28) ## data-env-vars.name = mozilla-unified-2024-03-22-zstd-sparse-revlog 5e931bf8707c: 10.864297 ~~~~~ 1106d1bf695e: 9.475597 (-12.78%, -1.39) ## data-env-vars.name = mozilla-try-2023-03-22-zstd-sparse-revlog 5e931bf8707c: 17.448335 ~~~~~ 1106d1bf695e: 16.027474 (-8.14%, -1.42)
Tue, 09 Apr 2024 02:54:19 +0200 phases: rework the logic of _pushdiscoveryphase to bound complexity
Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 09 Apr 2024 02:54:19 +0200] rev 51586
phases: rework the logic of _pushdiscoveryphase to bound complexity This rework the various graph traversal in _pushdiscoveryphase to keep the complexity in check. This is done though a couple of things: - first, limiting the space we have to explore, for example, if we are not in publishing push, we don't need to consider remote draft roots that are also draft locally, as there is nothing to be moved there. - avoid unbounded descendant computation, and use the faster "rev between" computation. This provide a massive boost to performance when exchanging with repository with a massive amount of draft, like mozilla-try: ### data-env-vars.name = mozilla-try-2023-03-22-zstd-sparse-revlog # benchmark.name = hg.command.push # bin-env-vars.hg.flavor = default # bin-env-vars.hg.py-re2-module = default # benchmark.variants.explicit-rev = all-out-heads # benchmark.variants.issue6528 = disabled # benchmark.variants.protocol = ssh # benchmark.variants.reuse-external-delta-parent = default ## benchmark.variants.revs = any-1-extra-rev before: 20.346590 seconds after: 11.232059 seconds (-38.15%, -7.48 seconds) ## benchmark.variants.revs = any-100-extra-rev before: 24.752051 seconds after: 15.367412 seconds (-37.91%, -9.38 seconds) After this changes, the push operation is still quite too slow. Some of this can be attributed to general phases slowness (reading all the roots from disk for example) and other know slowness (not using persistent-nodemap, branchmap, tags, etc. We are also working on them, but with this series, phase discovery during push no longer showing up in profile and this is a pretty nice and bit low-hanging fruit out of the way. ### (same case as the above) # benchmark.variants.revs = any-1-extra-rev pre-%ln-change: 44.235070 this-changeset: 11.232059 seconds (-74.61%, -33.00 seconds) # benchmark.variants.revs = any-100-extra-rev pre-%ln-change: 49.234697 this-changeset: 15.367412 seconds (-68.79%, -33.87 seconds) Note that with this change, the `hg push` performance is now much closer to the `hg pull` performance, even it still lagging behind a bit. (and the overall performance are still too slow). ### data-env-vars.name = mozilla-try-2023-03-22-ds2-pnm # benchmark.variants.explicit-rev = all-out-heads # benchmark.variants.issue6528 = disabled # benchmark.variants.protocol = ssh # benchmark.variants.pulled-delta-reuse-policy = default # bin-env-vars.hg.flavor = rust ## benchmark.variants.revs = any-1-extra-rev hg.command.pull: 6.517450 hg.command.push: 11.219888 ## benchmark.variants.revs = any-100-extra-rev hg.command.pull: 10.160991 hg.command.push: 14.251107 ### data-env-vars.name = mozilla-try-2023-03-22-zstd-sparse-revlog # bin-env-vars.hg.py-re2-module = default # benchmark.variants.explicit-rev = all-out-heads # benchmark.variants.issue6528 = disabled # benchmark.variants.protocol = ssh # benchmark.variants.pulled-delta-reuse-policy = default ## bin-env-vars.hg.flavor = default ## benchmark.variants.revs = any-1-extra-rev hg.command.pull: 8.577772 hg.command.push: 11.232059 ## bin-env-vars.hg.flavor = default ## benchmark.variants.revs = any-100-extra-rev hg.command.pull: 13.152976 hg.command.push: 15.367412 ## bin-env-vars.hg.flavor = rust ## benchmark.variants.revs = any-1-extra-rev hg.command.pull: 8.731982 hg.command.push: 11.178751 ## bin-env-vars.hg.flavor = rust ## benchmark.variants.revs = any-100-extra-rev hg.command.pull: 13.184236 hg.command.push: 15.620843
Fri, 05 Apr 2024 22:47:44 +0200 phases: introduce a performant efficient way to access revision in a set
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 05 Apr 2024 22:47:44 +0200] rev 51585
phases: introduce a performant efficient way to access revision in a set This will be useful in the next changesets.
Fri, 05 Apr 2024 14:13:47 +0200 phases: use revision number in `_pushdiscoveryphase`
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 05 Apr 2024 14:13:47 +0200] rev 51584
phases: use revision number in `_pushdiscoveryphase` We now reach our target checkpoint in terms of rev-num conversion. The `_pushdiscoveryphase` function is now performing graph computation based on revision number only. Avoiding repeated conversion from node-id to rev-num. See previous changeset updated `new_heads` for rationnal. Again, time saved in the 100 milliseconds order of magnitude for the mozilla-try benchmark I have been using. However, wow that the logic is done using revision number, we can look into having better logic in the next changesets, which will provide a much bigger speedup.
(0) -30000 -10000 -3000 -1000 -300 -100 -10 +10 tip