Sun, 14 Apr 2024 02:38:41 +0200 perf: allow profiling of more than one run
Pierre-Yves David <pierre-yves.david@octobus.net> [Sun, 14 Apr 2024 02:38:41 +0200] rev 51589
perf: allow profiling of more than one run

By default, we still profile only the first run. However, profiling more runs helps to understand side effects from one run to the next, so we add an option to do so.
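The idea of profiling a configurable number of runs can be sketched as follows. This is only an illustration built on the standard `cProfile` module; the function `run_benchmark` and its parameters `runs` and `profiled_runs` are hypothetical names, not the option or code used by the perf extension.

```python
import cProfile
import io
import pstats


def run_benchmark(func, runs=3, profiled_runs=1):
    """Call ``func`` ``runs`` times, profiling only the first ``profiled_runs``.

    All names here are hypothetical; they only illustrate the idea.
    Returns one text report per profiled run.
    """
    reports = []
    for index in range(runs):
        if index < profiled_runs:
            profiler = cProfile.Profile()
            profiler.enable()
            try:
                func()
            finally:
                profiler.disable()
            # Render the statistics of this run into a string.
            out = io.StringIO()
            stats = pstats.Stats(profiler, stream=out)
            stats.sort_stats("cumulative").print_stats(5)
            reports.append(out.getvalue())
        else:
            # Remaining runs are executed without any profiler attached.
            func()
    return reports
```

Keeping one report per run makes it possible to compare the first run with later ones, which is where side effects such as warm caches would show up.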
Sun, 14 Apr 2024 02:36:55 +0200 profiler: flush after writing the profiler output
Pierre-Yves David <pierre-yves.david@octobus.net> [Sun, 14 Apr 2024 02:36:55 +0200] rev 51588
profiler: flush after writing the profiler output

Otherwise, the profiler output might only partially appear until the next flush of the buffer. Since profiling often happens for long operations, the next flush can be a long time away.
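The effect of a missing flush can be shown with a small standard-library sketch. It does not use the profiler code itself; `write_report` and `demo` are hypothetical helpers, and the in-memory `BytesIO` stands in for the real output file.

```python
import io


def write_report(stream, report):
    """Write ``report`` and flush so that it is visible right away."""
    stream.write(report)
    stream.flush()


def demo():
    """Return what the underlying file holds before and after a flush."""
    raw = io.BytesIO()
    buffered = io.BufferedWriter(raw, buffer_size=4096)
    # A small write stays in the buffer and does not reach ``raw`` yet.
    buffered.write(b"first half of the report\n")
    before = raw.getvalue()
    # Writing through the helper flushes everything pending.
    write_report(buffered, b"second half of the report\n")
    after = raw.getvalue()
    return before, after
```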
Sun, 14 Apr 2024 02:33:36 +0200 stream-clone: disable gc for the entry listing section for the v2 format
Pierre-Yves David <pierre-yves.david@octobus.net> [Sun, 14 Apr 2024 02:33:36 +0200] rev 51587
stream-clone: disable gc for the entry listing section for the v2 format

This is similar to the change we did for the v3 format in 6e4c8366c5ce. The benchmarks below show this gives us notable gains, especially on larger repositories.

### benchmark.name = hg.perf.stream-locked-section
  # benchmark.name = hg.perf.stream-locked-section
  # bin-env-vars.hg.flavor = default
  # bin-env-vars.hg.py-re2-module = default
  # benchmark.variants.version = v2

## data-env-vars.name = pypy-2018-08-01-zstd-sparse-revlog
5e931bf8707c: 0.503820 ~~~~~
1106d1bf695e: 0.470078 (-6.70%, -0.03)

## data-env-vars.name = pypy-2024-03-22-zstd-sparse-revlog
5e931bf8707c: 0.535756 ~~~~~
1106d1bf695e: 0.490249 (-8.49%, -0.05)

## data-env-vars.name = heptapod-public-2024-03-25-zstd-sparse-revlog
5e931bf8707c: 1.327041 ~~~~~
1106d1bf695e: 1.174636 (-11.48%, -0.15)

## data-env-vars.name = netbeans-2018-08-01-zstd-sparse-revlog
5e931bf8707c: 2.439158 ~~~~~
1106d1bf695e: 2.220515 (-8.96%, -0.22)

## data-env-vars.name = netbeans-2019-11-07-zstd-sparse-revlog
5e931bf8707c: 2.630794 ~~~~~
1106d1bf695e: 2.261473 (-14.04%, -0.37)

## data-env-vars.name = mozilla-central-2018-08-01-zstd-sparse-revlog
5e931bf8707c: 5.769002 ~~~~~
1106d1bf695e: 5.062000 (-12.26%, -0.71)

## data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog
5e931bf8707c: 13.351750 ~~~~~
1106d1bf695e: 12.346655 (-7.53%, -1.01)

## data-env-vars.name = mozilla-central-2024-03-22-zstd-sparse-revlog
5e931bf8707c: 10.772939 ~~~~~
1106d1bf695e: 9.495407 (-11.86%, -1.28)

## data-env-vars.name = mozilla-unified-2024-03-22-zstd-sparse-revlog
5e931bf8707c: 10.864297 ~~~~~
1106d1bf695e: 9.475597 (-12.78%, -1.39)

## data-env-vars.name = mozilla-try-2023-03-22-zstd-sparse-revlog
5e931bf8707c: 17.448335 ~~~~~
1106d1bf695e: 16.027474 (-8.14%, -1.42)
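Pausing the garbage collector around an allocation-heavy section can be sketched with the standard `gc` module. This is a generic illustration, not the stream-clone code: `gc_disabled` and `list_entries` are hypothetical names, and the list comprehension only stands in for the real entry listing.

```python
import contextlib
import gc


@contextlib.contextmanager
def gc_disabled():
    """Disable the cyclic garbage collector for the duration of a block."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Only re-enable the collector if it was running before.
        if was_enabled:
            gc.enable()


def list_entries(count):
    """Build many small objects without triggering collection passes."""
    with gc_disabled():
        return [("entry-%d" % i, i) for i in range(count)]
```

Creating a large number of container objects repeatedly triggers collection passes that find nothing to free, so skipping them for a short, bounded section can save time. The collector state is restored even if the block raises.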
Tue, 09 Apr 2024 02:54:19 +0200 phases: rework the logic of _pushdiscoveryphase to bound complexity
Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 09 Apr 2024 02:54:19 +0200] rev 51586
phases: rework the logic of _pushdiscoveryphase to bound complexity

This reworks the various graph traversals in _pushdiscoveryphase to keep the complexity in check. This is done through a couple of things:

- first, limiting the space we have to explore. For example, if we are not in a publishing push, we don't need to consider remote draft roots that are also draft locally, as there is nothing to be moved there.
- second, avoiding unbounded descendant computation and using the faster "rev between" computation instead.

This provides a massive boost to performance when exchanging with a repository that has a massive amount of drafts, like mozilla-try:

### data-env-vars.name = mozilla-try-2023-03-22-zstd-sparse-revlog
  # benchmark.name = hg.command.push
  # bin-env-vars.hg.flavor = default
  # bin-env-vars.hg.py-re2-module = default
  # benchmark.variants.explicit-rev = all-out-heads
  # benchmark.variants.issue6528 = disabled
  # benchmark.variants.protocol = ssh
  # benchmark.variants.reuse-external-delta-parent = default

## benchmark.variants.revs = any-1-extra-rev
before: 20.346590 seconds
after:  11.232059 seconds (-38.15%, -7.48 seconds)

## benchmark.variants.revs = any-100-extra-rev
before: 24.752051 seconds
after:  15.367412 seconds (-37.91%, -9.38 seconds)

After this change, the push operation is still too slow. Some of this can be attributed to general phases slowness (reading all the roots from disk, for example) and other known slowness (not using persistent-nodemap, branchmap, tags, etc.). We are also working on these, but with this series, phase discovery during push no longer shows up in profiles, and this is a pretty nice and big low-hanging fruit out of the way.

### (same case as the above)

# benchmark.variants.revs = any-1-extra-rev
pre-%ln-change: 44.235070
this-changeset: 11.232059 seconds (-74.61%, -33.00 seconds)

# benchmark.variants.revs = any-100-extra-rev
pre-%ln-change: 49.234697
this-changeset: 15.367412 seconds (-68.79%, -33.87 seconds)

Note that with this change, the `hg push` performance is now much closer to the `hg pull` performance, even if it is still lagging behind a bit (and the overall performance is still too slow).

### data-env-vars.name = mozilla-try-2023-03-22-ds2-pnm
  # benchmark.variants.explicit-rev = all-out-heads
  # benchmark.variants.issue6528 = disabled
  # benchmark.variants.protocol = ssh
  # benchmark.variants.pulled-delta-reuse-policy = default
  # bin-env-vars.hg.flavor = rust

## benchmark.variants.revs = any-1-extra-rev
hg.command.pull: 6.517450
hg.command.push: 11.219888

## benchmark.variants.revs = any-100-extra-rev
hg.command.pull: 10.160991
hg.command.push: 14.251107

### data-env-vars.name = mozilla-try-2023-03-22-zstd-sparse-revlog
  # bin-env-vars.hg.py-re2-module = default
  # benchmark.variants.explicit-rev = all-out-heads
  # benchmark.variants.issue6528 = disabled
  # benchmark.variants.protocol = ssh
  # benchmark.variants.pulled-delta-reuse-policy = default

## bin-env-vars.hg.flavor = default
## benchmark.variants.revs = any-1-extra-rev
hg.command.pull: 8.577772
hg.command.push: 11.232059

## bin-env-vars.hg.flavor = default
## benchmark.variants.revs = any-100-extra-rev
hg.command.pull: 13.152976
hg.command.push: 15.367412

## bin-env-vars.hg.flavor = rust
## benchmark.variants.revs = any-1-extra-rev
hg.command.pull: 8.731982
hg.command.push: 11.178751

## bin-env-vars.hg.flavor = rust
## benchmark.variants.revs = any-100-extra-rev
hg.command.pull: 13.184236
hg.command.push: 15.620843
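The difference between an unbounded descendant walk and a bounded "rev between" computation can be illustrated on a toy graph. This is a minimal sketch, not the code of this changeset: `revs_between` is a hypothetical helper, and it assumes that `parents[rev]` lists the parent revisions of `rev` and that parents always have a lower revision number than their children.

```python
def revs_between(parents, roots, heads):
    """Return the revs that descend from a root and are ancestors of a head.

    Roots and heads are included. The walk never goes below the lowest
    root nor above the heads, so its cost does not depend on how many
    revisions exist outside that window.
    """
    roots = set(roots)
    if not roots or not heads:
        return set()
    lowest = min(roots)

    # Step 1: collect ancestors of the heads, bounded by the lowest root.
    seen = set()
    stack = [h for h in heads if h >= lowest]
    while stack:
        rev = stack.pop()
        if rev in seen:
            continue
        seen.add(rev)
        stack.extend(p for p in parents[rev] if p >= lowest)

    # Step 2: in increasing order, keep what is reachable from a root.
    reachable = set()
    for rev in sorted(seen):
        if rev in roots or any(p in reachable for p in parents[rev]):
            reachable.add(rev)
    return reachable
```

By contrast, computing every descendant of the roots has to visit revisions all the way up to the tip, which becomes expensive in a repository with a very large number of draft changesets.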
Fri, 05 Apr 2024 22:47:44 +0200 phases: introduce a performant efficient way to access revision in a set
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 05 Apr 2024 22:47:44 +0200] rev 51585
phases: introduce a performant efficient way to access revision in a set

This will be useful in the next changesets.
Fri, 05 Apr 2024 14:13:47 +0200 phases: use revision number in `_pushdiscoveryphase`
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 05 Apr 2024 14:13:47 +0200] rev 51584
phases: use revision number in `_pushdiscoveryphase`

We now reach our target checkpoint in terms of rev-num conversion. The `_pushdiscoveryphase` function now performs graph computation based on revision numbers only, avoiding repeated conversion from node-id to rev-num. See the previous changeset that updated `new_heads` for the rationale.

Again, the time saved is in the 100 milliseconds order of magnitude for the mozilla-try benchmark I have been using. However, now that the logic is done using revision numbers, we can look into having better logic in the next changesets, which will provide a much bigger speedup.
Fri, 05 Apr 2024 14:11:02 +0200 phases: move RemotePhasesSummary to revision number
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 05 Apr 2024 14:11:02 +0200] rev 51583
phases: move RemotePhasesSummary to revision number

This continues our quest to align more logic on revision numbers instead of node-ids. The motivation is similar to the change to `new_heads` and `analyze_remote_phases` a few changesets earlier.

Again, we take this as an opportunity to rename the class and the attribute to the new naming scheme. This will highlight the need for an update in any code using it and expecting node-ids.

Many of the rev-num → node-id conversions we had to introduce in the previous changesets can now be removed. More will be removed in the future as we continue to align code toward rev-num usage.

The time saved is in the 100 milliseconds order of magnitude for the mozilla-try benchmark I have been using.
Fri, 05 Apr 2024 12:24:47 +0200 phases: stop using `repo.set` in `remotephasessummary`
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 05 Apr 2024 12:24:47 +0200] rev 51582
phases: stop using `repo.set` in `remotephasessummary`

`repo.set` creates changectx objects on the fly, an expensive operation. Using `repo.revs` and a direct rev-num → node-id translation will be significantly faster. This is especially true as we prepare ourselves to no longer do the rev-num → node-id translation there.

The speedup is a bit lost in the overall noisiness of the slow phase discovery algorithm, but it saves a small amount of time in my benchmark.
Fri, 05 Apr 2024 12:02:43 +0200 phases: use revision number in analyze_remote_phases
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 05 Apr 2024 12:02:43 +0200] rev 51581
phases: use revision number in analyze_remote_phases

Same logic as the previous change to `new_heads`; see the rationale there.

This avoids a small number of `nodes -> revs` conversions, speeding things up in the 100 milliseconds order of magnitude for the worst cases. However, the rest of the logic is noisy enough that it hardly matters for now.
Fri, 05 Apr 2024 11:33:47 +0200 phases: use revision number in new_heads
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 05 Apr 2024 11:33:47 +0200] rev 51580
phases: use revision number in new_heads

All graph operations will be done using revision numbers, so passing nodes only means they will eventually get converted to revision numbers internally.

As part of an effort to align the code on using revision numbers, we make the `phases.newheads` function operate on revision numbers, taking them as input and using them in its return value, instead of the node-ids it used to consume and produce.

This is part of a multi-changeset effort to translate more parts of the logic, but it is done step by step to facilitate the identification of issues that might arise in Mercurial core and extensions.

To make the change simpler to handle for third-party extensions, we also rename the function, using a more modern form. This will help detect the difference between the node-id version and the rev-num version.

I also take this as an opportunity to add some comments about possible performance improvements for the future. They don't matter too much now, but they are worth exploring in a while.
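The cost being avoided can be pictured with a toy model. Everything below is hypothetical and unrelated to the real Mercurial classes: `ToyChangelog` only mimics a node-id to rev-num mapping, and `heads_of_revs` is a stand-in graph operation, not `phases.newheads`.

```python
import hashlib


class ToyChangelog:
    """Minimal stand-in mapping node-ids to revision numbers and back."""

    def __init__(self, count):
        self._nodes = [hashlib.sha1(b"%d" % i).digest() for i in range(count)]
        self._revs = {node: rev for rev, node in enumerate(self._nodes)}
        self.lookups = 0  # counts node -> rev conversions

    def rev(self, node):
        self.lookups += 1
        return self._revs[node]

    def node(self, rev):
        return self._nodes[rev]


def heads_of_revs(parents, revs):
    """Graph logic on plain integers: revs with no child inside ``revs``."""
    revs = set(revs)
    non_heads = {p for r in revs for p in parents[r] if p in revs}
    return revs - non_heads


def heads_of_nodes(changelog, parents, nodes):
    """Node-based wrapper: it must convert on the way in and on the way out."""
    revs = [changelog.rev(n) for n in nodes]
    return [changelog.node(r) for r in sorted(heads_of_revs(parents, revs))]
```

A caller that already holds revision numbers can call the integer version directly and skip both conversions, which is the kind of saving this series is after when such calls are chained.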