Boris Feld <boris.feld@octobus.net> [Thu, 22 Nov 2018 21:02:02 +0100] rev 40777
match: avoid translating glob to matcher multiple times for large sets
For an hgignore with many globs, the resulting regexp might not fit under the 20K
length limit, so the patterns need to be broken up into smaller pieces.
Before this change, the logic restarted the full process from scratch
for each smaller piece, including the translation of globs into regexps,
effectively doing the work over and over.
If the 20K limit is reached, we are likely in a case where there are many such
globs, so translating them is especially expensive and we should be careful not
to do that work more than once.
To work around this, we now translate each glob to a regexp once and for all. Then
we assemble the resulting individual regexps into valid blocks.
This yields a very significant performance win for large `.hgignore` files:
Before: ! wall 0.153153 comb 0.150000 user 0.150000 sys 0.000000 (median of 66)
After: ! wall 0.059793 comb 0.060000 user 0.060000 sys 0.000000 (median of 100)
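The "translate once, then assemble" idea can be sketched as follows. This is a minimal illustration, not the actual code in Mercurial's match module: `glob_to_regex`, `group_regexps`, `build_matchers` and `MAX_RE_SIZE` are hypothetical names, the glob translation only handles `*` and `?`, and the limit is assumed to apply to the length of each joined alternation.

```python
import re

MAX_RE_SIZE = 20000  # assumed value, mirroring the "20K" limit mentioned above


def glob_to_regex(glob):
    """Hypothetical, heavily simplified glob translation (only * and ?)."""
    out = []
    for ch in glob:
        if ch == '*':
            out.append('[^/]*')
        elif ch == '?':
            out.append('.')
        else:
            out.append(re.escape(ch))
    return ''.join(out)


def group_regexps(regexps, limit=MAX_RE_SIZE):
    """Join already-translated regexps into alternations of at most limit chars."""
    groups = []
    current = []
    size = 0  # length of '|'.join(current)
    for r in regexps:
        if len(r) > limit:
            # A single regexp that is too large cannot be split any further.
            raise OverflowError('single pattern exceeds the size limit')
        new_size = len(r) if not current else size + 1 + len(r)
        if new_size > limit:
            # Close the current block and start a new one with this regexp.
            groups.append('|'.join(current))
            current, new_size = [], len(r)
        current.append(r)
        size = new_size
    if current:
        groups.append('|'.join(current))
    return groups


def build_matchers(globs, limit=MAX_RE_SIZE):
    # Each glob is translated exactly once, however many blocks are needed.
    translated = [glob_to_regex(g) for g in globs]
    return [re.compile(block).match for block in group_regexps(translated, limit)]
```

A path is then considered ignored if any of the returned matchers accepts it, for example `any(m(path) for m in build_matchers(globs))`.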
Boris Feld <boris.feld@octobus.net> [Thu, 22 Nov 2018 17:25:49 +0100] rev 40776
match: extract function that groups regexps
Boris Feld <boris.feld@octobus.net> [Thu, 22 Nov 2018 17:16:05 +0100] rev 40775
match: test for overflow error in pattern
If a single pattern is too large to handle, we raise an exception. This case is
now doctested.
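As an illustration of how such an error case can be covered by a doctest, here is a small sketch. The function `check_pattern_size`, its message and the limit value are hypothetical and are not taken from Mercurial; only the doctest syntax for expected exceptions is standard.

```python
MAX_RE_SIZE = 20000  # assumed value for illustration


def check_pattern_size(regex, limit=MAX_RE_SIZE):
    """Return regex unchanged, or raise if it alone exceeds limit.

    >>> check_pattern_size('abc', limit=5)
    'abc'
    >>> check_pattern_size('a' * 6, limit=5)
    Traceback (most recent call last):
        ...
    OverflowError: pattern of length 6 exceeds limit 5
    """
    if len(regex) > limit:
        raise OverflowError(
            'pattern of length %d exceeds limit %d' % (len(regex), limit)
        )
    return regex
```

Running `python -m doctest` on a file containing this function would exercise both the normal and the overflow case.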
Boris Feld <boris.feld@octobus.net> [Thu, 22 Nov 2018 17:20:32 +0100] rev 40774
match: extract a literal constant into a symbolic one
Matt Harbison <matt_harbison@yahoo.com> [Sat, 01 Dec 2018 21:42:48 -0500] rev 40773
tests: apply binary mode to output in seq.py
I noticed this when playing with running tests using WSL, and iterating over the
output yielded '0\r', '1\r',... Most of the other *.py tools do this, and `seq`
on MSYS lacks '\r' in the output, so this is more consistent.
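A sketch of the general technique, assuming a Windows Python where the `msvcrt` module is available: switch the stream's file descriptor to binary mode and write bytes, so that '\n' is not expanded to '\r\n'. The helper names `set_binary_mode` and `write_seq` are hypothetical and this is not the actual content of seq.py.

```python
import os


def set_binary_mode(stream):
    """Put stream's file descriptor in binary mode on Windows.

    Returns False on platforms without msvcrt, where nothing needs doing.
    """
    try:
        import msvcrt
    except ImportError:
        return False
    msvcrt.setmode(stream.fileno(), os.O_BINARY)
    return True


def write_seq(start, stop, out):
    """Write the integers start..stop (inclusive) to a binary stream."""
    for i in range(start, stop + 1):
        out.write(b'%d\n' % i)
```

A script would typically call `set_binary_mode(sys.stdout)` once at startup and then write to `sys.stdout.buffer` on Python 3.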
Boris Feld <boris.feld@octobus.net> [Fri, 23 Nov 2018 01:09:37 +0100] rev 40772
perf: add a `--clear-caches` to `perfbranchmapupdate`
This flag helps measure the time we spend loading the various caches that
support the branchmap update.
Example for a 500 000 revision repository:
hg perfbranchmapupdate --base 'not tip' --target 'tip'
! wall 0.000860 comb 0.000000 user 0.000000 sys 0.000000 (best of 336)
hg perfbranchmapupdate --base 'not tip' --target 'tip' --clear-caches
! wall 0.029494 comb 0.030000 user 0.030000 sys 0.000000 (best of 100)
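The effect of clearing caches before each run can be illustrated with a small timing harness. This is a generic sketch, not the code of Mercurial's perf extension: `bench` and `CachedLoader` are hypothetical names, and the loader merely stands in for a cache that is read lazily.

```python
import time


def bench(func, setup=None, runs=5):
    """Return the best wall-clock time of func over runs calls."""
    timings = []
    for _ in range(runs):
        if setup is not None:
            setup()  # e.g. drop caches so that every run starts cold
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


class CachedLoader(object):
    """Hypothetical stand-in for a cache that is loaded on first use."""

    def __init__(self):
        self.loads = 0
        self._data = None

    def get(self):
        if self._data is None:
            self.loads += 1  # count how often the expensive load happens
            self._data = list(range(1000))
        return self._data

    def clear(self):
        self._data = None
```

Without a setup step only the first run pays for the load, so a "best of" figure reflects the warm case. Passing `setup=loader.clear` makes every run include the loading cost.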
Boris Feld <boris.feld@octobus.net> [Wed, 21 Nov 2018 21:11:47 +0000] rev 40771
perf: start from an existing branchmap if possible
If the --base set is a superset of one of the cached branchmaps, we should use
it as a starting point. This greatly helps the overall runtime of
`hg perfbranchmapupdate`.
For example, for a repository with about 500 000 revisions, using this trick
moves the command runtime from about 200 seconds to about 10 seconds, a 20x
gain.
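The selection of a starting point can be sketched as below. The data model is an assumption made for illustration: each cached branchmap is identified by the frozenset of revisions it covers, and `pick_starting_point` and `revisions_to_process` are hypothetical helpers rather than functions from Mercurial.

```python
def pick_starting_point(base, cached):
    """Return the largest cached revision set fully contained in base.

    cached is an iterable of frozensets of revision numbers. Returns None
    when no cached set is usable.
    """
    best = None
    for revs in cached:
        if revs <= base and (best is None or len(revs) > len(best)):
            best = revs
    return best


def revisions_to_process(base, cached):
    """Return the revisions of base not already covered by a usable cache."""
    start = pick_starting_point(base, cached)
    if start is None:
        return set(base)  # nothing reusable: process everything
    return set(base) - start
```

The larger the reused set, the fewer revisions remain to be processed when building the base branchmap, which is where the runtime gain would come from.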
Boris Feld <boris.feld@octobus.net> [Wed, 21 Nov 2018 20:35:22 +0000] rev 40770
perf: rely on repoview for perfbranchmapupdate
Using a 'repoview' matching the base and target subsets makes the benchmark more
realistic. It also unlocks optimizations that make the command initialization
faster.
Boris Feld <boris.feld@octobus.net> [Wed, 21 Nov 2018 22:56:06 +0100] rev 40769
perf: pre-indent some code in `perfbranchmapupdate`
This makes the next patch easier to read.
Boris Feld <boris.feld@octobus.net> [Wed, 21 Nov 2018 12:02:25 +0000] rev 40768
perf: add a `perfbranchmapupdate` command
This command benchmarks the time necessary to update the branchmap between two
sets of revisions. This changeset introduces a first version, doing nothing fancy
regarding caches or other internal details.