Gregory Szorc <gregory.szorc@gmail.com> [Thu, 23 Mar 2017 19:54:59 -0700] rev 31587
changegroup: store old heads as a set
Previously, the "oldheads" variable was a list. On a repository at
Mozilla with 46,492 heads, profiling revealed that list membership
testing was dominating execution time of applying small changegroups.
This patch converts the list of old heads to a set. This makes
membership testing significantly faster. On the aforementioned
repository with 46,492 heads:
$ hg unbundle <file with 1 changeset>
before: 18.535s wall
after: 1.303s
Consumers of this variable only check for truthiness (`if oldheads`),
length (`len(oldheads)`), and (most importantly) item membership
(`h not in oldheads` - which occurs twice). So, the change to a set
should be safe and suitable for stable.
The practical effect of this change is that changegroup application
and related operations (like `hg push`) no longer exhibit an O(n^2)
CPU explosion as the number of heads grows.
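The effect described above can be sketched in isolation. This is a minimal illustration, not the changegroup code itself: `count_new_heads` and the synthetic head values are hypothetical names introduced only for the example. It relies on the same three operations the message lists (truthiness, length, membership), so a list and a set are interchangeable for correctness and differ only in lookup cost.

```python
def count_new_heads(oldheads, newheads):
    """Count entries of newheads that are absent from oldheads.

    Only truthiness and membership are used on oldheads, so a list
    and a set give the same answer.
    """
    if not oldheads:
        return len(newheads)
    return sum(1 for h in newheads if h not in oldheads)


# Synthetic stand-ins for node ids (hypothetical data).
old_list = [b"%020d" % i for i in range(1000)]
old_set = set(old_list)
new = old_list[:10] + [b"new-head-1", b"new-head-2"]

# Same result either way; the list scans O(n) entries per test,
# while the set does an average O(1) hash lookup.
assert count_new_heads(old_list, new) == count_new_heads(old_set, new) == 2
```

With n old heads and a membership test per head, the list version does work proportional to n^2, which is consistent with the quadratic behavior the message reports.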
Pierre-Yves David <pierre-yves.david@ens-lyon.org> [Tue, 21 Mar 2017 23:30:13 +0100] rev 31586
checkheads: extract obsolete post processing into its own function
The checkheads function is long and complex; extracting that logic into a
subfunction is a win in itself. As the comment in the code says, this
postprocessing is currently very basic and either misbehaves or fails to detect
valid pushes in many cases. My deeper motive for this extraction is to make it
easier to provide extensive testing of these cases and a strategy to cover them.
The final tests and logic will make it to core once done.
Kostia Balytskyi <ikostia@fb.com> [Wed, 22 Mar 2017 11:26:23 -0700] rev 31585
tests: make test-simplekeyvaluefile.py py2.6-compatible
Python 2.6 unittest.TestCase does not have assertRaisesRegexp.
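One way to check both the exception type and its message without assertRaisesRegexp is a plain try/except. The sketch below is an assumption about how such a test could be written, not the content of this changeset; `ExampleTest` and the `assertraisesregex` helper are hypothetical names.

```python
import re
import unittest


class ExampleTest(unittest.TestCase):
    def assertraisesregex(self, exctype, regex, func, *args, **kwargs):
        """Hypothetical helper: verify the exception type and message
        without relying on TestCase.assertRaisesRegexp."""
        try:
            func(*args, **kwargs)
        except exctype as inst:
            # The message must match the given regular expression.
            self.assertTrue(re.search(regex, str(inst)))
        else:
            self.fail('%s not raised' % exctype.__name__)

    def testbadvalue(self):
        # int('x') raises ValueError with an "invalid literal" message.
        self.assertraisesregex(ValueError, 'invalid literal', int, 'x')
```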
Yuya Nishihara <yuya@tcha.org> [Thu, 23 Mar 2017 20:57:27 +0900] rev 31584
similar: use cheaper hash() function to test exact matches
We just need a hash table {fctx.data(): fctx} which doesn't keep fctx.data()
in memory. Let's simply use hash(fctx.data()) as the key so the data can be
released from memory, and keep colliding fctx objects in a list.
This isn't significantly faster than using sha1, but it is more correct now
that SHA-1 collision attacks are becoming practical.
Benchmark with 50k added/removed files, on tmpfs:
$ hg addremove --dry-run --time -q
previous: real 12.420 secs (user 11.120+0.000 sys 1.280+0.000)
this patch: real 12.350 secs (user 11.210+0.000 sys 1.140+0.000)
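The hash-table approach described in this message can be sketched as follows. This is an illustration under stated assumptions rather than the code in similar.py: `fakefctx` is a hypothetical stand-in for a file context exposing path() and data(), and `find_exact_matches` is a made-up name. Buckets are keyed by hash(data), colliding entries share a list, and a full comparison confirms each candidate because hash() values can collide.

```python
import collections


class fakefctx(object):
    """Hypothetical stand-in for a file context."""

    def __init__(self, path, content):
        self._path = path
        self._content = content

    def path(self):
        return self._path

    def data(self):
        return self._content


def find_exact_matches(removed, added):
    """Yield (oldpath, newpath) for files with identical content."""
    # Key by hash(data) so the table does not hold the data as a key;
    # entries with the same hash value share one list.
    buckets = collections.defaultdict(list)
    for fctx in removed:
        buckets[hash(fctx.data())].append(fctx)
    for fctx in added:
        data = fctx.data()
        for old in buckets.get(hash(data), []):
            # Equal hashes do not prove equal content, so compare fully.
            if old.data() == data:
                yield old.path(), fctx.path()
                break  # take the first match
```

In this toy version the stand-in objects keep their content alive, so it shows only the bucketing and collision handling, not the memory saving.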
Yuya Nishihara <yuya@tcha.org> [Thu, 23 Mar 2017 20:52:41 +0900] rev 31583
similar: take the first match instead of the last
It seems more natural. This makes the next patch slightly cleaner.
Yuya Nishihara <yuya@tcha.org> [Thu, 23 Mar 2017 21:17:08 +0900] rev 31582
similar: do not look up and create filectx more than once
Benchmark with 50k added/removed files, on tmpfs:
$ hg addremove --dry-run --time -q
previous: real 16.070 secs (user 14.470+0.000 sys 1.580+0.000)
this patch: real 12.420 secs (user 11.120+0.000 sys 1.280+0.000)
Yuya Nishihara <yuya@tcha.org> [Thu, 23 Mar 2017 21:10:45 +0900] rev 31581
similar: use common names for changectx variables
We generally use 'wctx' and 'pctx' for working context and its parent
respectively.
Yuya Nishihara <yuya@tcha.org> [Thu, 23 Mar 2017 20:50:33 +0900] rev 31580
similar: get rid of quadratic addedfiles.remove()
Instead, build a set of files to be removed and recreate addedfiles
only if necessary.
Benchmark with 50k added/removed files, on tmpfs:
$ hg addremove --dry-run --time -q
original: real 16.550 secs (user 15.000+0.000 sys 1.540+0.000)
previous: real 16.730 secs (user 15.280+0.000 sys 1.440+0.000)
this patch: real 16.070 secs (user 14.470+0.000 sys 1.580+0.000)
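The idea of collecting removals in a set and rebuilding the list once can be sketched like this. It is a minimal example, not the actual patch; `drop_matched` and its arguments are hypothetical names.

```python
def drop_matched(addedfiles, matched):
    """Return addedfiles without the entries in matched, keeping order.

    Calling addedfiles.remove(f) once per match costs O(n) each time,
    O(n*m) overall. Building a set and filtering once is O(n + m).
    """
    removeset = set(matched)
    if not removeset:
        # Nothing to remove: reuse the list instead of recreating it.
        return addedfiles
    return [f for f in addedfiles if f not in removeset]


files = ['a', 'b', 'c', 'd']
assert drop_matched(files, ['b', 'd']) == ['a', 'c']
assert drop_matched(files, []) is files
```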
Yuya Nishihara <yuya@tcha.org> [Sun, 15 Mar 2015 18:58:56 +0900] rev 31579
similar: sort files not by object id but by path for stable result
Perhaps the original implementation intended to sort added/removed files
alphabetically, but it actually sorted fctx objects by memory location.
This patch removes the use of set()s in order to preserve the order of
added/removed files. addedfiles.remove() becomes quadratic, but its cost
does not appear to be dominant. In any case, the quadratic behavior will be
eliminated by the next patch.
Benchmark with 50k added/removed files, on tmpfs:
$ mkdir src
$ for n in `seq 0 49`; do
> mkdir `printf src/%02d $n`
> done
$ for n in `seq 0 49999`; do
> f=`printf src/%02d/%05d $(($n/1000)) $n`
> dd if=/dev/urandom of=$f bs=8k count=1 status=none
> done
$ hg ci -qAm 'add 50k files of random content'
$ mv src dest
$ hg addremove --dry-run --time -q
original: real 16.550 secs (user 15.000+0.000 sys 1.540+0.000)
this patch: real 16.730 secs (user 15.280+0.000 sys 1.440+0.000)
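The difference between ordering by object identity and ordering by path can be illustrated with a small sketch. This is not the code from the patch, which preserves the input order rather than calling a sort helper; `fakefctx` and `sortbypath` are hypothetical names used only here.

```python
class fakefctx(object):
    """Hypothetical stand-in for a file context with a path."""

    def __init__(self, path):
        self._path = path

    def path(self):
        return self._path


def sortbypath(fctxs):
    """Order file contexts by path so the result is reproducible."""
    return sorted(fctxs, key=lambda f: f.path())


fctxs = [fakefctx('src/b'), fakefctx('src/a'), fakefctx('src/c')]

# Ordering by id() depends on where objects happen to live in memory,
# so it can differ from run to run.
byid = sorted(fctxs, key=id)

# Ordering by path is stable across runs.
assert [f.path() for f in sortbypath(fctxs)] == ['src/a', 'src/b', 'src/c']
```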
Jun Wu <quark@fb.com> [Sun, 12 Mar 2017 01:34:17 -0800] rev 31578
debugfsinfo: print fstype information
Since we have osutil.getfstype, it'll be handy if "debugfsinfo" prints it.