Mon, 21 Nov 2016 17:47:11 -0500 httppeer: drop an except block that says it happens only on Python 2.3
Augie Fackler <augie@google.com> [Mon, 21 Nov 2016 17:47:11 -0500] rev 30475
httppeer: drop an except block that says it happens only on Python 2.3
Fri, 21 Oct 2016 00:03:46 +0900 windows: do not replace sys.stdout by winstdout
Yuya Nishihara <yuya@tcha.org> [Fri, 21 Oct 2016 00:03:46 +0900] rev 30474
windows: do not replace sys.stdout by winstdout Now we use util.stdout everywhere.
Thu, 20 Oct 2016 23:53:36 +0900 py3: bulk replace sys.stdin/out/err by util's
Yuya Nishihara <yuya@tcha.org> [Thu, 20 Oct 2016 23:53:36 +0900] rev 30473
py3: bulk replace sys.stdin/out/err by util's Almost all sys.stdin/out/err in hgext/ and mercurial/ are replaced by util's. There are a few exceptions:
- lsprof.py and statprof.py are untouched since they are a kind of vendor code and they never import mercurial modules right now.
- ui._readline() needs to replace sys.stdin and stdout to pass them to raw_input(). We'll need another workaround here.
Thu, 20 Oct 2016 23:40:24 +0900 py3: provide bytes stdin/out/err through util module
Yuya Nishihara <yuya@tcha.org> [Thu, 20 Oct 2016 23:40:24 +0900] rev 30472
py3: provide bytes stdin/out/err through util module Since standard streams are TextIO on Python 3, we can't use sys.stdin/out/err directly. Fortunately we can get the underlying BytesIO via .buffer as long as the streams aren't replaced by e.g. StringIO. stdin/out/err are provided through util so we can wrap them by platform API.
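A minimal sketch of the approach described above, assuming a pycompat-style module (names illustrative, not necessarily the exact ones used):

  import sys

  if sys.version_info[0] >= 3:
      # TextIO wrappers expose the underlying binary streams via .buffer
      stdin = sys.stdin.buffer
      stdout = sys.stdout.buffer
      stderr = sys.stderr.buffer
  else:
      # Python 2 streams are already byte-oriented
      stdin = sys.stdin
      stdout = sys.stdout
      stderr = sys.stderr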
Fri, 21 Oct 2016 00:09:38 +0900 util: rewrite pycompat imports to make pyflakes always happy
Yuya Nishihara <yuya@tcha.org> [Fri, 21 Oct 2016 00:09:38 +0900] rev 30471
util: rewrite pycompat imports to make pyflakes always happy I'll add more imports which would confuse pyflakes.
Thu, 20 Oct 2016 23:27:09 +0900 windows: do not replace sys.__stdout__
Yuya Nishihara <yuya@tcha.org> [Thu, 20 Oct 2016 23:27:09 +0900] rev 30470
windows: do not replace sys.__stdout__ Now we don't use sys.__stdout__ except for getting its fileno(), so we no longer have to wrap it by winstdout. This helps adding pycompat.stdin/out/err.
Mon, 21 Nov 2016 15:38:56 +0530 py3: update test-check-py3-compat.t output
Pulkit Goyal <7895pulkit@gmail.com> [Mon, 21 Nov 2016 15:38:56 +0530] rev 30469
py3: update test-check-py3-compat.t output This part remains unchanged because it runs in Python 3 only.
Mon, 21 Nov 2016 15:35:22 +0530 py3: use pycompat.sysargv in dispatch.run()
Pulkit Goyal <7895pulkit@gmail.com> [Mon, 21 Nov 2016 15:35:22 +0530] rev 30468
py3: use pycompat.sysargv in dispatch.run() Another one to have a bytes result from sys.argv in Python 3. This one is also a part of running `hg version` on Python 3.
Mon, 21 Nov 2016 15:26:47 +0530 py3: use pycompat.sysargv in scmposix.systemrcpath()
Pulkit Goyal <7895pulkit@gmail.com> [Mon, 21 Nov 2016 15:26:47 +0530] rev 30467
py3: use pycompat.sysargv in scmposix.systemrcpath() sys.argv returns unicodes on Python 3. We have pycompat.sysargv which returns bytes encoded using os.fsencode(). After this patch scmposix.systemrcpath() returns bytes in Python 3 world. This change is also a part of making `hg version` run in Python 3.
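A hedged sketch of what such a shim can look like (os.fsencode exists only on Python 3, which is the only branch that needs it):

  import os
  import sys

  if sys.version_info[0] >= 3:
      # encode each unicode argument back to bytes using the
      # filesystem encoding
      sysargv = list(map(os.fsencode, sys.argv))
  else:
      sysargv = sys.argv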
Sun, 20 Nov 2016 13:50:45 -0800 wireproto: perform chunking and compression at protocol layer (API)
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 20 Nov 2016 13:50:45 -0800] rev 30466
wireproto: perform chunking and compression at protocol layer (API) Currently, the "streamres" response type is populated with a generator of chunks with compression possibly already applied. This puts the onus on commands to perform chunking and compression. Architecturally, I think this is the wrong place to perform this work. I think commands should say "here is the data" and the protocol layer should take care of encoding the final bytes to put on the wire. Additionally, upcoming commits will improve wire protocol support for compression. Having a central place for performing compression in the protocol transport layer will be easier than having to deal with compression at the commands layer. This commit refactors the "streamres" response type to accept either a generator or an object with "read." Additionally, the type now accepts a flag indicating whether the response is a "version 1 compressible" response. This basically identifies all commands currently performing compression. I could have used a special type for this, but a flag works just as well. The argument name foreshadows the introduction of wire protocol changes, hence the "v1." The code for chunking and compressing has been moved to the output generation function for each protocol transport. Some code has been inlined, resulting in the deletion of now unused methods.
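A rough sketch of the reshaped response type described above (attribute names approximate; treat this as an outline, not the exact code):

  class streamres(object):
      """wire protocol command response to be streamed to the client

      Accepts either a generator of chunks or an object with a read()
      method; v1compressible marks responses the legacy transport
      is allowed to compress.
      """
      def __init__(self, gen=None, reader=None, v1compressible=False):
          assert gen is None or reader is None
          self.gen = gen
          self.reader = reader
          self.v1compressible = v1compressible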
Sun, 20 Nov 2016 13:55:53 -0800 httppeer: use compression engine API for decompressing responses
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 20 Nov 2016 13:55:53 -0800] rev 30465
httppeer: use compression engine API for decompressing responses In preparation for supporting multiple compression formats on the wire protocol, we need all users of the wire protocol to use compression engine APIs. This commit ports the HTTP wire protocol client to use the compression engine API. The code for handling the HTTPException is a bit hacky. Essentially, HTTPException could be thrown by any read() from the socket. However, as part of porting the API, we no longer have a generator wrapping the socket and we don't have a single place where we can trap the exception. We solve this by introducing a proxy class that intercepts read() and converts the exception appropriately. In the future, we could introduce a new compression engine API that supports emitting a generator of decompressed chunks. This would eliminate the need for the proxy class. As I said when I introduced the decompressorreader() API, I'm not fond of it and would support transitioning to something better. This can be done as a follow-up, preferably once all the code is using the compression engine API and we have a better idea of the API needs of all the consumers.
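A simplified sketch of such a read() proxy (exception and error types illustrative):

  import errno

  class readerproxy(object):
      """proxy read() calls, translating transport exceptions"""
      def __init__(self, fh, excclass):
          self._fh = fh
          self._excclass = excclass

      def read(self, size=None):
          try:
              if size is None:
                  return self._fh.read()
              return self._fh.read(size)
          except self._excclass:
              # surface the error as an IOError the decompression
              # layer already knows how to handle
              raise IOError(errno.EPIPE, 'connection ended unexpectedly')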
Sat, 19 Nov 2016 18:31:40 -0800 httppeer: do decompression inside _callstream
Gregory Szorc <gregory.szorc@gmail.com> [Sat, 19 Nov 2016 18:31:40 -0800] rev 30464
httppeer: do decompression inside _callstream The current HTTP transport protocol only compresses certain command responses and requires callers of those commands to use "_callcompressable," which transparently zlib-decompresses the response. Upcoming changes will enable *any* response to be compressed with varying compression formats. In order to handle this better, this commit moves the decompression bits to the main function performing the HTTP request. We introduce an underscore-prefixed argument to denote this behavior so it doesn't conflict with a named argument to a command.
Sat, 19 Nov 2016 17:11:12 -0800 keepalive: reorder header precedence
Gregory Szorc <gregory.szorc@gmail.com> [Sat, 19 Nov 2016 17:11:12 -0800] rev 30463
keepalive: reorder header precedence There are 3 sources of headers used by this function:
* The default headers defined by the URL opener
* Headers that are copied on redirects
* Headers that aren't copied on redirects
Previously, we applied the default headers from the URL opener last. This feels wrong to me as those headers are the most low level and something built on top of the URL opener may wish to override them. So, this commit changes the order to apply them with the least precedence. While I was here, I removed a Python version test that is no longer necessary.
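The resulting precedence can be pictured as a simple dict merge, lowest precedence applied first (a toy illustration, not the module's code):

  def mergeheaders(opener_defaults, redirected, unredirected):
      # later update() calls win, so apply lowest precedence first
      headers = dict(opener_defaults)
      headers.update(redirected)
      headers.update(unredirected)
      return headers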
Sat, 19 Nov 2016 10:54:21 -0800 debuginstall: print compression engine support
Gregory Szorc <gregory.szorc@gmail.com> [Sat, 19 Nov 2016 10:54:21 -0800] rev 30462
debuginstall: print compression engine support Since compression engines may be provided by extensions and since not all registered compression engines may be available to use, it seems useful to provide a mechanism to see the state of known compression engines. This commit teaches `hg debuginstall` to print info on known and available compression engines.
Sun, 20 Nov 2016 16:56:21 -0800 bdiff: don't check border condition in loop
Gregory Szorc <gregory.szorc@gmail.com> [Sun, 20 Nov 2016 16:56:21 -0800] rev 30461
bdiff: don't check border condition in loop This is pretty much a copy of d500ddae7494, just to a different loop. The condition `p == plast` (`plast == a + len - 1`) was only true on the final iteration of the loop. So it was wasteful to check for it on every iteration. We decrease the iteration count by 1 and add an explicit check for `p == plast` after the loop. Again, we see modest wins. From the mozilla-unified repository:

  $ perfbdiff -m 3041e4d59df2
  ! wall 0.035502 comb 0.040000 user 0.040000 sys 0.000000 (best of 100)
  ! wall 0.030480 comb 0.030000 user 0.030000 sys 0.000000 (best of 100)

  $ perfbdiff 0e9928989e9c --alldata --count 100
  ! wall 4.097394 comb 4.100000 user 4.100000 sys 0.000000 (best of 3)
  ! wall 3.597798 comb 3.600000 user 3.600000 sys 0.000000 (best of 3)

The 2nd example throws a total of ~3.3GB of data at bdiff. This change increases the throughput from ~811 MB/s to ~924 MB/s.
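The transformation itself is easy to show in a toy Python form (bdiff is C; this merely illustrates hoisting the border check out of the loop):

  def process(items, handle, handle_last):
      n = len(items)
      # before: 'if i == n - 1' would be tested on every iteration
      # after: iterate one element less and handle the border once
      for i in range(n - 1):
          handle(items[i])
      if n:
          handle_last(items[n - 1])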
Sat, 19 Nov 2016 15:41:37 -0800 conflicts: make spacing consistent in conflict markers
Kostia Balytskyi <ikostia@fb.com> [Sat, 19 Nov 2016 15:41:37 -0800] rev 30460
conflicts: make spacing consistent in conflict markers The way the default marker template was defined before this patch, the spacing before the dash in conflict markers was dependent on whether the changeset is a tip one or not. This is the relevant part of the template: '{ifeq(tags, "tip", "", "{tags} ")}' If the revision is a tip revision with no other tags, this would resolve to an empty string, but for revisions which are not tip and don't have any other tags, this would resolve to a single-space string. In the end this causes weirdnesses like the ones you can see in the affected tests. This is not a big deal, but double spacing may be visually less pleasant. Please note that test changes where commit hashes change are the result of marking files as resolved without removing markers.
Thu, 10 Nov 2016 09:21:41 -0800 rebase: move bookmark update to before rebase clearing
Durham Goode <durham@fb.com> [Thu, 10 Nov 2016 09:21:41 -0800] rev 30459
rebase: move bookmark update to before rebase clearing Bookmark fixing should probably happen before the rebase starts to clean up, so let's move it before clearrebased. This will also help a future patch where we want to add more clear logic to the existing clear section.
Fri, 28 Oct 2016 17:44:28 +0200 setup: include a dummy $PATH in the custom environment used by build.py
Gábor Stefanik <gabor.stefanik@nng.com> [Fri, 28 Oct 2016 17:44:28 +0200] rev 30458
setup: include a dummy $PATH in the custom environment used by build.py This is required for building with pypiwin32, the pip-installable replacement for pywin32.
Fri, 11 Nov 2016 07:01:27 -0800 shelve: move unshelve-finishing logic to a separate function
Kostia Balytskyi <ikostia@fb.com> [Fri, 11 Nov 2016 07:01:27 -0800] rev 30457
shelve: move unshelve-finishing logic to a separate function Finishing unshelve involves two steps now:
- stripping a changelog
- aborting a transaction
Obs-based shelve will not require these things, so isolating this logic into a separate function where the normal/obs-shelve branching is going to be implemented seems like a nice idea. Behavior-wise this change moves 'unshelvecleanup' from being between changelog stripping and transaction abortion to being after them. I don't think this has any negative effects.
Thu, 10 Nov 2016 11:02:39 -0800 shelve: move file-forgetting logic to a separate function
Kostia Balytskyi <ikostia@fb.com> [Thu, 10 Nov 2016 11:02:39 -0800] rev 30456
shelve: move file-forgetting logic to a separate function This is just a readability improvement.
Thu, 10 Nov 2016 10:57:10 -0800 shelve: move rebasing logic to a separate function
Kostia Balytskyi <ikostia@fb.com> [Thu, 10 Nov 2016 10:57:10 -0800] rev 30455
shelve: move rebasing logic to a separate function Rebasing the restored shelved commit onto the right destination is done differently in traditional and obs-based unshelve:
- for traditional, we just rebase it
- for obs-based, we need to check whether a successor of the restored commit already exists in the destination (this might happen when unshelving twice on the same destination)
This is the reason why this piece of logic should be in its own function: to not have excessive complexity in the main function.
Thu, 10 Nov 2016 10:51:06 -0800 shelve: move commit restoration logic to a separate function
Kostia Balytskyi <ikostia@fb.com> [Thu, 10 Nov 2016 10:51:06 -0800] rev 30454
shelve: move commit restoration logic to a separate function
Sun, 13 Nov 2016 03:35:52 -0800 shelve: move temporary commit creation to a separate function
Kostia Balytskyi <ikostia@fb.com> [Sun, 13 Nov 2016 03:35:52 -0800] rev 30453
shelve: move temporary commit creation to a separate function Committing working copy changes before rebasing a shelved commit on top of them is an independent piece of behavior, which fits into its own function. Similar to the previous series, this and a couple of following patches are for unshelve refactoring.
Thu, 17 Nov 2016 20:30:00 -0800 commands: print chunk type in debugrevlog
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 17 Nov 2016 20:30:00 -0800] rev 30452
commands: print chunk type in debugrevlog Each data entry ("chunk") in a revlog has a type based on the first byte of the data. This type indicates how to interpret the data. This seems like a useful thing to be able to query through a debug command. So let's add that to `hg debugrevlog`. This does make `hg debugrevlog` slightly slower, as it has to read more than just the index. However, even on the mozilla-unified manifest (which is ~200MB spread over ~350K revisions), this takes <400ms.
Thu, 17 Nov 2016 20:17:51 -0800 perf: add command for measuring revlog chunk operations
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 17 Nov 2016 20:17:51 -0800] rev 30451
perf: add command for measuring revlog chunk operations Upcoming commits will teach revlogs to leverage the new compression engine API so that new compression formats can more easily be leveraged in revlogs. We want to be sure this refactoring doesn't regress performance. So this commit introduces "perfrevlogchunks" to explicitly test performance of reading, decompressing, and recompressing revlog chunks. Here is output when run on the mozilla-unified repo:

  $ hg perfrevlogchunks -c
  ! read
  ! wall 0.346603 comb 0.350000 user 0.340000 sys 0.010000 (best of 28)
  ! read w/ reused fd
  ! wall 0.337707 comb 0.340000 user 0.320000 sys 0.020000 (best of 30)
  ! read batch
  ! wall 0.013206 comb 0.020000 user 0.000000 sys 0.020000 (best of 221)
  ! read batch w/ reused fd
  ! wall 0.013259 comb 0.030000 user 0.010000 sys 0.020000 (best of 222)
  ! chunk
  ! wall 1.909939 comb 1.910000 user 1.900000 sys 0.010000 (best of 6)
  ! chunk batch
  ! wall 1.750677 comb 1.760000 user 1.740000 sys 0.020000 (best of 6)
  ! compress
  ! wall 5.668004 comb 5.670000 user 5.670000 sys 0.000000 (best of 3)

  $ hg perfrevlogchunks -m
  ! read
  ! wall 0.365834 comb 0.370000 user 0.350000 sys 0.020000 (best of 26)
  ! read w/ reused fd
  ! wall 0.350160 comb 0.350000 user 0.320000 sys 0.030000 (best of 28)
  ! read batch
  ! wall 0.024777 comb 0.020000 user 0.000000 sys 0.020000 (best of 119)
  ! read batch w/ reused fd
  ! wall 0.024895 comb 0.030000 user 0.000000 sys 0.030000 (best of 118)
  ! chunk
  ! wall 2.514061 comb 2.520000 user 2.480000 sys 0.040000 (best of 4)
  ! chunk batch
  ! wall 2.380788 comb 2.380000 user 2.360000 sys 0.020000 (best of 5)
  ! compress
  ! wall 9.815297 comb 9.820000 user 9.820000 sys 0.000000 (best of 3)

We already see some interesting data, such as how much slower non-batched chunk reading is and that zlib compression appears to be >2x slower than decompression. I didn't have the data when I wrote this commit message, but I ran this on Mozilla's NFS-based Mercurial server and the time for reading with a reused file descriptor was faster. So I think it is worth testing both with and without file descriptor reuse so we can make informed decisions about recycling file descriptors.
Thu, 17 Nov 2016 20:09:10 -0800 setup: add flag to build_ext to control building zstd
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 17 Nov 2016 20:09:10 -0800] rev 30450
setup: add flag to build_ext to control building zstd Downstream packagers will inevitably want to disable building the vendored python-zstandard Python package. Rather than force them to patch setup.py, let's give them a knob to use. distutils Command classes support defining custom options. It requires setting certain class attributes (yes, class attributes: instance attributes don't work because the class type is consulted before it is instantiated). We already have a custom child class of build_ext, so we set these class attributes, implement some scaffolding, and override build_extensions to filter the Extension instance for the zstd extension if the `--no-zstd` argument is specified. Example usage:

  $ python setup.py build_ext --no-zstd
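A hedged sketch of the distutils mechanism described (class name, option wiring, and the extension module name are assumptions, not the project's exact code):

  from distutils.command.build_ext import build_ext

  class hgbuildext(build_ext):
      # class attributes: distutils inspects these before instantiation
      user_options = build_ext.user_options + [
          ('zstd', None, 'whether to build the zstd extension'),
      ]
      boolean_options = build_ext.boolean_options + ['zstd']
      negative_opt = {'no-zstd': 'zstd'}  # enables --no-zstd

      def initialize_options(self):
          self.zstd = True
          return build_ext.initialize_options(self)

      def build_extensions(self):
          if not self.zstd:
              # drop the vendored extension when --no-zstd is given
              self.extensions = [e for e in self.extensions
                                 if e.name != 'mercurial.zstd']
          return build_ext.build_extensions(self)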
Wed, 09 Nov 2016 16:01:34 +0000 drawdag: update test repos by drawing the changelog DAG in ASCII
Jun Wu <quark@fb.com> [Wed, 09 Nov 2016 16:01:34 +0000] rev 30449
drawdag: update test repos by drawing the changelog DAG in ASCII Currently, we have "debugbuilddag" which is a powerful tool to build test cases but not intuitive. We may end up running "hg log" in the test to make the test more readable. This patch adds a "drawdag" extension with a "debugdrawdag" command for similar testing purposes. Unlike the cryptic "debugbuilddag" command, it reads an ASCII graph that is intuitive to humans, so the test case can be more readable. Unlike "debugbuilddag", "drawdag" does not require an empty repo, so it can be used to add new changesets to an existing repo. Since the "drawdag" logic is not that trivial and only makes sense for testing purposes, the extension is added to the "tests" directory, to make the core logic clean. If we find it useful (for example, to demonstrate cases and help users understand some cases) and want to ship it by default in the future, we can move it to a ship-by-default "debugdrawdag" at that time.
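For a flavor of the input format, a hypothetical test snippet might look like the following (the exact syntax accepted by the extension may differ):

  $ hg debugdrawdag <<'EOS'
  > c d
  > |/
  > b
  > |
  > a
  > EOS

This would create changeset a, its child b, and two children of b named c and d, with each name usable to refer to the created changeset in later commands.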
Wed, 14 Jan 2015 01:15:26 +0100 posix: give checklink a fast path that caches the check file and is read only
Mads Kiilerich <madski@unity3d.com> [Wed, 14 Jan 2015 01:15:26 +0100] rev 30448
posix: give checklink a fast path that caches the check file and is read only util.checklink would create a symlink and remove it again. That would sometimes happen multiple times. Write operations are relatively expensive and give disk tear and noise for applications monitoring file system activity. Instead of creating a symlink and deleting it again, just create it once and leave it in .hg/cache/check-link. If the file exists, just verify that os.islink reports true. We will assume that this check is as good as symlink creation not failing. Note: the symlink left in .hg/cache has to resolve to a file - otherwise 'make dist' will fail ... test-symlink-os-yes-fs-no.py does some monkey patching to simulate a platform without symlink support. The slightly different testing method requires additional monkeying.
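A condensed sketch of the fast path (paths simplified, fallback details elided):

  import os

  def checklink(path):
      """check whether symlinks are supported, using a cached symlink"""
      cachefile = os.path.join(path, '.hg', 'cache', 'check-link')
      if os.path.islink(cachefile):
          # the previously created symlink is still there: good enough
          return True
      try:
          # slow path: create the symlink once and leave it behind
          os.symlink('checklink-target', cachefile)
          return os.path.islink(cachefile)
      except (OSError, AttributeError):
          return False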
Thu, 17 Nov 2016 12:59:36 +0100 posix: move checklink test file to .hg/cache
Mads Kiilerich <madski@unity3d.com> [Thu, 17 Nov 2016 12:59:36 +0100] rev 30447
posix: move checklink test file to .hg/cache This avoids unnecessary churn in the working directory. It is not necessarily a fully valid assumption that .hg/cache is on the same filesystem as the working directory, but I think it is an acceptable approximation. It could also be the case that different parts of the working directory are on different mount points, so checking in the root folder could also be wrong.
Wed, 14 Jan 2015 01:15:26 +0100 posix: give checkexec a fast path; keep the check files and test read only
Mads Kiilerich <madski@unity3d.com> [Wed, 14 Jan 2015 01:15:26 +0100] rev 30446
posix: give checkexec a fast path; keep the check files and test read only Before, Mercurial would create a new temporary file every time, stat it, change its exec mode, stat it again, and delete it. Most of this dance was done to handle the rare and not-so-essential case of VFAT mounts on unix. The cost of that was paid by the much more common and important case of using normal file systems. Instead, try to create and preserve .hg/cache/checkisexec and .hg/cache/checknoexec with and without the exec flag set. If the files exist and have correct exec flags set, we can conclude that the file system supports the exec flag. Best case, the whole exec check can thus be done with two stat calls. Worst case, we delete the wrong files and check as usual. That will be because of temporary loss of the exec bit, or on file systems without support for the exec bit. In that case we check as we did before, with the additional overhead of one extra stat call. It is possible that this different test algorithm in some cases on odd file systems will give different behaviour. Again, I think it will be rare and special cases and I think it is worth the risk. test-clone.t happens to show the situation where checkisexec is left behind from the old style check, while checknoexec will only be created the next time an exec check is performed.
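A condensed sketch of the two-stat fast path (error handling trimmed; _checkexecslow is a hypothetical stand-in for the temp-file fallback):

  import os

  EXECFLAGS = 0o111

  def checkexec(path):
      """check whether the filesystem honors the exec bit"""
      cachedir = os.path.join(path, '.hg', 'cache')
      try:
          m = os.stat(os.path.join(cachedir, 'checkisexec')).st_mode
          n = os.stat(os.path.join(cachedir, 'checknoexec')).st_mode
          if m & EXECFLAGS and not n & EXECFLAGS:
              # best case: two stat calls and we are done
              return True
      except OSError:
          pass
      return _checkexecslow(path)  # hypothetical slow-path helper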
Wed, 14 Jan 2015 01:15:26 +0100 posix: simplify checkexec check
Mads Kiilerich <madski@unity3d.com> [Wed, 14 Jan 2015 01:15:26 +0100] rev 30445
posix: simplify checkexec check Use a slightly simpler logic that in some cases can avoid an unnecessary chmod and stat. Instead of flipping the X bits, make it more clear that we rely on no X bits being set on initial file creation, and that at least some of them stick after they all have been set.
Thu, 17 Nov 2016 12:59:36 +0100 posix: move checkexec test file to .hg/cache
Mads Kiilerich <madski@unity3d.com> [Thu, 17 Nov 2016 12:59:36 +0100] rev 30444
posix: move checkexec test file to .hg/cache This avoids unnecessary churn in the working directory. It is not necessarily a fully valid assumption that .hg/cache is on the same filesystem as the working directory, but I think it is an acceptable approximation. It could also be the case that different parts of the working directory are on different mount points, so checking in the root folder could also be wrong.
Thu, 17 Nov 2016 15:31:19 -0800 manifest: move manifestctx creation into manifestlog.get()
Durham Goode <durham@fb.com> [Thu, 17 Nov 2016 15:31:19 -0800] rev 30443
manifest: move manifestctx creation into manifestlog.get() Most manifestctx creation already happened in manifestlog.get(), but there was one spot in the manifestctx class itself that created an instance manually. This patch makes that one instance go through the manifestlog. This means extensions can just wrap manifestlog.get() and it will cover all manifestctx creations. It also means this code path now hits the manifestlog cache.
Fri, 11 Nov 2016 01:10:07 -0800 util: implement zstd compression engine
Gregory Szorc <gregory.szorc@gmail.com> [Fri, 11 Nov 2016 01:10:07 -0800] rev 30442
util: implement zstd compression engine Now that zstd is vendored and being built (in some configurations), we can implement a compression engine for zstd! The zstd engine is a little different from existing engines. Because it may not always be present, we have to defer loading the module in case importing it fails. We facilitate this via a cached property that holds a reference to the module or None. The "available" method is implemented to reflect reality. The zstd engine declares its ability to handle bundles using the "zstd" human name and the "ZS" internal name. The latter was chosen because internal names are 2 characters (by convention only, I think) and "ZS" seems reasonable. The engine, like others, supports specifying the compression level. However, there are no consumers of this API that yet pass in that argument. I have plans to change that, so stay tuned. Since all we need to do to support bundle generation with a new compression engine is implement and register the compression engine, bundle generation with zstd "just works!" Tests demonstrating this have been added. How does performance of zstd for bundle generation compare? On the mozilla-unified repo, `hg bundle --all -t <engine>-v2` yields the following on my i7-6700K on Linux:

  engine       CPU time  bundle size    vs orig size  throughput
  none           97.0s   4,054,405,584  100.0%        41.8 MB/s
  bzip2 (l=9)   393.6s     975,343,098   24.0%        10.3 MB/s
  gzip (l=6)    184.0s   1,140,533,074   28.1%        22.0 MB/s
  zstd (l=1)    108.2s   1,119,434,718   27.6%        37.5 MB/s
  zstd (l=2)    111.3s   1,078,328,002   26.6%        36.4 MB/s
  zstd (l=3)    113.7s   1,011,823,727   25.0%        35.7 MB/s
  zstd (l=4)    116.0s   1,008,965,888   24.9%        35.0 MB/s
  zstd (l=5)    121.0s     977,203,148   24.1%        33.5 MB/s
  zstd (l=6)    131.7s     927,360,198   22.9%        30.8 MB/s
  zstd (l=7)    139.0s     912,808,505   22.5%        29.2 MB/s
  zstd (l=12)   198.1s     854,527,714   21.1%        20.5 MB/s
  zstd (l=18)   681.6s     789,750,690   19.5%         5.9 MB/s

On compression, zstd for bundle generation delivers:
* better compression than gzip with significantly less CPU utilization
* better than bzip2 compression ratios while still being significantly faster than gzip
* ability to aggressively tune compression level to achieve significantly smaller bundles

That last point is important. With clone bundles, a server can pre-generate a bundle file, upload it to a static file server, and redirect clients to transparently download it during clone. The server could choose to produce a zstd bundle with the highest compression settings possible. This would take a very long time - a magnitude longer than a typical zstd bundle generation - but the result would be hundreds of megabytes smaller! For the clone volume we do at Mozilla, this could translate to petabytes of bandwidth savings per year and faster clones (due to smaller transfer size). I don't have detailed numbers to report on decompression. However, zstd decompression is fast: >1 GB/s output throughput on this machine, even through the Python bindings. And it can do that regardless of the compression level of the input. By the time you have enough data to worry about overhead of decompression, you have plenty of other things to worry about performance wise. zstd wins all around. I can't wait to implement support for it on the wire protocol and in revlogs.
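A condensed sketch of the engine shape implied above (the real class implements more of the compression engine API; treat this as an outline):

  class zstdengine(object):
      def name(self):
          return 'zstd'

      @property
      def _module(self):
          # deferred import: the extension may be absent in
          # pure-Python installs (the real code caches this lookup)
          try:
              from mercurial import zstd  # vendored module
              return zstd
          except ImportError:
              return None

      def available(self):
          return bool(self._module)

      def bundletype(self):
          # human-readable name and two-letter internal name
          return 'zstd', 'ZS'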
Thu, 10 Nov 2016 23:38:41 -0800 hghave: add check for zstd support
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 23:38:41 -0800] rev 30441
hghave: add check for zstd support Not all configurations will support zstd. Add a check so we can conditionalize tests.
Thu, 10 Nov 2016 23:34:15 -0800 exchange: obtain compression engines from the registrar
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 23:34:15 -0800] rev 30440
exchange: obtain compression engines from the registrar util.compengines has knowledge of all registered compression engines and the metadata that associates them with various bundle types. This patch removes the now redundant declaration of this metadata from exchange.py and obtains it from the new source. The effect of this patch is that once a new compression engine is registered with util.compengines, `hg bundle -t <engine>` will just work.
Thu, 10 Nov 2016 23:29:01 -0800 bundle2: equate 'UN' with no compression
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 23:29:01 -0800] rev 30439
bundle2: equate 'UN' with no compression An upcoming patch will change the "alg" argument passed to this function from None to "UN" when no compression is wanted. The existing implementation of bundle2 does not set a "Compression" parameter if no compression is used. In theory, setting "Compression=UN" should work. But I haven't audited the code to see if all client versions supporting bundle2 will accept this. Rather than take the risk, avoid the BC breakage and treat "UN" the same as None.
Thu, 10 Nov 2016 23:15:02 -0800 util: check for compression engine availability before returning
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 23:15:02 -0800] rev 30438
util: check for compression engine availability before returning If a requested compression engine is registered but not available, requesting it will now abort. To be honest, I'm not sure if this is the appropriate mechanism for handling optional compression engines. I won't know until all uses of compression (bundles, wire protocol, revlogs, etc) are using the new API and zstd (our planned optional engine) is implemented. So this API could change.
Thu, 10 Nov 2016 23:03:48 -0800 util: expose an "available" API on compression engines
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 23:03:48 -0800] rev 30437
util: expose an "available" API on compression engines When the zstd compression engine is introduced, it won't work in all installations, namely pure Python installs. So, we need a mechanism to declare whether a compression engine is available. We don't want to conditionally register the compression engine because it is sometimes useful to know when a compression engine name or encountered data is valid but just not available versus unknown.
Thu, 10 Nov 2016 22:26:35 -0800 setup: compile zstd C extension
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 22:26:35 -0800] rev 30436
setup: compile zstd C extension Now that zstd and python-zstandard are vendored, we can start compiling them as part of the install. python-zstandard provides a self-contained Python function that returns a distutils.extension.Extension, so it is really easy to add zstd to our setup.py without having to worry about defining source files, include paths, etc. The function even allows specifying the module name the extension should be compiled as. This conveniently allows us to compile the module into the "mercurial" package so "our" version won't collide with a version installed under the canonical "zstd" module name.
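If the vendored package exposes the helper this way, the setup.py hookup is roughly a one-liner (the helper name is an assumption based on the description above):

  # in setup.py, next to the other Extension definitions
  import setup_zstd

  extmodules = []  # the project's list of Extension instances
  extmodules.append(
      setup_zstd.get_c_extension(name='mercurial.zstd'))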
Thu, 10 Nov 2016 22:15:58 -0800 zstd: vendor python-zstandard 0.5.0
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 22:15:58 -0800] rev 30435
zstd: vendor python-zstandard 0.5.0 As the commit message for the previous changeset says, we wish for zstd to be a 1st class citizen in Mercurial. To make that happen, we need to enable Python to talk to the zstd C API. And that requires bindings. This commit vendors a copy of existing Python bindings. Why do we need to vendor? As the commit message of the previous commit says, relying on systems in the wild to have the bindings or zstd present is a losing proposition. By distributing the zstd and bindings with Mercurial, we significantly increase our chances that zstd will work. Since zstd will deliver a better end-user experience by achieving better performance, this benefits our users. Another reason is that the Python bindings still aren't stable and the API is somewhat fluid. While Mercurial could be coded to target multiple versions of the Python bindings, it is safer to bundle an explicit, known working version. The added Python bindings are mostly a fully-featured interface to the zstd C API. They allow one-shot operations, streaming, reading from and writing to objects implementing the file object protocol, dictionary compression, control over low-level compression parameters, and more. The Python bindings work on Python 2.6, 2.7, and 3.3+ and have been tested on Linux and Windows. There are CFFI bindings, but they are lacking compared to the C extension. Upstream work will be needed before we can support zstd with PyPy. But it will be possible. The files added in this commit come from Git commit e637c1b214d5f869cf8116c550dcae23ec13b677 from https://github.com/indygreg/python-zstandard and are added without modifications. Some files from the upstream repository have been omitted, namely files related to continuous integration. In the spirit of full disclosure, I'm the maintainer of the "python-zstandard" project and have authored 100% of the code added in this commit. Unfortunately, the Python bindings have not been formally code reviewed by anyone. While I've tested much of the code thoroughly (I even have tests that fuzz APIs), there's a good chance there are bugs, memory leaks, not well thought out APIs, etc. If someone wants to review the code and send feedback to the GitHub project, it would be greatly appreciated. Despite my involvement with both projects, my opinions of code style differ from Mercurial's. The code in this commit introduces numerous code style violations in Mercurial's linters. So, the code is excluded from most lints. However, some violations I agree with. These have been added to the known violations ignore list for now.
Thu, 10 Nov 2016 21:45:29 -0800 zstd: vendor zstd 1.1.1
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 21:45:29 -0800] rev 30434
zstd: vendor zstd 1.1.1 zstd is a new compression format and it is awesome, yielding higher compression ratios and significantly faster compression and decompression operations compared to zlib (our current compression engine of choice) across the board. We want zstd to be a 1st class citizen in Mercurial and to eventually be the preferred compression format for various operations. This patch starts the formal process of supporting zstd by vendoring a copy of zstd. Why do we need to vendor zstd? Good question. First, zstd is relatively new and not widely available yet. If we didn't vendor zstd or distribute it with Mercurial, most users likely wouldn't have zstd installed or even available to install. What good is a feature if you can't use it? Vendoring and distributing the zstd sources gives us the highest likelihood that zstd will be available to Mercurial installs. Second, the Python bindings to zstd (which will be vendored in a separate changeset) make use of zstd APIs that are only available via static linking. One reason they are only available via static linking is that they are unstable and could change at any time. While it might be possible for the Python bindings to attempt to talk to different versions of the zstd C library, the safest thing to do is link against a specific, known-working version of zstd. This is why the Python zstd bindings themselves vendor zstd and why we must as well. This also explains why the added files are in a "python-zstandard" directory. The added files are from the 1.1.1 release of zstd (Git commit 4c0b44f8ced84c4c8edfa07b564d31e4fa3e8885 from https://github.com/facebook/zstd) and are added without modifications. Not all files from the zstd "distribution" have been added. Notably missing are files to support interacting with "legacy," pre-1.0 versions of zstd. The decision of which files to include is made by the upstream python-zstandard project (which I'm the author of). The files in this commit are a snapshot of the files from the 0.5.0 release of that project, Git commit e637c1b214d5f869cf8116c550dcae23ec13b677 from https://github.com/indygreg/python-zstandard.
Tue, 15 Nov 2016 21:56:49 +0100 bdiff: give slight preference to removing trailing lines
Mads Kiilerich <madski@unity3d.com> [Tue, 15 Nov 2016 21:56:49 +0100] rev 30433
bdiff: give slight preference to removing trailing lines [This change could be folded into the previous changeset to minimize the repo churn ...] Similar to the previous change, introduce an exception to the general preference for matches in the middle of bdiff ranges: If the best match on the B side starts at the beginning of the bdiff range, don't aim for the middle-most A side match but for the earliest. New (later) matches on the A side will only be considered better if the corresponding match on the B side *not* is at the beginning of the range. Thus, if the best (middle-most) match on the B side turns out to be at the beginning of the range, the earliest match on the A side will be used. The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from 22807275 to 22808120 bytes - a 0.004% increase.
Tue, 15 Nov 2016 21:56:49 +0100 bdiff: give slight preference to appending lines
Mads Kiilerich <madski@unity3d.com> [Tue, 15 Nov 2016 21:56:49 +0100] rev 30432
bdiff: give slight preference to appending lines [This change could be folded into the previous changeset to minimize the repo churn ...] The general preference for matches in the middle of bdiff ranges helps getting balanced recursion and efficient computation. But, as previous changes have shown, it might also give diffs that seem "obviously wrong". To mitigate that: if the best match on the A side starts at the beginning of the bdiff range, don't aim for the middle-most B side match but for the earliest. This will make the matches balanced (by both sides being "early") even though the bisection will be less balanced. Still, this case only applies if the *best* and middle-most match was fully unbalanced on the A side. Each recursion will thus, even in this worst case, reduce the problem significantly and we are not re-introducing the problem that was fixed in f1ca249696ed. The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from 22806817 to 22807275 bytes - a 0.002% increase. This makes the recent test-bdiff.py changes give a prettier output ... but they no longer show that the recursion is around middle matches (because it in these cases isn't).
Tue, 08 Nov 2016 18:37:33 +0100 bdiff: give slight preference to longest matches in the middle of the B side
Mads Kiilerich <madski@unity3d.com> [Tue, 08 Nov 2016 18:37:33 +0100] rev 30431
bdiff: give slight preference to longest matches in the middle of the B side We already have a slight preference for matches close to the middle on the A side. Now, do the same on the B side. j is iterating the b range backwards and we thus accept a new j if the previous match was in the upper half. This makes the test-bhalf diff "correct". It obviously also gives more preference to balanced recursion than to appending to sequences. That is kind of correct, but will also unfortunately make some bundles bigger. No doubt, we can also create examples where it will make them smaller ... The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from 22803824 to 22806817 bytes - a 0.01% increase.
Tue, 08 Nov 2016 18:37:33 +0100 bdiff: rearrange the "better longest match" code
Mads Kiilerich <madski@unity3d.com> [Tue, 08 Nov 2016 18:37:33 +0100] rev 30430
bdiff: rearrange the "better longest match" code This is primarily to make the code more manageable and prepare for later changes. More specific assignments might also be slightly faster, even though it also might generate a bit more code.
Tue, 08 Nov 2016 18:37:33 +0100 bdiff: adjust criteria for getting optimal longest match in the A side middle
Mads Kiilerich <madski@unity3d.com> [Tue, 08 Nov 2016 18:37:33 +0100] rev 30429
bdiff: adjust criteria for getting optimal longest match in the A side middle We prefer matches closer to the middle to balance recursion, as introduced in f1ca249696ed. For ranges with uneven length, matches starting exactly in the middle should have preference. That will be optimal for matches of length 1. We will thus accept equality in the half check. For ranges with even length, half was ceil'ed when calculated but we got the preference for low matches from the 'less than half' check. To get the same result as before when we also accept equality, floor it. Without that, test-annotate.t would show some different (still correct but less optimal) results. This will change the heuristics. Tests show a slightly different output - and sometimes slightly smaller bundles. The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from 22804885 to 22803824 bytes - a 0.005% reduction.
Tue, 08 Nov 2016 18:37:33 +0100 tests: explore some bdiff cases
Mads Kiilerich <madski@unity3d.com> [Tue, 08 Nov 2016 18:37:33 +0100] rev 30428
tests: explore some bdiff cases
Tue, 15 Nov 2016 21:56:49 +0100 tests: make test-bdiff.py easier to maintain
Mads Kiilerich <madski@unity3d.com> [Tue, 15 Nov 2016 21:56:49 +0100] rev 30427
tests: make test-bdiff.py easier to maintain Add more stdout logging to help navigate the .out file.
Thu, 17 Nov 2016 08:52:52 -0800 perf: unbust perfbdiff --alldata
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 17 Nov 2016 08:52:52 -0800] rev 30426
perf: unbust perfbdiff --alldata This broke in f84fc6a92817 due to a refactored manifest API. The fix is a bit hacky - perfbdiff doesn't yet support tree manifests for example. But it gets the job done. A test has been added for --alldata so this doesn't happen again.
Thu, 17 Nov 2016 20:57:09 +0900 worker: discard waited pid by anyone who noticed it first
Yuya Nishihara <yuya@tcha.org> [Thu, 17 Nov 2016 20:57:09 +0900] rev 30425
worker: discard waited pid by anyone who noticed it first This makes sure all waited pids are removed before calling killworkers() even if the waitpid()-pids.discard() sequence is interrupted by another SIGCHLD.
Thu, 17 Nov 2016 21:08:58 +0900 worker: kill workers after all zombie processes are reaped
Yuya Nishihara <yuya@tcha.org> [Thu, 17 Nov 2016 21:08:58 +0900] rev 30424
worker: kill workers after all zombie processes are reaped Since we now wait for child processes in a non-blocking way (changed by 7bc25549e084 and e8fb03cfbbde), we don't have to kill them in the middle of the waitpid() loop. This change will help solve a possible race between the waitpid()-pids.discard() sequence and another SIGCHLD. waitforworkers() is called by cleanup(), in which case we do killworkers() beforehand, so we can remove killworkers() from waitforworkers().
Thu, 17 Nov 2016 20:44:05 +0900 worker: make sure killworkers() is never interrupted by another SIGCHLD
Yuya Nishihara <yuya@tcha.org> [Thu, 17 Nov 2016 20:44:05 +0900] rev 30423
worker: make sure killworkers() is never interrupted by another SIGCHLD killworkers() iterates over pids, which can be updated by the SIGCHLD handler. So we should either copy pids or prevent killworkers() from being interrupted by SIGCHLD. I chose the latter as it is simpler and can make pids handling more consistent. This fixes a possible "set changed size during iteration" error at killworkers() before cleanup().
Thu, 17 Nov 2016 21:43:01 +0900 worker: fix missed break on successful waitpid()
Yuya Nishihara <yuya@tcha.org> [Thu, 17 Nov 2016 21:43:01 +0900] rev 30422
worker: fix missed break on successful waitpid() Follow-up for 5069a8a40b1b.
Thu, 10 Nov 2016 16:49:42 -0500 filterpyflakes: dramatically simplify the entire thing by blacklisting
Augie Fackler <augie@google.com> [Thu, 10 Nov 2016 16:49:42 -0500] rev 30421
filterpyflakes: dramatically simplify the entire thing by blacklisting We've only got one kind of pyflakes failure left in our codebase, so it's time to switch over to a blacklist-based checking scheme. I've left in the filtering of two undefined names for now out of paranoia, but those can probably go too.
Thu, 10 Nov 2016 16:07:24 -0500 run-tests: forward Python USER_BASE from site (issue5425)
Augie Fackler <augie@google.com> [Thu, 10 Nov 2016 16:07:24 -0500] rev 30420
run-tests: forward Python USER_BASE from site (issue5425) We do this so that any linters installed via pip install --user don't break. See https://docs.python.org/2/library/site.html#site.USER_BASE for a description of what this nonsense is all about. An alternative would be to not set HOME, but that'll cause other problems (see issue2707), or to forward every single path entry from sys.path in PYTHONPATH (which seems sketchy in its own way).
Mon, 14 Nov 2016 22:43:25 +0100 shelve: add missing space in help text
Mads Kiilerich <madski@unity3d.com> [Mon, 14 Nov 2016 22:43:25 +0100] rev 30419
shelve: add missing space in help text The change is trivial and unlikely to have been translated so we update translation files too.
Tue, 15 Nov 2016 20:25:51 +0000 util: improve iterfile so it chooses code path wisely
Jun Wu <quark@fb.com> [Tue, 15 Nov 2016 20:25:51 +0000] rev 30418
util: improve iterfile so it chooses code path wisely We have performance concerns about "iterfile" as it is 4x slower on normal files. Modern systems have the nice property that reading a "fast" (on-disk) file cannot be interrupted, and we should make use of that. This patch dumps the related knowledge in comments. And "iterfile" chooses code paths wisely:
1. If it's CPython 3, or PyPy, use the fast path.
2. If fp is a normal file, use the fast path.
3. If fp is not a normal file and the CPython version is >= 2.7.4, use the same workaround (4x slower) as before.
4. If fp is not a normal file and the CPython version is < 2.7.4, use another workaround (2x slower but may block longer than necessary) which basically re-invents the buffer + readline logic in Python.
This gives us good confidence in both correctness and performance when dealing with EINTR in iterfile(fp), for all known supported Python versions.
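A hedged sketch of the dispatch (version checks simplified; _safeiterfile is a hypothetical stand-in for the slow-path helper):

  import os
  import stat
  import sys

  def iterfile(fp):
      """iterate fp line by line, working around a CPython 2 EINTR bug"""
      # CPython 3 and PyPy do not have the buggy readahead path
      if sys.version_info[0] >= 3 or '__pypy__' in sys.builtin_module_names:
          return fp
      try:
          # reading a regular on-disk file is not interrupted by
          # signals on modern systems, so the fast path stays safe
          if stat.S_ISREG(os.fstat(fp.fileno()).st_mode):
              return fp
      except (AttributeError, OSError):
          pass
      return _safeiterfile(fp)  # hypothetical EINTR-safe iterator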
Wed, 16 Nov 2016 23:29:28 -0500 merge with stable
Augie Fackler <augie@google.com> [Wed, 16 Nov 2016 23:29:28 -0500] rev 30417
merge with stable
Sat, 12 Nov 2016 03:06:07 +0000 worker: stop using a separate thread waiting for children
Jun Wu <quark@fb.com> [Sat, 12 Nov 2016 03:06:07 +0000] rev 30416
worker: stop using a separate thread waiting for children Now that we have a SIGCHLD handler that can get executed while waiting for I/O, it's no longer necessary to have a separate waitpid thread. So just remove it.
Sat, 12 Nov 2016 03:07:22 +0000 worker: add a SIGCHLD handler to collect worker immediately
Jun Wu <quark@fb.com> [Sat, 12 Nov 2016 03:07:22 +0000] rev 30415
worker: add a SIGCHLD handler to collect worker immediately As planned by previous patches, add a SIGCHLD handler to get notifications about worker exits, and deals with worker failure immediately. Note that the SIGCHLD handler gets unregistered before killworkers(), so SIGCHLD won't interrupt "killworkers" - making it harder to send kill signals to waited processes.
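A minimal sketch of the wiring (assumes the non-blocking waitforworkers from the earlier patch and the module's killworkers helper):

  import signal

  def sigchldhandler(signum, frame):
      # reap any workers that have exited; never blocks
      waitforworkers(blocking=False)

  oldhandler = signal.signal(signal.SIGCHLD, sigchldhandler)
  try:
      pass  # fork workers, feed them work, wait for completion
  finally:
      # restore first so a late SIGCHLD cannot interrupt killworkers()
      signal.signal(signal.SIGCHLD, oldhandler)
      killworkers()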
Tue, 15 Nov 2016 02:12:16 +0000 worker: make waitforworkers reentrant
Jun Wu <quark@fb.com> [Tue, 15 Nov 2016 02:12:16 +0000] rev 30414
worker: make waitforworkers reentrant We are going to use it in the SIGCHLD handler. The handler will be executed in the main thread with the non-blocking version of waitpid, while the waitforworkers thread runs the blocking version. It's possible that one of them collects a worker and makes the other error out (no child to wait for). This patch handles these errors: ECHILD is ignored; EINTR needs a retry. The "pids" set is designed to be modifiable only by "waitforworkers", and we only remove items after a successful waitpid. Since a child process can only be "waitpid"-ed once, it's guaranteed that "pids.remove(p)" won't be called with duplicated "p"s. And once a "p" is removed from "pids", that "p" does not need to be killed or waited on any more.
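A hedged sketch of the error handling described (structure simplified):

  import errno
  import os

  pids = set()  # worker pids, filled in by the fork loop

  def waitforworkers(blocking=True):
      for pid in pids.copy():
          while True:
              try:
                  p, st = os.waitpid(pid, 0 if blocking else os.WNOHANG)
                  break
              except OSError as e:
                  if e.errno == errno.EINTR:
                      continue  # retry the interrupted waitpid()
                  elif e.errno == errno.ECHILD:
                      p = pid   # already reaped by the other caller
                      break
                  else:
                      raise
          if p:  # p == 0 means the child is still running (WNOHANG)
              pids.discard(p)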
Tue, 15 Nov 2016 02:10:40 +0000 worker: change "pids" to a set
Jun Wu <quark@fb.com> [Tue, 15 Nov 2016 02:10:40 +0000] rev 30413
worker: change "pids" to a set There is no need to keep any order of the "pids" array. A set is more efficient for the "remove" operation. And the following patch will use that.
Thu, 28 Jul 2016 20:57:07 +0100 worker: allow waitforworkers to be non-blocking
Jun Wu <quark@fb.com> [Thu, 28 Jul 2016 20:57:07 +0100] rev 30412
worker: allow waitforworkers to be non-blocking This patch adds a boolean flag to waitforworkers and makes it non-blocking if set to True. This is to make it possible to reap our workers while keeping other unrelated children untouched after receiving SIGCHLD.
Thu, 28 Jul 2016 20:51:20 +0100 worker: wait for worker pids explicitly
Jun Wu <quark@fb.com> [Thu, 28 Jul 2016 20:51:20 +0100] rev 30411
worker: wait for worker pids explicitly Before this patch, waitforworkers uses os.wait() to collect child workers, and only waits for len(pids) processes. This can have serious issues if other code spawns new processes and does not reap them: 1. worker.py may get a wrong exit code and kill innocent workers. 2. worker.py may continue without waiting for all workers to complete. This patch fixes the issue by using waitpid to wait for worker pids explicitly. However, this patch introduces a new issue: worker failure may not be handled immediately. That issue will be addressed in the next patches.
Thu, 28 Jul 2016 20:49:57 +0100 worker: move killworkers and waitforworkers up
Jun Wu <quark@fb.com> [Thu, 28 Jul 2016 20:49:57 +0100] rev 30410
worker: move killworkers and waitforworkers up We need to use them in the SIGCHLD handler, and the SIGCHLD handler should be installed before fork.
Fri, 11 Nov 2016 21:11:17 +0000 osutil: implement setprocname to set process title for some platforms
Jun Wu <quark@fb.com> [Fri, 11 Nov 2016 21:11:17 +0000] rev 30409
osutil: implement setprocname to set process title for some platforms This patch adds a simple setprocname method to osutil. The operation is not defined by any standard and is platform-specific; the current implementation tries to cover some major platforms (ex. Linux, OS X, FreeBSD) that are relatively easy to support. Other platforms (Windows [4], other BSDs, ...) can be added in the future. The current implementation supports two methods to change the process title:
a. setproctitle if available (works in FreeBSD).
b. rewrite argv in place (works in Linux [1] and Mac OS X). [2] [3]
[1]: Linux has "prctl(PR_SET_NAME, ...)" but 1) it has a 16-byte limit, which is too small; 2) it is not quite equivalent to what we want - it changes "/proc/self/comm", not "/proc/self/cmdline" - a "comm" change won't show up in "ps" output unless "-o comm" is used.
[2]: The implementation does not rewrite the **environ buffer like some other implementations do, just to make the code simpler and safer. However, this also means the buffer size we can rewrite is significantly shorter. If we are really greedy and want the "environ" space, we can change the implementation later.
[3]: It requires a CPython private API: Py_GetArgcArgv to get the original argv. Unfortunately Python 3 makes a copy of argv and returns the wchar_t version, so it is not supported for now. (If we really want to, we could count backwards from "char **environ", given known argc and argv; not sure if that's a good idea - probably not.)
[4]: The feature is aimed at making it easier for forked command server processes to show what they are doing. Since Windows does not support fork(), despite being a major platform, its support is not added in this patch.
Fri, 11 Nov 2016 20:45:40 +0000 setup: test setproctitle before building osutil
Jun Wu <quark@fb.com> [Fri, 11 Nov 2016 20:45:40 +0000] rev 30408
setup: test setproctitle before building osutil We are going to use setproctitle (provided by FreeBSD) if it's available in the next patch. Therefore provide a macro to give some clues to the C pre-processor so it could choose code path wisely.
Sat, 12 Nov 2016 13:36:17 +0100 patch: remove unused git parameter from patch.diffstat()
Henning Schild <henning@hennsch.de> [Sat, 12 Nov 2016 13:36:17 +0100] rev 30407
patch: remove unused git parameter from patch.diffstat() Since 628a4a9e411d the parameter is not used anymore.
Thu, 29 Sep 2016 10:16:34 +0200 perf: add asv benchmarks
Philippe Pepiot <philippe.pepiot@logilab.fr> [Thu, 29 Sep 2016 10:16:34 +0200] rev 30406
perf: add asv benchmarks Airspeed velocity (ASV) is a Python framework for benchmarking Python packages over their lifetime. The results are displayed in an interactive web frontend. Add ASV benchmarks for Mercurial that use the contrib/perf.py extension and can be run against multiple reference repositories. The benchmark suite now includes revsets from contrib/base-revsets.txt with variants, perftags, perfstatus, perfmanifest and perfheads. Installation requires asv>=0.2, python-hglib and virtualenv. This is part of PerformanceTrackingSuitePlan https://www.mercurial-scm.org/wiki/PerformanceTrackingSuitePlan
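A hedged sketch of what one such benchmark might look like (asv picks up functions by the time_ prefix; the repository path and the hglib usage here are illustrative, not the suite's actual code):

  import hglib

  REPO = '/path/to/reference/repo'  # hypothetical reference repository
  client = None

  def setup():
      global client
      client = hglib.open(REPO)

  def time_perftags():
      # drive the perftags command from contrib/perf.py through hglib
      client.rawcommand([b'--config', b'extensions.perf=contrib/perf.py',
                         b'perftags'])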
Tue, 15 Nov 2016 16:10:57 +0100 perf: omit copying ui and redirect to ferr if buffer API is in use
Philippe Pepiot <philippe.pepiot@logilab.fr> [Tue, 15 Nov 2016 16:10:57 +0100] rev 30405
perf: omit copying ui and redirect to ferr if buffer API is in use This allows getting the output of contrib/perf.py commands when using the ui.pushbuffer() API.
Mon, 14 Nov 2016 15:24:07 -0800 manifest: change treemanifestctx to construct subtrees from the manifestlog
Durham Goode <durham@fb.com> [Mon, 14 Nov 2016 15:24:07 -0800] rev 30404
manifest: change treemanifestctx to construct subtrees from the manifestlog Previously, treemanifestctx would directly construct its subtrees. By making it get the subtrees through manifestlog.get() we consolidate all treemanifestctx creation into manifestlog.get() and therefore extensions that need to wrap manifestctx creation (like narrow-hg) can intercept manifestctxs at that single place. This also means fetching subtrees will take advantage of the manifestlog ctx cache now, which it did not before.
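The indirection amounts to something like this hypothetical method shape (names approximate):

  # inside treemanifestctx: fetch a subtree via the manifestlog so the
  # ctx cache and any extension wrappers around get() are honored
  def readsubtree(self, subdir, node):
      return self._manifestlog.get(subdir, node).read()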
Mon, 14 Nov 2016 15:17:27 -0800 manifest: make revlog verification optional
Durham Goode <durham@fb.com> [Mon, 14 Nov 2016 15:17:27 -0800] rev 30403
manifest: make revlog verification optional This patch adds a parameter to manifestlog.get() to disable hash checking. This will be used in an upcoming patch to support treemanifestctx reading sub-trees without loading them from the revlog. (This is already supported but does not go through the manifestlog.get() code path.)
Thu, 10 Nov 2016 09:45:42 -0800 debugcommands: move debugbuilddag
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 10 Nov 2016 09:45:42 -0800] rev 30402
debugcommands: move debugbuilddag And we drop some now unused imports from commands.py.
Wed, 17 Aug 2016 21:07:38 -0700 debugcommands: introduce standalone module for debug commands
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 17 Aug 2016 21:07:38 -0700] rev 30401
debugcommands: introduce standalone module for debug commands commands.py is our largest .py file by nearly 2x. Debug commands live in a world of their own. So let's extract them to their own module. We start with "debugancestor." We currently reuse the commands table with commands.py and have a hack in dispatch.py for loading debugcommands.py. In the future, we could potentially use a separate commands table and avoid the import of debugcommands.py.
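A hedged sketch of the table sharing described (decorator and helper locations approximate for this era of the codebase):

  # mercurial/debugcommands.py
  from .i18n import _
  from . import cmdutil, commands

  # reuse the table from commands.py so dispatch can find the command
  command = cmdutil.command(commands.table)

  @command('debugancestor', [], _('[INDEX] REV1 REV2'))
  def debugancestor(ui, repo, *args):
      """find the ancestor revision of two revisions in a given index"""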
Mon, 14 Nov 2016 23:17:15 +0000 convert: migrate to util.iterfile
Jun Wu <quark@fb.com> [Mon, 14 Nov 2016 23:17:15 +0000] rev 30400
convert: migrate to util.iterfile
Mon, 14 Nov 2016 23:16:05 +0000 match: migrate to util.iterfile
Jun Wu <quark@fb.com> [Mon, 14 Nov 2016 23:16:05 +0000] rev 30399
match: migrate to util.iterfile
Mon, 14 Nov 2016 23:15:01 +0000 store: migrate to util.iterfile
Jun Wu <quark@fb.com> [Mon, 14 Nov 2016 23:15:01 +0000] rev 30398
store: migrate to util.iterfile
Mon, 14 Nov 2016 23:14:06 +0000 patch: migrate to util.iterfile
Jun Wu <quark@fb.com> [Mon, 14 Nov 2016 23:14:06 +0000] rev 30397
patch: migrate to util.iterfile
Mon, 14 Nov 2016 23:12:11 +0000 worker: migrate to util.iterfile
Jun Wu <quark@fb.com> [Mon, 14 Nov 2016 23:12:11 +0000] rev 30396
worker: migrate to util.iterfile
Mon, 14 Nov 2016 23:32:54 +0000 util: add iterfile to workaround a fileobj.__iter__ issue with EINTR
Jun Wu <quark@fb.com> [Mon, 14 Nov 2016 23:32:54 +0000] rev 30395
util: add iterfile to workaround a fileobj.__iter__ issue with EINTR The fileobj.__iter__ implementation in Python 2.7.12 (hg changeset 45d4cea97b04) is buggy: it cannot handle EINTR correctly. In Objects/fileobject.c:

  size_t Py_UniversalNewlineFread(....) {
      ....
      if (!f->f_univ_newline)
          return fread(buf, 1, n, stream);
      ....
  }

According to the "fread" man page: if an error occurs, or the end of the file is reached, the return value is a short item count (or zero). Therefore it's possible for "fread" (and "Py_UniversalNewlineFread") to return a positive value while errno is set to EINTR and ferror(stream) changes from zero to non-zero. There are multiple callers of "Py_UniversalNewlineFread": "file_read", "file_readinto", "file_readlines", "readahead". While the first 3 have code to handle the EINTR case, the last one, "readahead", doesn't:

  static int readahead(PyFileObject *f, Py_ssize_t bufsize) {
      ....
      chunksize = Py_UniversalNewlineFread(
          f->f_buf, bufsize, f->f_fp, (PyObject *)f);
      ....
      if (chunksize == 0) {
          if (ferror(f->f_fp)) {
              PyErr_SetFromErrno(PyExc_IOError);
              ....
          }
      }
      ....
  }

It means "readahead" could ignore EINTR if "Py_UniversalNewlineFread" returns a non-zero value. And the next time "readahead" gets executed, if "Py_UniversalNewlineFread" returns 0, "readahead" would raise a Python error with an incorrect errno - which could be 0 - thus "IOError: [Errno 0] Error". The only user of "readahead" is "readahead_get_line_skip". The only user of "readahead_get_line_skip" is "file_iternext", aka. "fileobj.__iter__", which should be avoided. There are multiple places where the pattern "for x in fp" is used. This patch adds an "iterfile" method in "util.py" so we can migrate our code from "for x in fp" to "for x in util.iterfile(fp)".
Thu, 10 Nov 2016 16:37:18 -0500 filterpyflakes: whitelist listcomp aliasing checking
Augie Fackler <augie@google.com> [Thu, 10 Nov 2016 16:37:18 -0500] rev 30394
filterpyflakes: whitelist listcomp aliasing checking The test change is due to how filterpyflakes is organized: a line number changed.
Thu, 10 Nov 2016 16:35:54 -0500 verify: avoid shadowing two variables with a list comprehension
Augie Fackler <augie@google.com> [Thu, 10 Nov 2016 16:35:54 -0500] rev 30393
verify: avoid shadowing two variables with a list comprehension The variable names are clearly worse now, but since we're really just transposing key and value I'm not too worried about the clarity loss.
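For context, a toy example (not Mercurial code) of the Python 2 behavior this series of cleanups guards against: a list comprehension leaks its loop variable into the enclosing scope.

    node = 'keep me'
    nodes = [node for node in ('a', 'b', 'c')]
    # Python 2 prints 'c' (the comprehension clobbered 'node');
    # Python 3 prints 'keep me'
    print(node)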
Thu, 10 Nov 2016 16:35:10 -0500 revset: avoid shadowing a variable with a list comprehension
Augie Fackler <augie@google.com> [Thu, 10 Nov 2016 16:35:10 -0500] rev 30392
revset: avoid shadowing a variable with a list comprehension
Thu, 10 Nov 2016 16:34:43 -0500 revlog: avoid shadowing several variables using list comprehensions
Augie Fackler <augie@google.com> [Thu, 10 Nov 2016 16:34:43 -0500] rev 30391
revlog: avoid shadowing several variables using list comprehensions
Thu, 10 Nov 2016 16:33:41 -0500 minirst: avoid shadowing a variable in a list comprehension
Augie Fackler <augie@google.com> [Thu, 10 Nov 2016 16:33:41 -0500] rev 30390
minirst: avoid shadowing a variable in a list comprehension
Thu, 10 Nov 2016 16:33:23 -0500 hbisect: avoid shadowing a variable in a list comprehension
Augie Fackler <augie@google.com> [Thu, 10 Nov 2016 16:33:23 -0500] rev 30389
hbisect: avoid shadowing a variable in a list comprehension
Thu, 10 Nov 2016 16:33:07 -0500 filemerge: avoid shadowing a variable in a list comprehension
Augie Fackler <augie@google.com> [Thu, 10 Nov 2016 16:33:07 -0500] rev 30388
filemerge: avoid shadowing a variable in a list comprehension
Thu, 10 Nov 2016 16:32:51 -0500 color: avoid shadowing a variable inside a list comprehension
Augie Fackler <augie@google.com> [Thu, 10 Nov 2016 16:32:51 -0500] rev 30387
color: avoid shadowing a variable inside a list comprehension
Thu, 10 Nov 2016 16:32:38 -0500 memory: avoid shadowing variables inside a list comprehension
Augie Fackler <augie@google.com> [Thu, 10 Nov 2016 16:32:38 -0500] rev 30386
memory: avoid shadowing variables inside a list comprehension
Thu, 10 Nov 2016 03:15:41 -0800 shelve: move shelve-finishing logic to a separate function
Kostia Balytskyi <ikostia@fb.com> [Thu, 10 Nov 2016 03:15:41 -0800] rev 30385
shelve: move shelve-finishing logic to a separate function With the future obs-based shelve, finishing a shelve will be different from just aborting a transaction, so I would like to keep both variants of this functionality in a separate function.
Thu, 10 Nov 2016 03:20:28 -0800 shelve: move unknown files handling to a separate function
Kostia Balytskyi <ikostia@fb.com> [Thu, 10 Nov 2016 03:20:28 -0800] rev 30384
shelve: move unknown files handling to a separate function This change has nothing to do with the future obsshelve introduction; it is done purely for readability.
Thu, 10 Nov 2016 03:07:20 -0800 shelve: move actual created commit shelving to a separate function
Kostia Balytskyi <ikostia@fb.com> [Thu, 10 Nov 2016 03:07:20 -0800] rev 30383
shelve: move actual created commit shelving to a separate function Currently this code does not have any branching; it just bundles a commit and saves a patch file. Later, obsolescence-based shelve will be added, so this code will also create some obsmarkers and will be one of the few places where obsshelve differs from traditional shelve.
Thu, 10 Nov 2016 03:33:01 -0800 shelve: move 'nothing changed' messaging to a separate function
Kostia Balytskyi <ikostia@fb.com> [Thu, 10 Nov 2016 03:33:01 -0800] rev 30382
shelve: move 'nothing changed' messaging to a separate function This has nothing to do with the future obsshelve implementation; I just thought that moving this messaging to a separate function would improve the readability of the shelve code.
Thu, 10 Nov 2016 03:26:31 -0800 shelve: move commitfunc creation to a separate function
Kostia Balytskyi <ikostia@fb.com> [Thu, 10 Nov 2016 03:26:31 -0800] rev 30381
shelve: move commitfunc creation to a separate function Special commitfuncs are created as closures at least twice in shelve's code, and in one case a special commitfunc is used within another closure. They all serve very specific purposes, like temporarily tweaking some configuration or enabling the editor. This is not immediately important to someone reading the shelve code, so I think moving this logic to a separate function is a good idea.
Thu, 10 Nov 2016 03:24:07 -0800 shelve: move mutableancestors to not be a closure
Kostia Balytskyi <ikostia@fb.com> [Thu, 10 Nov 2016 03:24:07 -0800] rev 30380
shelve: move mutableancestors to not be a closure There's no value in it being a closure, and everyone who tries to read the outer function's code will be distracted by it. IMO moving it out significantly improves readability, especially given how clear it is from its name what the mutableancestors function does.
Thu, 10 Nov 2016 03:22:55 -0800 shelve: move shelve name generation to a separate function
Kostia Balytskyi <ikostia@fb.com> [Thu, 10 Nov 2016 03:22:55 -0800] rev 30379
shelve: move shelve name generation to a separate function This has nothing to do with the future obsshelve introduction; it is done purely for readability.
Thu, 10 Nov 2016 03:07:20 -0800 shelve: move possible shelve file extensions to a single place
Kostia Balytskyi <ikostia@fb.com> [Thu, 10 Nov 2016 03:07:20 -0800] rev 30378
shelve: move possible shelve file extensions to a single place This and a couple of following patches are a preparation for implementing obsolescence-enabled shelve, which was discussed at the Sprint. If this refactoring is not done, shelve is going to look even more hackish than it does now. This particular commit introduces a slight behavior change. Previously, if only the .hg/shelve/name.patch file existed but .hg/shelve/name.hg did not, 'hg shelve -d name' would fail saying "shelve not found". Now deletion only fails if the .patch file does not exist (since .patch is used as the indicator of an existing shelve). Other absent shelve files are skipped silently, to accommodate the future introduction of obs-based shelve, which will mean that for some shelves .hg and .patch files will exist, while for others .hg and .oshelve will.
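A plausible shape for that single place, sketched under the assumption that it simply enumerates the extensions a shelve may consist of (the variable name is illustrative; an obs-based extension would join the list later):

    # every file a shelve may consist of, by extension; .patch acts as
    # the indicator that the shelve exists at all
    shelvefileextensions = ['hg', 'patch']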
Thu, 10 Nov 2016 02:13:19 -0800 manifest: delete manifest.manifest class
Durham Goode <durham@fb.com> [Thu, 10 Nov 2016 02:13:19 -0800] rev 30377
manifest: delete manifest.manifest class Now that nothing uses the primary manifest class, we can delete it.
Thu, 10 Nov 2016 02:13:19 -0800 localrepo: delete localrepo.manifest
Durham Goode <durham@fb.com> [Thu, 10 Nov 2016 02:13:19 -0800] rev 30376
localrepo: delete localrepo.manifest Now that nothing uses normal manifests, we can delete localrepo.manifest.
Thu, 10 Nov 2016 02:13:19 -0800 manifest: remove last uses of repo.manifest
Durham Goode <durham@fb.com> [Thu, 10 Nov 2016 02:13:19 -0800] rev 30375
manifest: remove last uses of repo.manifest Now that all the functionality has been moved to manifestlog/manifestrevlog/etc, we can finally change all the uses of repo.manifest to use the new versions. A future diff will then delete repo.manifest. One additional change in this commit is to change repo.manifestlog to be a @storecache property instead of a @property. This is required because some uses of repo.manifest require that it be settable (contrib/perf.py and the static http server). We can't do this in a prior change because we can't use @storecache on this until repo.manifest is no longer used anywhere.
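A toy illustration (not Mercurial's actual @storecache) of why a cached attribute built on a non-data descriptor stays settable while a bare @property does not:

    class cachedprop(object):
        """non-data descriptor: computed once, cached in the instance
        dict, and still assignable afterwards"""
        def __init__(self, func):
            self.func = func
            self.name = func.__name__
        def __get__(self, obj, objtype=None):
            value = self.func(obj)
            obj.__dict__[self.name] = value  # instance dict shadows us now
            return value

    class repo(object):
        @cachedprop
        def manifestlog(self):
            return 'computed manifestlog'

    r = repo()
    print(r.manifestlog)    # lazily computed, then cached
    r.manifestlog = 'stub'  # settable, as contrib/perf.py needs
    print(r.manifestlog)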
Fri, 11 Nov 2016 01:20:13 -0800 manifest: add unionmanifestlog support
Durham Goode <durham@fb.com> [Fri, 11 Nov 2016 01:20:13 -0800] rev 30374
manifest: add unionmanifestlog support As part of deprecating manifest, we need to make the union repo support manifestlog.
Fri, 11 Nov 2016 01:15:59 -0800 manifest: add bundlemanifestlog support
Durham Goode <durham@fb.com> [Fri, 11 Nov 2016 01:15:59 -0800] rev 30373
manifest: add bundlemanifestlog support As part of deprecating manifest.manifest we need to make bundlerepo support manifestlog.
Thu, 10 Nov 2016 02:13:19 -0800 manifest: make manifestlog use its own cache
Durham Goode <durham@fb.com> [Thu, 10 Nov 2016 02:13:19 -0800] rev 30372
manifest: make manifestlog use its own cache As we start to make manifestlog the primary manifest source, the dependency on manifest.manifest will cause circular dependency problems. Let's break this dependency by making manifestlog use its own cache. In an upcoming patch we will remove the previous manifest cache so we're not duplicating it.
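A hypothetical sketch of the log-owned cache the message describes; manifestctx here is a stand-in placeholder class, not the real one:

    class manifestctx(object):
        def __init__(self, node):
            self._node = node

    class manifestlog(object):
        def __init__(self):
            # node -> manifestctx, owned by the log rather than by
            # manifest.manifest
            self._mancache = {}

        def __getitem__(self, node):
            ctx = self._mancache.get(node)
            if ctx is None:
                ctx = manifestctx(node)
                self._mancache[node] = ctx
            return ctx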
Thu, 10 Nov 2016 02:13:19 -0800 manifest: delete unused dirlog and _newmanifest functions
Durham Goode <durham@fb.com> [Thu, 10 Nov 2016 02:13:19 -0800] rev 30371
manifest: delete unused dirlog and _newmanifest functions As part of migrating all manifest functionality out of manifest.manifest, let's migrate a couple of spots off manifest.dirlog() to use the revlog-specific accessor. Then we can delete manifest.dirlog() and other unused functions.
Thu, 10 Nov 2016 02:13:19 -0800 manifest: move clearcaches to manifestlog
Durham Goode <durham@fb.com> [Thu, 10 Nov 2016 02:13:19 -0800] rev 30370
manifest: move clearcaches to manifestlog This is part of removing all functionality from manifest.manifest so we can delete the class entirely.
Thu, 10 Nov 2016 02:13:19 -0800 manifest: remove usages of manifest.read
Durham Goode <durham@fb.com> [Thu, 10 Nov 2016 02:13:19 -0800] rev 30369
manifest: remove usages of manifest.read Now that the two manifestctx implementations have working read() functions, let's remove the existing uses of manifest.read and drop the function.
Thu, 10 Nov 2016 02:13:19 -0800 manifest: remove dependency on manifestrevlog being able to create trees
Durham Goode <durham@fb.com> [Thu, 10 Nov 2016 02:13:19 -0800] rev 30368
manifest: remove dependency on manifestrevlog being able to create trees A future patch will remove the read() function from the manifest class. Since manifestrevlog currently depends on the read function that manifest implements (as a derived class), we need to break the dependency of manifestrevlog on read(). We do this by adding an argument to manifestrevlog.write() which provides it with the ability to read a manifest. This is good in general because it further separates the revlog as the storage format from the actual in-memory data structure implementation.
Fri, 11 Nov 2016 13:06:05 +1100 color: show mode warning based on ui.formatted
Xidorn Quan <me@upsuper.org> [Fri, 11 Nov 2016 13:06:05 +1100] rev 30367
color: show mode warning based on ui.formatted ui.interactive is only for input and ui.formatted is for output.
Thu, 10 Nov 2016 15:14:05 -0500 protocol: drop unused import of zlib
Augie Fackler <augie@google.com> [Thu, 10 Nov 2016 15:14:05 -0500] rev 30366
protocol: drop unused import of zlib Something weird is happening that breaks pyflakes installed via 'pip install --user'. I haven't had a chance to finish debugging this, but this at least fixes the build.
Tue, 08 Nov 2016 22:41:45 +0900 hook: lower inflated use of sys.__stdout__ and __stderr__
Yuya Nishihara <yuya@tcha.org> [Tue, 08 Nov 2016 22:41:45 +0900] rev 30365
hook: lower inflated use of sys.__stdout__ and __stderr__ They were introduced at 9f76df0edb7d, where sys.stdout could be replaced by sys.stderr. After that, we've changed the way of stdout redirection by afccc64eea73, so we no longer need to reference the original __stdout__ and __stderr__ objects. Let's move away from using __std*__ objects so we can simply wrap sys.std* objects for Python 3 porting.
Tue, 08 Nov 2016 22:22:22 +0900 hook: flush stdout before restoring stderr redirection
Yuya Nishihara <yuya@tcha.org> [Tue, 08 Nov 2016 22:22:22 +0900] rev 30364
hook: flush stdout before restoring stderr redirection There was a similar issue to 8b011ededfb2. If an in-process hook writes to stdout, the data may be buffered, in which case stdout must be flushed before restoring its file descriptor. Otherwise, the remaining data would be sent over the ssh wire and corrupt the protocol. Note that this is a different redirection from the one I've just removed.
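A minimal sketch of the flush-before-restore pattern, assuming the usual os.dup/os.dup2 redirection of stdout to stderr while hooks run (hookfn and all names are illustrative):

    import os
    import sys

    def runredirected(hookfn):
        stdoutno, stderrno = 1, 2
        oldstdout = os.dup(stdoutno)
        os.dup2(stderrno, stdoutno)  # hook output goes to stderr, not the wire
        try:
            hookfn()                 # may leave data in the stdout buffer
        finally:
            # drain buffered hook output while fd 1 still points at stderr,
            # then restore the real stdout
            sys.stdout.flush()
            os.dup2(oldstdout, stdoutno)
            os.close(oldstdout)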
Thu, 20 Oct 2016 22:39:59 +0900 hook: do not redirect stdout/err/in to ui while running in-process hooks (BC)
Yuya Nishihara <yuya@tcha.org> [Thu, 20 Oct 2016 22:39:59 +0900] rev 30363
hook: do not redirect stdout/err/in to ui while running in-process hooks (BC) It was introduced by a59058fd074a to address command-server issues. After that, I've made a complete fix by 69f86b937035, so we don't need to replace sys.stdio objects to protect the IPC channels. This change means we no longer see data written to sys.stdout/err by an in-process hook on the command server. I think that's okay because the canonical way is to use the ui functions, and in-process hooks should respect the Mercurial API. This will help Python 3 porting, where sys.stdout is a TextIO but ui.fout is a BytesIO.
Thu, 10 Nov 2016 02:21:15 -0800 merge: change modified indicator to be 20 bytes
Durham Goode <durham@fb.com> [Thu, 10 Nov 2016 02:21:15 -0800] rev 30362
merge: change modified indicator to be 20 bytes Previously we indicated that the .hgsubstate file was dirty by adding a '+' to the end of its hash in the wctx manifest. This made it complicated to have new manifest implementations that rely on the node length being fixed. In previous patches we added placeholder nodes for added and modified files, so let's use those to indicate dirtiness here as well. It doesn't look like anything ever depended on this '+' (aside from it being different from the parent), so nothing else needed to change here.
Thu, 10 Nov 2016 02:19:16 -0800 dirstate: change added/modified placeholder hash length to 20 bytes
Durham Goode <durham@fb.com> [Thu, 10 Nov 2016 02:19:16 -0800] rev 30361
dirstate: change added/modified placeholder hash length to 20 bytes Previously the added/modified placeholder hash for manifests generated from the dirstate was a 21-byte string consisting of the p1 file hash plus a single character to indicate an add or a modify. Normal hashes are only 20 bytes long. This makes it complicated to implement more efficient manifest implementations which rely on the hashes being fixed length. Let's change this hash to just be 20 bytes long, and rely on the astronomical improbability of an actual hash equaling these 20 bytes (just like we rely on no hash ever being the nullid). This changes the behavior slightly in that the hash for all added/modified entries in the dirstate manifest is now the same (so simple node comparisons would say they are equal), but we should never be doing simple node comparisons on these nodes even with the old hashes, because they did not accurately represent the content: two files based on the same p1 file node but with different working copy contents would have had the same hash (even with the appended character) in the old scheme too, so we couldn't depend on the hashes, period.
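Sketched placeholders matching that description (the real constants live in node.py; the exact values here are illustrative):

    # fixed-length 20-byte placeholders for dirstate-generated manifests
    addednodeid = '0' * 15 + 'added'        # 20 bytes
    modifiednodeid = '0' * 12 + 'modified'  # 20 bytes
    assert len(addednodeid) == len(modifiednodeid) == 20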
Thu, 10 Nov 2016 02:17:22 -0800 dirstate: change placeholder hash length to 20 bytes
Durham Goode <durham@fb.com> [Thu, 10 Nov 2016 02:17:22 -0800] rev 30360
dirstate: change placeholder hash length to 20 bytes Previously the new-node placeholder hash for manifests generated from the dirstate was a 21-byte string of "!" characters. Normal hashes are only 20 bytes long. This makes it complicated to implement more efficient manifest implementations which rely on the hashes being fixed length. Let's change this hash to just be 20 bytes long, and rely on the astronomical improbability of an actual hash being 20 "!" bytes in a row (just like we rely on no hash ever being the nullid). A future diff will do this for added and modified dirstate markers as well, so we're putting the new newnodeid in node.py so there's a common place for these placeholders.
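Sketched as described, with the placeholder shortened to a fixed 20 bytes:

    # the new-node placeholder as a fixed-length value
    newnodeid = '!' * 20
    assert len(newnodeid) == 20  # same length as a real nodeid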
Mon, 07 Nov 2016 18:57:54 -0800 util: remove compressorobj API from compression engines
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 07 Nov 2016 18:57:54 -0800] rev 30359
util: remove compressorobj API from compression engines All callers have been replaced with "compressstream". The "compressorobj" API is quite low-level and now redundant with "compressstream", so eliminate it.
Mon, 07 Nov 2016 18:54:35 -0800 hgweb: use compression engine API for zlib compression
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 07 Nov 2016 18:54:35 -0800] rev 30358
hgweb: use compression engine API for zlib compression More low-level compression code elimination because we now have nice APIs. This patch also demonstrates why we needed and implemented the "level" option on the "compressstream" API.
Mon, 07 Nov 2016 18:46:37 -0800 bundle2: use compressstream compression engine API
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 07 Nov 2016 18:46:37 -0800] rev 30357
bundle2: use compressstream compression engine API Compression engines now have an API for compressing a stream of chunks. Switch to it and make low-level compression code disappear.
Mon, 07 Nov 2016 18:57:07 -0800 util: add a stream compression API to compression engines
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 07 Nov 2016 18:57:07 -0800] rev 30356
util: add a stream compression API to compression engines It is a common pattern throughout the code to perform compression on an iterator of chunks, yielding an iterator of compressed chunks. Let's formalize that as part of the compression engine API. The zlib and bzip2 implementations allow an optional "level" option to control the compression level. The default values are the same as what the Python modules use. This option will be used in subsequent patches.
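A sketch of what such a stream API can look like for the zlib engine; the function name and the "level" option follow the commit message, while the body is illustrative rather than Mercurial's actual code:

    import zlib

    def compressstream(it, opts=None):
        """compress an iterator of chunks, yielding compressed chunks"""
        opts = opts or {}
        z = zlib.compressobj(opts.get('level', -1))  # -1 = zlib's default
        for chunk in it:
            data = z.compress(chunk)
            if data:  # compressobj may buffer small inputs
                yield data
        yield z.flush()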