mercurial: Changelog

help: teach loaddoc to load from a different directory The help system currently only supports showing help topics from a single directory. We'll need to teach it to show results from different directories in order to show the internals topics. The first step is to teach loaddoc() to load documentation from a sub-directory.
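The idea of loading a topic from an optional sub-directory can be sketched as below. This is only an illustration under assumptions: the `docdir` parameter, the `.txt` suffix, and the `subdir` keyword are hypothetical choices for the sketch, not necessarily the signature used in Mercurial's help module.

```python
import os


def loaddoc(topic, docdir, subdir=None):
    """Return a loader for '<docdir>[/<subdir>]/<topic>.txt'.

    Hypothetical signature for illustration only.
    """

    def loader():
        # Fall back to the top-level help directory when no sub-directory
        # is requested.
        base = os.path.join(docdir, subdir) if subdir else docdir
        path = os.path.join(base, topic + ".txt")
        with open(path) as fp:
            return fp.read()

    return loader
```

With such a helper, a topic in an "internals" sub-directory would be requested as `loaddoc("bundles", docdir, subdir="internals")()`, while existing topics keep working without the extra argument.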

setup.py: package internals help files mpm says internal docs should be visible via `hg help` and hgweb. They need to be in the distribution for this to work. Package them.

help: add documentation for bundle types Bundle types and the high-level data format of each bundle aren't documented anywhere. Let's document them as well. Obviously there are many more details about bundles that could be written about. But you have to start somewhere.

help: add documentation for changegroup formats There is no formal location for spec-like technical/internal docs. The repository makes sense as such a location because spec-like documentation should be reviewed (ruling out a wiki). mpm has also stated that he would like this documentation to be part of the built-in help system. So, we establish an "internals" sub-directory to hold this class of documentation. The format of changegroups does not appear to be documented anywhere, even in source code. It therefore seemed like an appropriate first thing to document. This patch adds low-level documentation of versions 1 and 2 of the changegroup format. It currently only describes the raw data format. There is probably room to write higher-level documentation on strategies for producing and consuming the data. We'll leave that for another day. The added file is not yet accessible via `hg help` nor via hgweb. Support for this will follow in subsequent patches.

util: reimplement lrucachedict As part of attempting to more aggressively use the existing lrucachedict, collections.deque operations were frequently showing up in profiling output, negating benefits of caching. Searching the internet seems to tell me that the most efficient way to implement an LRU cache in Python is to have a dict indexing the cached entries and then to use a doubly linked list to track freshness of each entry. So, this patch replaces our existing lrucachedict with a version using such a pattern. The recently introduced perflrucachedict command reveals the following timings for 10,000 operations for the following cache sizes for the existing cache:
  n=4     init=0.004079 gets=0.003632 sets=0.005188 mixed=0.005402
  n=8     init=0.004045 gets=0.003998 sets=0.005064 mixed=0.005328
  n=16    init=0.004011 gets=0.004496 sets=0.005021 mixed=0.005555
  n=32    init=0.004064 gets=0.005611 sets=0.005188 mixed=0.006189
  n=64    init=0.003975 gets=0.007684 sets=0.005178 mixed=0.007245
  n=128   init=0.004121 gets=0.012005 sets=0.005422 mixed=0.009471
  n=256   init=0.004143 gets=0.020295 sets=0.005227 mixed=0.013612
  n=512   init=0.004039 gets=0.036703 sets=0.005243 mixed=0.020685
  n=1024  init=0.004193 gets=0.068142 sets=0.005251 mixed=0.033064
  n=2048  init=0.004070 gets=0.133383 sets=0.005160 mixed=0.050359
  n=4096  init=0.004053 gets=0.265194 sets=0.004868 mixed=0.048352
  n=8192  init=0.004087 gets=0.542218 sets=0.004562 mixed=0.032753
  n=16384 init=0.004106 gets=1.064055 sets=0.004179 mixed=0.020367
  n=32768 init=0.004034 gets=2.097620 sets=0.004260 mixed=0.013031
  n=65536 init=0.004108 gets=4.106390 sets=0.004268 mixed=0.010191
As the data shows, the existing cache's retrieval performance diminishes linearly with cache size. (Keep in mind the microbenchmark is testing 100% cache hit rate.)
The new cache implementation reveals the following:
  n=4     init=0.006665 gets=0.006541 sets=0.005733 mixed=0.006876
  n=8     init=0.006649 gets=0.006374 sets=0.005663 mixed=0.006899
  n=16    init=0.006570 gets=0.006504 sets=0.005799 mixed=0.007057
  n=32    init=0.006854 gets=0.006459 sets=0.005747 mixed=0.007034
  n=64    init=0.006580 gets=0.006495 sets=0.005740 mixed=0.006992
  n=128   init=0.006534 gets=0.006739 sets=0.005648 mixed=0.007124
  n=256   init=0.006669 gets=0.006773 sets=0.005824 mixed=0.007151
  n=512   init=0.006701 gets=0.007061 sets=0.006042 mixed=0.007372
  n=1024  init=0.006641 gets=0.007620 sets=0.006387 mixed=0.007464
  n=2048  init=0.006517 gets=0.008598 sets=0.006871 mixed=0.008077
  n=4096  init=0.006720 gets=0.010933 sets=0.007854 mixed=0.008663
  n=8192  init=0.007383 gets=0.015969 sets=0.010288 mixed=0.008896
  n=16384 init=0.006660 gets=0.025447 sets=0.011208 mixed=0.008826
  n=32768 init=0.006658 gets=0.044390 sets=0.011192 mixed=0.008943
  n=65536 init=0.006836 gets=0.082736 sets=0.011151 mixed=0.008826
Let's go through the results.
The new cache takes longer to construct: ~6.6ms vs ~4.1ms. However, this is measuring 10,000 __init__ calls, so the difference is ~0.25us/instance. We currently only create lrucachedict for manifest instances, so this regression is not likely relevant.
The new cache is slightly slower for retrievals for cache sizes < 1024. It's worth noting that the only existing use of lrucachedict is in manifest.py and the default cache size is 4. This regression is worrisome. However, for n=4, the delta is ~2.9ms for 10,000 lookups, or ~0.29us/op. Again, this is a marginal regression and likely not relevant in the real world. Timing `hg log -p -l 100` for mozilla-central reveals that cache lookup times are dominated by decompression and fulltext resolution (even with lz4 manifests).
The new cache is significantly faster for retrievals at larger capacities.
Whereas the old implementation has retrieval performance linear with cache capacity, the new cache is constant time until much larger values. And, when it does start to increase significantly, it is still far faster than the current cache (roughly 50x at n=65536).
The new cache does appear to be slower for sets when capacity is large. However, performance is similar for smaller capacities. Of course, caches should generally be optimized for retrieval performance because if a cache is getting more sets than gets, it doesn't really make sense to cache. If this regression is worrisome, again, taking the largest regression at n=65536 of ~6.9ms for 10,000 results in a regression of ~0.69us/op. This is not significant in the grand scheme of things.
Overall, the new cache is performant at retrievals at much larger capacity values, which makes it a generally more useful cache backend. While there are regressions, their absolute value is extremely small. Since we aren't using lrucachedict aggressively today, these regressions should not be relevant. The improved scalability of lrucachedict should enable us to more aggressively utilize lrucachedict for more granular caching (read: higher capacity caches) in the near future.
The impetus for this patch is to establish a cache of decompressed revlog revisions, notably manifest revisions. And since delta chains can grow to >10,000 and cache hit rate can be high, the improved retrieval performance of lrucachedict should be relevant.
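The dict-plus-doubly-linked-list pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class and method names (`LRUCache`, `_Node`, `_unlink`, `_pushfront`) are hypothetical, and the real lrucachedict may differ in details such as how nodes are allocated.

```python
class _Node(object):
    """One cache entry in a circular doubly linked list."""

    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = self.next = None


class LRUCache(object):
    """Dict for O(1) lookup, linked list for O(1) freshness updates."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._cache = {}
        # Sentinel node: head.next is the newest entry, head.prev the oldest.
        self._head = _Node(None, None)
        self._head.prev = self._head.next = self._head

    def _unlink(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _pushfront(self, node):
        node.next = self._head.next
        node.prev = self._head
        self._head.next.prev = node
        self._head.next = node

    def __len__(self):
        return len(self._cache)

    def __contains__(self, key):
        return key in self._cache

    def __getitem__(self, key):
        node = self._cache[key]  # raises KeyError on a miss
        # Mark as most recently used without scanning the other entries.
        self._unlink(node)
        self._pushfront(node)
        return node.value

    def __setitem__(self, key, value):
        node = self._cache.get(key)
        if node is not None:
            node.value = value
            self._unlink(node)
            self._pushfront(node)
            return
        if len(self._cache) >= self._capacity:
            # Evict the least recently used entry (tail of the list).
            oldest = self._head.prev
            self._unlink(oldest)
            del self._cache[oldest.key]
        node = _Node(key, value)
        self._cache[key] = node
        self._pushfront(node)
```

Because a hit only relinks a single node, retrieval cost in this sketch does not depend on how many entries the cache holds, which is the property the timings above are measuring.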

record: don't dereference symlinks while copying over stat data Previously, we could be calling os.utime or os.chflags (via shutil.copystat) on a symlink. These functions dereference symlinks, so this would have caused the timestamp of the target to be set. On a read-only or similarly weird filesystem, this might cause an exception to be raised. This is pretty hard to test because conjuring up a read-only filesystem for test purposes is non-trivial.

copyfile: add an optional parameter to copy other stat data Contrary to the comment, I didn't see any evidence that we were copying atime/mtime at all. This adds a parameter to copyfile to optionally copy it and other stat data, with the default being to not copy it. Many systems don't support changing the timestamp of a symlink, but we don't need that in general anyway -- copystat is mostly useful for editors, most of which will dereference symlinks anyway.
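A rough sketch of a copy helper with an opt-in stat copy that never touches a symlink's target is shown below. It is an assumption-laden illustration: the parameter name `copystat` and the exact handling are chosen for the sketch and may not match the patch.

```python
import os
import shutil


def copyfile(src, dest, copystat=False):
    """Copy src to dest; optionally copy atime/mtime and other stat data."""
    if os.path.lexists(dest):
        os.unlink(dest)
    if os.path.islink(src):
        # Recreate the link itself. Stat data is deliberately not copied:
        # os.utime and os.chflags would follow the link and modify its target.
        os.symlink(os.readlink(src), dest)
    else:
        shutil.copyfile(src, dest)
        if copystat:
            # Copies permission bits, timestamps and (where supported) flags.
            shutil.copystat(src, dest)
        else:
            # Default: preserve only the permission bits.
            shutil.copymode(src, dest)
```

Keeping the default at "do not copy" means existing callers see no behavior change; only callers that care about timestamps, such as editor-style workflows, need to pass `copystat=True`.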

tests: move the '-hg' postfix for all style tests We had them on 'test-check-code-hg.t' to avoid a collision with the test checking 'check-code' itself. Now that this one has been renamed, we can safely remove this suffix from all of them. This gets them in line with 'check-pyflakes.t'.

test: rename 'check-code' own test to 'test-contrib-check-code.t' This test (making sure the 'check-code' script runs as intended) has been confused with the test making sure that the Mercurial code base complies with our coding style by multiple generations of contributors. We are moving it out of the way so that all tests starting with 'test-check' are now doing compliance testing.

parsers: add a missed PyErr_NoMemory

parsers: check results of PyInt_FromLong (issue4771)

parsers: simplify error logic in compute_phases_map_sets Since Py_XDECREF and free both accept NULL pointers, we can get by with just two exit paths: one for success, and one for error. This considerably simplifies reasoning about the possible ways to exit from this function.

util: rename argument of isatty() In general, "fd" is a file descriptor, but isatty() expects a file object. We should call it "fp" or "fh".
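The distinction matters because the helper calls a method on its argument, which an integer file descriptor would not have. A minimal sketch of such a helper, assuming it simply delegates to the file object's own `isatty()` method:

```python
def isatty(fp):
    """Return True if the file object fp is connected to a terminal."""
    try:
        return fp.isatty()
    except AttributeError:
        # Objects without an isatty() method (including plain integers)
        # are treated as not being a terminal.
        return False
```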

posix: remove unixdomainserver class It's no longer used since the removal of the inotify extension.

revlog: use absolute_import

windows: use absolute_import

similar: use absolute_import

util: use absolute_import

util: make hashlib import unconditional hashlib was added in Python 2.5. As far as I can tell, SHA-512 is always available in 2.6+. So move the hashlib import to the top of the file and remove the one-off handling of SHA-512.

encoding: use double backslash In Python 2, '\u' == '\\u'. However, in Python 3, '\u' results in: SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape The minor change in this patch allows Python 3 to ast parse encoding.py.
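The difference can be reproduced with the `ast` module under Python 3. The snippet below is a small illustration; the helper name `parses` is made up for the example.

```python
import ast

# Raw strings keep the backslashes exactly as they would appear in a file.
broken = r"x = '\u'"   # single backslash: truncated \uXXXX escape on Python 3
fixed = r"x = '\\u'"   # double backslash: a literal backslash followed by 'u'


def parses(source):
    """Return True if source can be parsed into an AST."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False
```

On Python 3, `parses(broken)` is False and `parses(fixed)` is True. The double-backslash form evaluates to the same two-character string on both Python 2 and Python 3.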

encoding: use absolute_import

hg: establish function for performing post-share actions As part of writing an extension that wished to share an arbitrary piece of data among shared repos, I had to reimplement a significant part of hg.share in order to obtain localrepository instances for the source and destination. This patch establishes a function in hg.py that will be called after a share is performed. It is passed localrepository instances so extensions can easily perform additional actions at share time. We move hgrc and shared file writing there because this function is a logical place for it. A side effect of the refactor is writing of the shared file now occurs before updating. This seems more appropriate and shouldn't have any impact on real world behavior.

share: pass named arguments They are defined as named arguments but were previously passed as positional arguments. As part of wrapping hg.share in an extension, I had to extract arguments using some hacky techniques. Using named arguments makes wrapping much simpler.
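Why keyword arguments simplify wrapping can be seen in a small sketch. The `share` function below is a hypothetical stand-in with an assumed signature, not Mercurial's hg.share, and `wrapshare` is a generic wrapper rather than the extension API.

```python
def share(ui, source, dest=None, update=True, bookmarks=True):
    """Hypothetical stand-in for the wrapped function."""
    return {"source": source, "dest": dest,
            "update": update, "bookmarks": bookmarks}


def wrapshare(orig, log):
    """Return a wrapper that records the 'bookmarks' argument it sees."""

    def wrapper(ui, source, *args, **kwargs):
        # With keyword callers the value can be looked up by name. With
        # positional callers the wrapper would have to know the argument's
        # index in *args, which breaks if the signature ever changes.
        log.append(kwargs.get("bookmarks", "<unknown>"))
        return orig(ui, source, *args, **kwargs)

    return wrapper
```

Calling the wrapper as `wrapped(ui, "src", "dst", True, False)` leaves the wrapper unable to find `bookmarks` by name, while `wrapped(ui, "src", dest="dst", update=True, bookmarks=False)` exposes it directly in `kwargs`.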

commandserver: cut import cycle by itself We generally make modules importable from the front-end layer, dispatch -> commands -> x. So the import cycle to dispatch should be resolved by the commandserver module.

commandserver: use absolute_import

tests: histedit-helpers fixbundle should not complain about no input

tests: relax histedit issue4251 and issue3893 backups I'm globbing these because some are globbed, and this pair gets in the way of the main parts of the series.

setup.py: use bytes literals The b() helper was needed because Python < 2.6 didn't support bytes literals (b''). Now that we don't support Python < 2.6, we no longer need this helper.

clonebundles: fix typo

merge: rework manifestmerge to use a matcher This opens the door to working slightly more closely with the manifest type and letting it optimize out some of the diff comparisons for us, and also makes life significantly easier for narrowhg.