update: filter the ambiguous mtime in update directly
authorPierre-Yves David <pierre-yves.david@octobus.net>
Wed, 17 Nov 2021 10:22:15 +0100
changeset 48380 6dc3dc5e9636
parent 48379 08b060abd658
child 48381 41f40f35278a
update: filter the ambiguous mtime in update directly Right now, this filtering is done by `dirstate.write` using the time of `dirstate.write` method call. However that filtering is done "too late" It works "fine" as most command are "fast enough", and race rare enough. We are about to change the mtime filtering logic in the dirstate to be more accurate and reliable. However `hg update` will still need such filtering (mostly because it is actually quite racy, even with the existing filtering). So we explicitly implement a similar logic here. Before removing the older one later in the series. Differential Revision: https://phab.mercurial-scm.org/D11784
mercurial/merge.py
--- a/mercurial/merge.py	Wed Nov 17 12:24:00 2021 +0100
+++ b/mercurial/merge.py	Wed Nov 17 10:22:15 2021 +0100
@@ -2172,6 +2172,71 @@
                 mresult.len((mergestatemod.ACTION_GET,)) if wantfiledata else 0
             )
             with repo.dirstate.parentchange():
+                ### Filter Filedata
+                #
+                # We gathered "cache" information for the clean file while
+                # updating them: mtime, size and mode.
+                #
+                # At the time this comment is written, they are various issues
+                # with how we gather the `mode` and `mtime` information (see
+                # the comment in `batchget`).
+                #
+                # We are going to smooth one of this issue here : mtime ambiguity.
+                #
+                # i.e. even if the mtime gathered during `batchget` was
+                # correct[1] a change happening right after it could change the
+                # content while keeping the same mtime[2].
+                #
+                # When we reach the current code, the "on disk" part of the
+                # update operation is finished. We still assume that no other
+                # process raced that "on disk" part, but we want to at least
+                # prevent later file change to alter the content of the file
+                # right after the update operation. So quickly that the same
+                # mtime is record for the operation.
+                # To prevent such ambiguity to happens, we will only keep the
+                # "file data" for files with mtime that are stricly in the past,
+                # i.e. whose mtime is strictly lower than the current time.
+                #
+                # This protect us from race conditions from operation that could
+                # run right after this one, especially other Mercurial
+                # operation that could be waiting for the wlock to touch files
+                # content and the dirstate.
+                #
+                # In an ideal world, we could only get reliable information in
+                # `getfiledata` (from `getbatch`), however the current approach
+                # have been a successful compromise since many years.
+                #
+                # At the time this comment is written, not using any "cache"
+                # file data at all here would not be viable. As it would result is
+                # a very large amount of work (equivalent to the previous `hg
+                # update` during the next status after an update).
+                #
+                # [1] the current code cannot grantee that the `mtime` and
+                # `mode` are correct, but the result is "okay in practice".
+                # (see the comment in `batchget`).                #
+                #
+                # [2] using nano-second precision can greatly help here because
+                # it makes the "different write with same mtime" issue
+                # virtually vanish. However, dirstate v1 cannot store such
+                # precision and a bunch of python-runtime, operating-system and
+                # filesystem does not provide use with such precision, so we
+                # have to operate as if it wasn't available.
+                if getfiledata:
+                    ambiguous_mtime = {}
+                    now = timestamp.get_fs_now(repo.vfs)
+                    if now is None:
+                        # we can't write to the FS, so we won't actually update
+                        # the dirstate content anyway, no need to put cache
+                        # information.
+                        getfiledata = None
+                    else:
+                        now_sec = now[0]
+                        for f, m in pycompat.iteritems(getfiledata):
+                            if m is not None and m[2][0] >= now_sec:
+                                ambiguous_mtime[f] = (m[0], m[1], None)
+                        for f, m in pycompat.iteritems(ambiguous_mtime):
+                            getfiledata[f] = m
+
                 repo.setparents(fp1, fp2)
                 mergestatemod.recordupdates(
                     repo, mresult.actionsdict, branchmerge, getfiledata