revlog: skeleton support for version 2 revlogs
author Gregory Szorc <gregory.szorc@gmail.com>
Fri, 19 May 2017 20:29:11 -0700
changeset 32697 19b9fc40cc51
parent 32696 0c09afdf5704
child 32698 1b5c61d38a52
There are a number of improvements we want to make to revlogs that will require a new version - version 2. It is unclear what the full set of improvements will be or when we'll be done with them. What I do know is that the process will likely take longer than a single release, will require input from various stakeholders to evaluate changes, and will have many contentious debates and bikeshedding.

It is unrealistic to develop revlog version 2 up front: there are just too many uncertainties that we won't know until things are implemented and experiments are run. Some changes will also be invasive and prone to bit rot, so sitting on dozens of patches is not practical.

This commit introduces skeleton support for version 2 revlogs in a way that is flexible and not bound by backwards compatibility concerns.

An experimental repo requirement for denoting revlog v2 has been added. The requirement string has a sub-version component to it. This will allow us to declare multiple requirements in the course of developing revlog v2. Whenever we change the in-development revlog v2 format, we can tweak the string, creating a new requirement and locking out old clients. This will allow us to make as many backwards incompatible changes and experiments to revlog v2 as we want. In other words, we can land code and make meaningful progress towards revlog v2 while still maintaining extreme format flexibility up until the point we freeze the format and remove the experimental labels.

To enable the new repo requirement, you must supply an experimental and undocumented config option. But not just any boolean flag will do: you need to explicitly use a value that no sane person should ever type. This is an additional guard against enabling revlog v2 on an installation it shouldn't be enabled on. The specific scenario I'm trying to prevent is say a user with a 4.4 client with a frozen format enabling the option but then downgrading to 4.3 and accidentally creating repos with an outdated and unsupported repo format. Requiring a "challenge" string should prevent this.

Because the format is not yet finalized and I don't want to take any chances, revlog v2's version is currently 0xDEAD. I figure squatting on a value we're likely never to use as an actual revlog version to mean "internal testing only" is acceptable. And "dead" is easily recognized as something meaningful.

There is a bunch of cleanup that is needed before work on revlog v2 begins in earnest. I plan on doing that work once this patch is accepted and we're comfortable with the idea of starting down this path.
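
The interaction between the challenge string and the sub-versioned requirement can be sketched in isolation. This is a minimal illustration, not Mercurial's code: the helper names `new_repo_requirements` and `missing_requirements`, and the default requirement set, are assumptions made for the example. Only the requirement string and the config value are taken from this changeset.

```python
# Standalone sketch; helper names and the default set are hypothetical.
REVLOGV2_REQUIREMENT = 'exp-revlogv2.0'  # sub-version is bumped on format changes
CHALLENGE = 'enable-unstable-format-and-corrupt-my-data'

# Assumed set of requirements a given client understands.
SUPPORTED = {'revlogv1', 'generaldelta', 'store', 'fncache', 'dotencode',
             REVLOGV2_REQUIREMENT}


def new_repo_requirements(config_value):
    """Return requirements for a new repo given experimental.revlogv2."""
    # Assumed defaults for illustration only.
    requirements = {'revlogv1', 'generaldelta', 'store', 'fncache',
                    'dotencode'}
    # A plain boolean such as 'true' is deliberately not enough.
    if config_value == CHALLENGE:
        requirements.remove('revlogv1')
        # generaldelta is implied by revlogv2, so it is not listed.
        requirements.discard('generaldelta')
        requirements.add(REVLOGV2_REQUIREMENT)
    return requirements


def missing_requirements(repo_requirements, supported=SUPPORTED):
    """Return the requirements this client does not understand, sorted."""
    return sorted(set(repo_requirements) - set(supported))


if __name__ == '__main__':
    print(sorted(new_repo_requirements(CHALLENGE)))
    # A repo created after a hypothetical bump to '.1' locks this client out.
    print(missing_requirements({'store', 'exp-revlogv2.1'}))
```

Because an unknown requirement is matched as an exact string, changing the suffix is enough to make older clients refuse to open the repository, which is the lock-out behavior described above.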
mercurial/help/internals/revlogs.txt
mercurial/localrepo.py
mercurial/revlog.py
tests/test-revlog-v2.t
--- a/mercurial/help/internals/revlogs.txt	Tue Jun 06 08:58:27 2017 -0700
+++ b/mercurial/help/internals/revlogs.txt	Fri May 19 20:29:11 2017 -0700
@@ -45,6 +45,12 @@
 1
    RevlogNG (*next generation*). It replaced version 0 when it was
    implemented in 2006.
+2
+   In-development version incorporating accumulated knowledge and
+   missing features from 10+ years of revlog version 1.
+57005 (0xdead)
+   Reserved for internal testing of new versions. No defined format
+   beyond 32-bit header.
 
 The feature flags short consists of bit flags. Where 0 is the least
 significant bit, the following bit offsets define flags:
@@ -142,6 +148,14 @@
 The first 4 bytes of the revlog are shared between the revlog header
 and the 6 byte absolute offset field from the first revlog entry.
 
+Version 2 Format
+================
+
+(In development. Format not finalized or stable.)
+
+Version 2 is currently identical to version 1. This will obviously
+change.
+
 Delta Chains
 ============
 
--- a/mercurial/localrepo.py	Tue Jun 06 08:58:27 2017 -0700
+++ b/mercurial/localrepo.py	Fri May 19 20:29:11 2017 -0700
@@ -244,6 +244,10 @@
     def changegroupsubset(self, bases, heads, source):
         return changegroup.changegroupsubset(self._repo, bases, heads, source)
 
+# Increment the sub-version when the revlog v2 format changes to lock out old
+# clients.
+REVLOGV2_REQUIREMENT = 'exp-revlogv2.0'
+
 class localrepository(object):
 
     supportedformats = {
@@ -251,6 +255,7 @@
         'generaldelta',
         'treemanifest',
         'manifestv2',
+        REVLOGV2_REQUIREMENT,
     }
     _basesupported = supportedformats | {
         'store',
@@ -440,6 +445,10 @@
             if r.startswith('exp-compression-'):
                 self.svfs.options['compengine'] = r[len('exp-compression-'):]
 
+        # TODO move "revlogv2" to openerreqs once finalized.
+        if REVLOGV2_REQUIREMENT in self.requirements:
+            self.svfs.options['revlogv2'] = True
+
     def _writerequirements(self):
         scmutil.writerequires(self.vfs, self.requirements)
 
@@ -2070,4 +2079,11 @@
     if ui.configbool('experimental', 'manifestv2', False):
         requirements.add('manifestv2')
 
+    revlogv2 = ui.config('experimental', 'revlogv2')
+    if revlogv2 == 'enable-unstable-format-and-corrupt-my-data':
+        requirements.remove('revlogv1')
+        # generaldelta is implied by revlogv2.
+        requirements.discard('generaldelta')
+        requirements.add(REVLOGV2_REQUIREMENT)
+
     return requirements
--- a/mercurial/revlog.py	Tue Jun 06 08:58:27 2017 -0700
+++ b/mercurial/revlog.py	Fri May 19 20:29:11 2017 -0700
@@ -51,12 +51,16 @@
 # revlog header flags
 REVLOGV0 = 0
 REVLOGV1 = 1
+# Dummy value until file format is finalized.
+# Reminder: change the bounds check in revlog.__init__ when this is changed.
+REVLOGV2 = 0xDEAD
 FLAG_INLINE_DATA = (1 << 16)
 FLAG_GENERALDELTA = (1 << 17)
 REVLOG_DEFAULT_FLAGS = FLAG_INLINE_DATA
 REVLOG_DEFAULT_FORMAT = REVLOGV1
 REVLOG_DEFAULT_VERSION = REVLOG_DEFAULT_FORMAT | REVLOG_DEFAULT_FLAGS
 REVLOGV1_FLAGS = FLAG_INLINE_DATA | FLAG_GENERALDELTA
+REVLOGV2_FLAGS = REVLOGV1_FLAGS
 
 # revlog index flags
 REVIDX_ISCENSORED = (1 << 15) # revision has censor metadata, must be verified
@@ -291,7 +295,10 @@
         v = REVLOG_DEFAULT_VERSION
         opts = getattr(opener, 'options', None)
         if opts is not None:
-            if 'revlogv1' in opts:
+            if 'revlogv2' in opts:
+                # version 2 revlogs always use generaldelta.
+                v = REVLOGV2 | FLAG_GENERALDELTA | FLAG_INLINE_DATA
+            elif 'revlogv1' in opts:
                 if 'generaldelta' in opts:
                     v |= FLAG_GENERALDELTA
             else:
@@ -341,6 +348,11 @@
                 raise RevlogError(_('unknown flags (%#04x) in version %d '
                                     'revlog %s') %
                                   (flags >> 16, fmt, self.indexfile))
+        elif fmt == REVLOGV2:
+            if flags & ~REVLOGV2_FLAGS:
+                raise RevlogError(_('unknown flags (%#04x) in version %d '
+                                    'revlog %s') %
+                                  (flags >> 16, fmt, self.indexfile))
         else:
             raise RevlogError(_('unknown version (%d) in revlog %s') %
                               (fmt, self.indexfile))
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tests/test-revlog-v2.t	Fri May 19 20:29:11 2017 -0700
@@ -0,0 +1,62 @@
+A repo with unknown revlogv2 requirement string cannot be opened
+
+  $ hg init invalidreq
+  $ cd invalidreq
+  $ echo exp-revlogv2.unknown >> .hg/requires
+  $ hg log
+  abort: repository requires features unknown to this Mercurial: exp-revlogv2.unknown!
+  (see https://mercurial-scm.org/wiki/MissingRequirement for more information)
+  [255]
+  $ cd ..
+
+Can create and open repo with revlog v2 requirement
+
+  $ cat >> $HGRCPATH << EOF
+  > [experimental]
+  > revlogv2 = enable-unstable-format-and-corrupt-my-data
+  > EOF
+
+  $ hg init empty-repo
+  $ cd empty-repo
+  $ cat .hg/requires
+  dotencode
+  exp-revlogv2.0
+  fncache
+  store
+
+  $ hg log
+
+Unknown flags to revlog are rejected
+
+  >>> with open('.hg/store/00changelog.i', 'wb') as fh:
+  ...     fh.write('\x00\x04\xde\xad')
+
+  $ hg log
+  abort: unknown flags (0x04) in version 57005 revlog 00changelog.i!
+  [255]
+
+  $ cd ..
+
+Writing a simple revlog v2 works
+
+  $ hg init simple
+  $ cd simple
+  $ touch foo
+  $ hg -q commit -A -m initial
+
+  $ hg log
+  changeset:   0:96ee1d7354c4
+  tag:         tip
+  user:        test
+  date:        Thu Jan 01 00:00:00 1970 +0000
+  summary:     initial
+  
+Header written as expected (changelog always disables generaldelta)
+
+  $ f --hexdump --bytes 4 .hg/store/00changelog.i
+  .hg/store/00changelog.i:
+  0000: 00 01 de ad                                     |....|
+
+  $ f --hexdump --bytes 4 .hg/store/data/foo.i
+  .hg/store/data/foo.i:
+  0000: 00 03 de ad                                     |....|
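
The header bytes in the tests above can be decoded with a short standalone sketch. It assumes the first 4 bytes form a big-endian 32-bit word whose low 16 bits hold the version and whose high 16 bits hold the feature flags, which is consistent with the hexdumps shown. The function names `parse_header` and `check_v2_header` are hypothetical, and `ValueError` stands in for `RevlogError`.

```python
import struct

# Constants copied from the diff to mercurial/revlog.py.
REVLOGV2 = 0xDEAD
FLAG_INLINE_DATA = 1 << 16
FLAG_GENERALDELTA = 1 << 17
REVLOGV2_FLAGS = FLAG_INLINE_DATA | FLAG_GENERALDELTA


def parse_header(first4):
    """Split the first 4 bytes of an index file into (version, flags)."""
    (word,) = struct.unpack('>I', first4)
    fmt = word & 0xFFFF      # low 16 bits: format version
    flags = word & ~0xFFFF   # high 16 bits: feature flags
    return fmt, flags


def check_v2_header(first4):
    """Raise ValueError for a non-v2 version or for unknown v2 flags."""
    fmt, flags = parse_header(first4)
    if fmt != REVLOGV2:
        raise ValueError('unknown version (%d)' % fmt)
    if flags & ~REVLOGV2_FLAGS:
        raise ValueError('unknown flags (%#04x) in version %d'
                         % (flags >> 16, fmt))
    return fmt, flags


if __name__ == '__main__':
    # Changelog header from the test: inline data only.
    print(check_v2_header(b'\x00\x01\xde\xad'))
    # Filelog header from the test: inline data plus generaldelta.
    print(check_v2_header(b'\x00\x03\xde\xad'))
    # The corrupted header from the test sets an undefined flag bit.
    try:
        check_v2_header(b'\x00\x04\xde\xad')
    except ValueError as exc:
        print(exc)
```

Under these assumptions, `00 01 de ad` decodes to version 57005 with only the inline flag set, `00 03 de ad` adds generaldelta, and `00 04 de ad` is rejected because bit 18 is not a defined version 2 flag.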