changelog: lazily parse date/extra field
authorGregory Szorc <gregory.szorc@gmail.com>
Sun, 06 Mar 2016 14:30:25 -0800
changeset 28492 837f1c437d58
parent 28491 f57f7500a095
child 28493 7796473c11b3
changelog: lazily parse date/extra field This is probably the most complicated patch in the parsing refactor. Because the date and extras are encoded in the same field, we stuff the entire field into a dedicated variable and add a property for accessing the sub-components of each. There is some duplicated code here. But the code is relatively simple, so it shouldn't be a big deal. We see revset performance wins across the board: author(mpm) 0.896565 0.876713 0.822961 desc(bug) 0.887169 0.895514 0.847054 date(2015) 0.878797 0.820987 0.811613 extra(rebase_source) 0.865446 0.823811 0.797756 author(mpm) or author(greg) 1.801832 1.784160 1.668172 author(mpm) or desc(bug) 1.812438 1.822756 1.677608 date(2015) or branch(default) 0.968276 0.910981 0.896032 author(mpm) or desc(bug) or date(2015) or extra(rebase_source) 3.656193 3.516788 3.265024 We see a speed-up on revsets accessing date and extras because the new parsing code only parses what you access. Even though they are stored the same text field, we avoid parsing dates when accessing extras and vice-versa. But strangely revsets accessing both date and extras appeared to speed up as well! I'm not sure if this is due to refactoring the parsing code or due to an optimization in revsets. You can't argue with the results!
mercurial/changelog.py
--- a/mercurial/changelog.py	Sun Mar 06 14:29:46 2016 -0800
+++ b/mercurial/changelog.py	Sun Mar 06 14:30:25 2016 -0800
@@ -151,9 +151,8 @@
     """
 
     __slots__ = (
-        'date',
+        '_rawdateextra',
         '_rawdesc',
-        'extra',
         'files',
         '_rawmanifest',
         '_rawuser',
@@ -194,22 +193,10 @@
         nl2 = text.index('\n', nl1 + 1)
         self._rawuser = text[nl1 + 1:nl2]
 
-        l = text[:doublenl].split('\n')
+        nl3 = text.index('\n', nl2 + 1)
+        self._rawdateextra = text[nl2 + 1:nl3]
 
-        tdata = l[2].split(' ', 2)
-        if len(tdata) != 3:
-            time = float(tdata[0])
-            try:
-                # various tools did silly things with the time zone field.
-                timezone = int(tdata[1])
-            except ValueError:
-                timezone = 0
-            self.extra = _defaultextra
-        else:
-            time, timezone = float(tdata[0]), int(tdata[1])
-            self.extra = decodeextra(tdata[2])
-
-        self.date = (time, timezone)
+        l = text[:doublenl].split('\n')
         self.files = l[3:]
 
         return self
@@ -223,6 +210,38 @@
         return encoding.tolocal(self._rawuser)
 
     @property
+    def _rawdate(self):
+        return self._rawdateextra.split(' ', 2)[0:2]
+
+    @property
+    def _rawextra(self):
+        fields = self._rawdateextra.split(' ', 2)
+        if len(fields) != 3:
+            return None
+
+        return fields[2]
+
+    @property
+    def date(self):
+        raw = self._rawdate
+        time = float(raw[0])
+        # Various tools did silly things with the timezone.
+        try:
+            timezone = int(raw[1])
+        except ValueError:
+            timezone = 0
+
+        return time, timezone
+
+    @property
+    def extra(self):
+        raw = self._rawextra
+        if raw is None:
+            return _defaultextra
+
+        return decodeextra(raw)
+
+    @property
     def description(self):
         return encoding.tolocal(self._rawdesc)