lfs: control tracked file selection via a tracked file
authorMatt Harbison <matt_harbison@yahoo.com>
Sun, 14 Jan 2018 18:12:51 -0500
changeset 35665 1ad1e59b405e
parent 35664 3c838bdc57b6
child 35666 2c6ebd0c850e
lfs: control tracked file selection via a tracked file Since the lfs tracking policy can dramatically affect the repository, it makes more sense to have the policy file checked in, than to rely on all developers configuring their .hgrc properly. The inspiration for this is the .hgeol file. The configuration lives under '[track]', so that other things can be added in the future. Eventually, the config option should be limited to `convert` only. If the file can't be parsed for any reason (including unrecognized elements of the minifileset language), the commit will abort until the problem is corrected. This seems more useful than the warning that hgeol emits, and has no effect on reading the data, so there's no compatibility concerns. My initial thought was to read the file and change each "key = value" line into "((key) & (value))", so that each line could be ORed together, and make a single pass at compiling. Unfortunately, that prevents exclusions if there's a catchall rule. Consider what happens to a large *.c file here: [track] **.c = none() ** = size('>1MB') # ((**.c) & (none())) | ((**) & (size('>1MB'))) => anything > 1MB I also thought about having separate [include] and [exclude] sections. But that just seems to open things up to user mistakes. Consider: [include] **.zip = all() **.php = size('>10MB') [exclude] **.zip = all() # Who wins? **.php = none() # Effectively 'all()' (i.e. nothing excluded), or >10MB ? Therefore, it just compiles each key and value separately, and walks until the key matches something. I'm not sure how to enforce just file patterns on LHS without leaking knowledge about the minifileset here. That means this will allow odd looking lines like this: [track] **.c | **.txt = none() But that's also fewer lines to compile, so slightly more efficient? Some things like 'none()' won't work as expected on LHS though, because that won't match, so that line is skipped. For now, these quirks are not mentioned in the documentation. Jun previously expressed concern about efficiency when scaling to large repos, so I tried avoiding 'repo[None]'. (localrepo.commit() gets repo[None] already, but doesn't tie it to the workingcommitctx used here.) Therefore, I looked at the passed context for 'AMR' status. But that doesn't help with the normal case where the policy file is tracked, but clean. That requires looking up p1() to read the file. I don't see any way to get the content of one file without first creating the full parent context.
hgext/lfs/__init__.py
tests/test-lfs.t
--- a/hgext/lfs/__init__.py	Sun Jan 14 01:04:45 2018 -0500
+++ b/hgext/lfs/__init__.py	Sun Jan 14 18:12:51 2018 -0500
@@ -7,6 +7,30 @@
 
 """lfs - large file support (EXPERIMENTAL)
 
+The extension reads its configuration from a versioned ``.hglfs``
+configuration file found in the root of the working directory. The
+``.hglfs`` file uses the same syntax as all other Mercurial
+configuration files. It uses a single section, ``[track]``.
+
+The ``[track]`` section specifies which files are stored as LFS (or
+not). Each line is keyed by a file pattern, with a predicate value.
+The first file pattern match is used, so put more specific patterns
+first.  The available predicates are ``all()``, ``none()``, and
+``size()``. See "hg help filesets.size" for the latter.
+
+Example versioned ``.hglfs`` file::
+
+  [track]
+  # No Makefile or python file, anywhere, will be LFS
+  **Makefile = none()
+  **.py = none()
+
+  **.zip = all()
+  **.exe = size(">1MB")
+
+  # Catchall for everything not matched above
+  ** = size(">10MB")
+
 Configs::
 
     [lfs]
@@ -35,6 +59,9 @@
     # - (**.php & size(">2MB")) | (**.js & size(">5MB")) | **.tar.gz
     #     | ("path:bin" & !"path:/bin/README") | size(">1GB")
     # (default: none())
+    #
+    # This is ignored if there is a tracked '.hglfs' file, and this setting
+    # will eventually be deprecated and removed.
     track = size(">10M")
 
     # how many times to retry before giving up on transferring an object
@@ -53,7 +80,9 @@
     bundle2,
     changegroup,
     cmdutil,
+    config,
     context,
+    error,
     exchange,
     extensions,
     filelog,
@@ -124,13 +153,20 @@
     if not repo.local():
         return
 
-    repo.svfs.options['lfstrack'] = _trackedmatcher(repo)
     repo.svfs.lfslocalblobstore = blobstore.local(repo)
     repo.svfs.lfsremoteblobstore = blobstore.remote(repo)
 
     # Push hook
     repo.prepushoutgoinghooks.add('lfs', wrapper.prepush)
 
+    class lfsrepo(repo.__class__):
+        @localrepo.unfilteredmethod
+        def commitctx(self, ctx, error=False):
+            repo.svfs.options['lfstrack'] = _trackedmatcher(self, ctx)
+            return super(lfsrepo, self).commitctx(ctx, error)
+
+    repo.__class__ = lfsrepo
+
     if 'lfs' not in repo.requirements:
         def checkrequireslfs(ui, repo, **kwargs):
             if 'lfs' not in repo.requirements:
@@ -150,18 +186,58 @@
         ui.setconfig('hooks', 'commit.lfs', checkrequireslfs, 'lfs')
         ui.setconfig('hooks', 'pretxnchangegroup.lfs', checkrequireslfs, 'lfs')
 
-def _trackedmatcher(repo):
+def _trackedmatcher(repo, ctx):
     """Return a function (path, size) -> bool indicating whether or not to
     track a given file with lfs."""
-    trackspec = repo.ui.config('lfs', 'track')
+    data = ''
+
+    if '.hglfs' in ctx.added() or '.hglfs' in ctx.modified():
+        data = ctx['.hglfs'].data()
+    elif '.hglfs' not in ctx.removed():
+        p1 = repo['.']
+
+        if '.hglfs' not in p1:
+            # No '.hglfs' in wdir or in parent.  Fallback to config
+            # for now.
+            trackspec = repo.ui.config('lfs', 'track')
+
+            # deprecated config: lfs.threshold
+            threshold = repo.ui.configbytes('lfs', 'threshold')
+            if threshold:
+                fileset.parse(trackspec)  # make sure syntax errors are confined
+                trackspec = "(%s) | size('>%d')" % (trackspec, threshold)
+
+            return minifileset.compile(trackspec)
+
+        data = p1['.hglfs'].data()
 
-    # deprecated config: lfs.threshold
-    threshold = repo.ui.configbytes('lfs', 'threshold')
-    if threshold:
-        fileset.parse(trackspec)  # make sure syntax errors are confined
-        trackspec = "(%s) | size('>%d')" % (trackspec, threshold)
+    # In removed, or not in parent
+    if not data:
+        return lambda p, s: False
+
+    # Parse errors here will abort with a message that points to the .hglfs file
+    # and line number.
+    cfg = config.config()
+    cfg.parse('.hglfs', data)
 
-    return minifileset.compile(trackspec)
+    try:
+        rules = [(minifileset.compile(pattern), minifileset.compile(rule))
+                 for pattern, rule in cfg.items('track')]
+    except error.ParseError as e:
+        # The original exception gives no indicator that the error is in the
+        # .hglfs file, so add that.
+
+        # TODO: See if the line number of the file can be made available.
+        raise error.Abort(_('parse error in .hglfs: %s') % e)
+
+    def _match(path, size):
+        for pat, rule in rules:
+            if pat(path, size):
+                return rule(path, size)
+
+        return False
+
+    return _match
 
 def wrapfilelog(filelog):
     wrapfunction = extensions.wrapfunction
--- a/tests/test-lfs.t	Sun Jan 14 01:04:45 2018 -0500
+++ b/tests/test-lfs.t	Sun Jan 14 18:12:51 2018 -0500
@@ -937,6 +937,79 @@
   $ hg commit -m 'add A' -A A
   $ hg rm A
   $ hg commit -m 'rm A'
+
+Bad .hglfs files will block the commit with a useful message
+
+  $ cat > .hglfs << EOF
+  > [track]
+  > **.test = size(">5B")
+  > bad file ... no commit
+  > EOF
+
+  $ echo x > file.txt
+  $ hg ci -Aqm 'should fail'
+  hg: parse error at .hglfs:3: bad file ... no commit
+  [255]
+
+  $ cat > .hglfs << EOF
+  > [track]
+  > **.test = size(">5B")
+  > ** = nonexistent()
+  > EOF
+
+  $ hg ci -Aqm 'should fail'
+  abort: parse error in .hglfs: unknown identifier: nonexistent
+  [255]
+
+'**' works out to mean all files.
+
+  $ cat > .hglfs << EOF
+  > [track]
+  > **.test = size(">5B")
+  > **.exclude = none()
+  > ** = size(">10B")
+  > EOF
+
+The LFS policy takes effect as the .hglfs file is committed
+
+  $ echo 'largefile' > lfs.test
+  $ echo '012345678901234567890' > nolfs.exclude
+  $ echo '01234567890123456' > lfs.catchall
+  $ hg ci -Aqm 'added .hglfs'
+  $ hg log -r . -T '{rev}: {lfs_files % "{file}: {oid}\n"}\n'
+  2: lfs.catchall: d4ec46c2869ba22eceb42a729377432052d9dd75d82fc40390ebaadecee87ee9
+  lfs.test: 5489e6ced8c36a7b267292bde9fd5242a5f80a7482e8f23fa0477393dfaa4d6c
+  
+The existing .hglfs file is used even when it is not in the 'A' or 'M' states
+
+  $ echo 'largefile2' > lfs.test
+  $ echo '012345678901234567890a' > nolfs.exclude
+  $ echo '01234567890123456a' > lfs.catchall
+  $ hg ci -qm 'unmodified .hglfs'
+  $ hg log -r . -T '{rev}: {lfs_files % "{file}: {oid}\n"}\n'
+  3: lfs.catchall: 31f43b9c62b540126b0ad5884dc013d21a61c9329b77de1fceeae2fc58511573
+  lfs.test: 8acd23467967bc7b8cc5a280056589b0ba0b17ff21dbd88a7b6474d6290378a6
+  
+Excluding the .hglfs file from the commit postpones the policy change
+
+  $ hg rm .hglfs
+  $ echo 'largefile3' > lfs.test
+  $ echo '012345678901234567890abc' > nolfs.exclude
+  $ echo '01234567890123456abc' > lfs.catchall
+  $ hg ci -qm 'file test' -X .hglfs
+  $ hg log -r . -T '{rev}: {lfs_files % "{file}: {oid}\n"}\n'
+  4: lfs.catchall: 6747cfb1b83965b4a884e7a6061813ae31e4122028bc6a88d2ac5e5f9e05c5af
+  lfs.test: 3f40b70c2294e91e0fa789ebcf73c5a1d1c7aef270f83e477e40cb0513237e8c
+  
+The policy change takes effect when the .hglfs is committed
+
+  $ echo 'largefile4' > lfs.test
+  $ echo '012345678901234567890abcdef' > nolfs.exclude
+  $ echo '01234567890123456abcdef' > lfs.catchall
+  $ hg ci -qm 'file test'
+  $ hg log -r . -T '{rev}: {lfs_files % "{file}: {oid}\n"}\n'
+  5: 
+
   $ cd ..
 
 Unbundling adds a requirement to a non-lfs repo, if necessary.