match: support rooted globs in hgignore
author Valentin Gatien-Baron <vgatien-baron@janestreet.com>
Thu, 03 Jan 2019 19:02:46 -0500
changeset 41282 4fab8a7d2d72
parent 41281 183df3df6031
child 41283 4948b327d3b9
In a .hgignore, "glob:foo" always means "**/foo". This cannot be avoided
because there is no syntax like "^" in regexes to say you don't want the
implied "**/" (of course one can use regexes, but glob syntax is nice).

When you have a long list of fairly specific globs like
path/to/some/thing, this has two consequences:
1. unintended files may be ignored (not too common though)
2. matching performance can suffer significantly

Here is vanilla hg status timing on a private repository:

Using syntax:glob everywhere
real 0m2.199s
user 0m1.545s
sys 0m0.619s

When rooting the appropriate globs
real 0m1.434s
user 0m0.847s
sys 0m0.565s

(tangentially, none of this shows up in --profile's output. It seems that
C code doesn't play well with profiling)

The code already supports this but there is no syntax to make use of it,
so it seems reasonable to create such syntax. I create a new hgignore
syntax "rootglob".

Differential Revision: https://phab.mercurial-scm.org/D5493
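The difference between the implied "**/" of "glob:" and a rooted glob can be illustrated with a small standalone sketch. This is not Mercurial's code: the helper names (simple_glob_to_regex, unrooted, rooted) are hypothetical, the glob translation covers only "**", "*" and "?", and the regex shapes are a simplified assumption about how an unrooted versus a rooted pattern could be expressed.

```python
import re


def simple_glob_to_regex(pat):
    """Translate a small glob subset to a regex fragment (hypothetical helper).

    Handles only '**' (any string, crossing '/'), '*' (any string within one
    path component) and '?' (any single character). Everything else is
    treated as a literal character.
    """
    out = []
    i = 0
    while i < len(pat):
        if pat.startswith('**', i):
            out.append('.*')
            i += 2
        elif pat[i] == '*':
            out.append('[^/]*')
            i += 1
        elif pat[i] == '?':
            out.append('.')
            i += 1
        else:
            out.append(re.escape(pat[i]))
            i += 1
    return ''.join(out)


# A match may stop at a directory boundary or at the end of the path, so a
# pattern naming a directory also covers everything below it.
SUFFIX = '(?:/|$)'


def unrooted(pat):
    # Models "glob:" in .hgignore: the pattern may start at any directory level.
    return re.compile('(?:|.*/)' + simple_glob_to_regex(pat) + SUFFIX)


def rooted(pat):
    # Models "rootglob:": the pattern must start at the repository root.
    return re.compile(simple_glob_to_regex(pat) + SUFFIX)


paths = ['a/b.ext', 'aa/b.ext', 'b/a/b.ext']
# re.match anchors at the start of the path, so only the optional prefix in
# unrooted() lets the pattern begin deeper in the tree.
ignored_unrooted = [p for p in paths if unrooted('a/*.ext').match(p)]
ignored_rooted = [p for p in paths if rooted('a/*.ext').match(p)]
print(ignored_unrooted)
print(ignored_rooted)
```

The sample paths are the ones created in the "Check rooted globs" test of this changeset. Under this sketch, the unrooted form also catches b/a/b.ext, while the rooted form catches only a/b.ext.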
mercurial/help/hgignore.txt
mercurial/help/patterns.txt
mercurial/match.py
tests/test-hgignore.t
--- a/mercurial/help/hgignore.txt	Wed Nov 07 15:45:09 2018 -0800
+++ b/mercurial/help/hgignore.txt	Thu Jan 03 19:02:46 2019 -0500
@@ -59,14 +59,17 @@
   Regular expression, Python/Perl syntax.
 ``glob``
   Shell-style glob.
+``rootglob``
+  A variant of ``glob`` that is rooted (see below).
 
 The chosen syntax stays in effect when parsing all patterns that
 follow, until another syntax is selected.
 
-Neither glob nor regexp patterns are rooted. A glob-syntax pattern of
-the form ``*.c`` will match a file ending in ``.c`` in any directory,
-and a regexp pattern of the form ``\.c$`` will do the same. To root a
-regexp pattern, start it with ``^``.
+Neither ``glob`` nor regexp patterns are rooted. A glob-syntax
+pattern of the form ``*.c`` will match a file ending in ``.c`` in any
+directory, and a regexp pattern of the form ``\.c$`` will do the
+same. To root a regexp pattern, start it with ``^``. To get the same
+effect with glob-syntax, you have to use ``rootglob``.
 
 Subdirectories can have their own .hgignore settings by adding
 ``subinclude:path/to/subdir/.hgignore`` to the root ``.hgignore``. See
--- a/mercurial/help/patterns.txt	Wed Nov 07 15:45:09 2018 -0800
+++ b/mercurial/help/patterns.txt	Thu Jan 03 19:02:46 2019 -0500
@@ -20,7 +20,9 @@
 
 To use an extended glob, start a name with ``glob:``. Globs are rooted
 at the current directory; a glob such as ``*.c`` will only match files
-in the current directory ending with ``.c``.
+in the current directory ending with ``.c``. ``rootglob:`` can be used
+instead of ``glob:`` for a glob that is rooted at the root of the
+repository.
 
 The supported glob syntax extensions are ``**`` to match any string
 across path separators and ``{a,b}`` to mean "a or b".
@@ -64,6 +66,7 @@
   foo/*.c        any name ending in ".c" in the directory foo
   foo/**.c       any name ending in ".c" in any subdirectory of foo
                  including itself.
+  rootglob:*.c   any name ending in ".c" in the root of the repository
 
 Regexp examples::
 
--- a/mercurial/match.py	Wed Nov 07 15:45:09 2018 -0800
+++ b/mercurial/match.py	Thu Jan 03 19:02:46 2019 -0500
@@ -25,6 +25,7 @@
 )
 
 allpatternkinds = ('re', 'glob', 'path', 'relglob', 'relpath', 'relre',
+                   'rootglob',
                    'listfile', 'listfile0', 'set', 'include', 'subinclude',
                    'rootfilesin')
 cwdrelativepatternkinds = ('relpath', 'glob')
@@ -221,7 +222,7 @@
     for kind, pat in [_patsplit(p, default) for p in patterns]:
         if kind in cwdrelativepatternkinds:
             pat = pathutil.canonpath(root, cwd, pat, auditor)
-        elif kind in ('relglob', 'path', 'rootfilesin'):
+        elif kind in ('relglob', 'path', 'rootfilesin', 'rootglob'):
             pat = util.normpath(pat)
         elif kind in ('listfile', 'listfile0'):
             try:
@@ -1137,7 +1138,7 @@
         if pat.startswith('^'):
             return pat
         return '.*' + pat
-    if kind == 'glob':
+    if kind in ('glob', 'rootglob'):
         return _globre(pat) + globsuffix
     raise error.ProgrammingError('not a regex pattern: %s:%s' % (kind, pat))
 
@@ -1252,7 +1253,7 @@
     r = []
     d = []
     for kind, pat, source in kindpats:
-        if kind == 'glob': # find the non-glob prefix
+        if kind in ('glob', 'rootglob'): # find the non-glob prefix
             root = []
             for p in pat.split('/'):
                 if '[' in p or '{' in p or '*' in p or '?' in p:
@@ -1351,6 +1352,7 @@
     syntax: glob   # defaults following lines to non-rooted globs
     re:pattern     # non-rooted regular expression
     glob:pattern   # non-rooted glob
+    rootglob:pat   # rooted glob (same root as ^ in regexps)
     pattern        # pattern of the current default type
 
     if sourceinfo is set, returns a list of tuples:
@@ -1361,6 +1363,7 @@
         're': 'relre:',
         'regexp': 'relre:',
         'glob': 'relglob:',
+        'rootglob': 'rootglob:',
         'include': 'include',
         'subinclude': 'subinclude',
     }
--- a/tests/test-hgignore.t	Wed Nov 07 15:45:09 2018 -0800
+++ b/tests/test-hgignore.t	Thu Jan 03 19:02:46 2019 -0500
@@ -239,6 +239,17 @@
   dir/c.o is ignored
   (ignore rule in $TESTTMP/ignorerepo/.hgignore, line 2: 'dir/**/c.o') (glob)
 
+Check rooted globs
+
+  $ hg purge --all --config extensions.purge=
+  $ echo "syntax: rootglob" > .hgignore
+  $ echo "a/*.ext" >> .hgignore
+  $ for p in a b/a aa; do mkdir -p $p; touch $p/b.ext; done
+  $ hg status -A 'set:**.ext'
+  ? aa/b.ext
+  ? b/a/b.ext
+  I a/b.ext
+
 Check using 'include:' in ignore file
 
   $ hg purge --all --config extensions.purge=
@@ -257,10 +268,15 @@
 Check recursive uses of 'include:'
 
   $ echo "include:nested/ignore" >> otherignore
-  $ mkdir nested
+  $ mkdir nested nested/more
   $ echo "glob:*ignore" > nested/ignore
+  $ echo "rootglob:a" >> nested/ignore
+  $ touch a nested/a nested/more/a
   $ hg status
   A dir/b.o
+  ? nested/a
+  ? nested/more/a
+  $ rm a nested/a nested/more/a
 
   $ cp otherignore goodignore
   $ echo "include:badignore" >> otherignore
@@ -291,18 +307,26 @@
   ? dir1/file2
   ? dir2/file1
 
-Check including subincludes with regexs
+Check including subincludes with other patterns
 
   $ echo "subinclude:dir1/.hgignore" >> .hgignore
+
+  $ mkdir dir1/subdir
+  $ touch dir1/subdir/file1
+  $ echo "rootglob:f?le1" > dir1/.hgignore
+  $ hg status
+  ? dir1/file2
+  ? dir1/subdir/file1
+  ? dir2/file1
+  $ rm dir1/subdir/file1
+
   $ echo "regexp:f.le1" > dir1/.hgignore
-
   $ hg status
   ? dir1/file2
   ? dir2/file1
 
 Check multiple levels of sub-ignores
 
-  $ mkdir dir1/subdir
   $ touch dir1/subdir/subfile1 dir1/subdir/subfile3 dir1/subdir/subfile4
   $ echo "subinclude:subdir/.hgignore" >> dir1/.hgignore
   $ echo "glob:subfil*3" >> dir1/subdir/.hgignore