revset: support raw string literals
author Brodie Rao <brodie@bitheap.org>
Fri, 24 Sep 2010 15:36:53 -0500
changeset 12408 78a97859b90d
parent 12407 5bfab61c2fee
child 12409 0eaf7d32a5d8
revset: support raw string literals

This adds support for r'...' and r"..." as string literals. Strings with the "r" prefix will not have their escape characters interpreted.

This is especially useful for grep(), where, with regular string literals, \number is interpreted as an octal escape code, and \b is interpreted as the backspace character (\x08).
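
The effect on grep() can be illustrated with ordinary Python string literals, which are assumed here to follow the same rules for \b and octal escapes as the tokenizer's string-escape decoding. This is a minimal sketch, not code from the changeset: the `description` text and all variable names are hypothetical, and the two patterns are written out to match the parse trees recorded in tests/test-revset.t.

```python
import re

# Hypothetical commit description, used only for illustration.
description = "fixes issue1234"

# What grep("\bissue\d+") passes to the regex engine: the tokenizer has
# already turned \b into a backspace character (\x08).
cooked = "\x08issue\\d+"

# What grep(r"\bissue\d+") passes to the regex engine: backslashes intact,
# so \b keeps its regex meaning of "word boundary".
raw = r"\bissue\d+"

# In a regular literal, \number is read as an octal escape, not as a
# regex backreference.
octal = "\1"

cooked_match = re.search(cooked, description)
raw_match = re.search(raw, description)

print(cooked_match)        # None: the description has no backspace character
print(raw_match.group(0))  # issue1234
print(repr(octal))         # '\x01'
```

The escaped pattern can only match text that literally contains a backspace, which is why the raw form is the one that matches in the test.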
mercurial/help/revsets.txt
mercurial/revset.py
tests/test-revset.t
--- a/mercurial/help/revsets.txt	Sun Sep 26 13:11:31 2010 -0500
+++ b/mercurial/help/revsets.txt	Fri Sep 24 15:36:53 2010 -0500
@@ -7,8 +7,11 @@
 Identifiers such as branch names must be quoted with single or double
 quotes if they contain characters outside of
 ``[._a-zA-Z0-9\x80-\xff]`` or if they match one of the predefined
-predicates. Special characters can be used in quoted identifiers by
-escaping them, e.g., ``\n`` is interpreted as a newline.
+predicates.
+
+Special characters can be used in quoted identifiers by escaping them,
+e.g., ``\n`` is interpreted as a newline. To prevent them from being
+interpreted, strings can be prefixed with ``r``, e.g. ``r'...'``.
 
 There is a single prefix operator:
 
@@ -82,7 +85,8 @@
   An alias for ``::.`` (ancestors of the working copy's first parent).
 
 ``grep(regex)``
-  Like ``keyword(string)`` but accepts a regex.
+  Like ``keyword(string)`` but accepts a regex. Use ``grep(r'...')``
+  to ensure special escape characters are handled correctly.
 
 ``head()``
   Changeset is a head.
--- a/mercurial/revset.py	Sun Sep 26 13:11:31 2010 -0500
+++ b/mercurial/revset.py	Fri Sep 24 15:36:53 2010 -0500
@@ -48,7 +48,14 @@
             pos += 1 # skip ahead
         elif c in "():,-|&+!": # handle simple operators
             yield (c, None, pos)
-        elif c in '"\'': # handle quoted strings
+        elif (c in '"\'' or c == 'r' and
+              program[pos:pos + 2] in ("r'", 'r"')): # handle quoted strings
+            if c == 'r':
+                pos += 1
+                c = program[pos]
+                decode = lambda x: x
+            else:
+                decode = lambda x: x.decode('string-escape')
             pos += 1
             s = pos
             while pos < l: # find closing quote
@@ -57,7 +64,7 @@
                     pos += 2
                     continue
                 if d == c:
-                    yield ('string', program[s:pos].decode('string-escape'), s)
+                    yield ('string', decode(program[s:pos]), s)
                     break
                 pos += 1
             else:
--- a/tests/test-revset.t	Sun Sep 26 13:11:31 2010 -0500
+++ b/tests/test-revset.t	Fri Sep 24 15:36:53 2010 -0500
@@ -215,6 +215,14 @@
   ('func', ('symbol', 'grep'), ('string', '('))
   hg: parse error: invalid match pattern: unbalanced parenthesis
   [255]
+  $ try 'grep("\bissue\d+")'
+  ('func', ('symbol', 'grep'), ('string', '\x08issue\\d+'))
+  $ try 'grep(r"\bissue\d+")'
+  ('func', ('symbol', 'grep'), ('string', '\\bissue\\d+'))
+  6
+  $ try 'grep(r"\")'
+  hg: parse error at 7: unterminated string
+  [255]
   $ log 'head()'
   0
   1