darwin: omit ignorable codepoints when normcase()ing a file path
author Augie Fackler <raf@durin42.com>
Tue, 16 Dec 2014 13:07:10 -0500
branch stable
changeset 23597 7a5bcd471f2e
parent 23596 885bd7c5c7e3
child 23598 c02a05cc6f5e
darwin: omit ignorable codepoints when normcase()ing a file path

This lets us avoid some nasty case collision problems in OS X with invisible codepoints.
mercurial/posix.py
tests/test-casefolding.t
--- a/mercurial/posix.py	Tue Dec 16 13:06:41 2014 -0500
+++ b/mercurial/posix.py	Tue Dec 16 13:07:10 2014 -0500
@@ -208,6 +208,7 @@
         - escape-encode invalid characters
         - decompose to NFD
         - lowercase
+        - omit ignored characters [200c-200f, 202a-202e, 206a-206f,feff]
 
         >>> normcase('UPPER')
         'upper'
@@ -265,7 +266,9 @@
             u = s.decode('utf-8')
 
         # Decompose then lowercase (HFS+ technote specifies lower)
-        return unicodedata.normalize('NFD', u).lower().encode('utf-8')
+        enc = unicodedata.normalize('NFD', u).lower().encode('utf-8')
+        # drop HFS+ ignored characters
+        return encoding.hfsignoreclean(enc)
 
 if sys.platform == 'cygwin':
     # workaround for cygwin, in which mount point part of path is
--- a/tests/test-casefolding.t	Tue Dec 16 13:06:41 2014 -0500
+++ b/tests/test-casefolding.t	Tue Dec 16 13:07:10 2014 -0500
@@ -200,12 +200,11 @@
 We assume anyone running the tests on a case-insensitive volume on OS
 X will be using HFS+. If that's not true, this test will fail.
 
-Bug: some codepoints are to be ignored on HFS+:
-
   $ rm A
   >>> open(u'a\u200c'.encode('utf-8'), 'w').write('unicode is fun')
   $ hg status
   M A
-  ? a\xe2\x80\x8c (esc)
+
 #endif
+
   $ cd ..
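The updated test expects `hg status` to treat `a\u200c` as the tracked file `A` rather than as an unknown file. The following is a rough, self-contained sketch of that folding on Python 3 `str`. It applies the steps the `normcase()` docstring lists (decompose to NFD, lowercase, omit ignored characters) and skips the escape-encoding of invalid bytes. The names `hfs_fold` and `find_collisions` are hypothetical illustrations, not Mercurial APIs.

```python
import unicodedata
from collections import defaultdict

# Hypothetical sketch of the HFS+-style folding normcase() now performs.
# Ignored codepoints, from the ranges listed in the normcase() docstring.
_IGNORED = frozenset(
    [chr(cp) for cp in range(0x200C, 0x2010)]
    + [chr(cp) for cp in range(0x202A, 0x202F)]
    + [chr(cp) for cp in range(0x206A, 0x2070)]
    + ["\ufeff"]
)


def hfs_fold(name: str) -> str:
    """Decompose to NFD, lowercase, then drop HFS+-ignored codepoints."""
    folded = unicodedata.normalize("NFD", name).lower()
    return "".join(ch for ch in folded if ch not in _IGNORED)


def find_collisions(names):
    """Group names that fold to the same key; keep only groups of 2 or more."""
    groups = defaultdict(list)
    for name in names:
        groups[hfs_fold(name)].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}


# 'A' and 'a' followed by U+200C fold to the same key, as in the test above.
print(find_collisions(["A", "a\u200c", "b"]))
```

Without the ignorable-codepoint filter, `hfs_fold("a\u200c")` would keep the invisible U+200C, and the two names would not be reported as colliding. That was the buggy `? a\xe2\x80\x8c` line the test used to record.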