convert: convert URLs to UTF-8 for Subversion
author Manuel Jacob <me@manueljacob.de>
date Tue, 30 Jun 2020 05:30:47 +0200
branch stable
changeset 45023 e54c3cafda15
parent 45022 e3b19004087a
child 45024 6597e2a73a28
convert: convert URLs to UTF-8 for Subversion

Preamble: for comprehension, note that the `path` of geturl() would better be called `path_or_url` (the argument of the call of getsvn() is called `url`).

For HTTP(S) URLs, the changes don’t make a difference, as they are restricted to ASCII.

For file URLs, the reasoning is the same as for paths: we have to roundtrip with what Subversion is doing.

When the locale encoding is ISO-8859-15, trying to convert a SVN repo `file:///tmp/a€` failed before like this:

    file:///tmp/a%A4 does not look like a Subversion repository to libsvn version 1.14.0

Decoding the path using the locale encoding can fail. In this case, we have to bail out, as Subversion won’t be able to do anything useful with the path.
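
The failure described above can be reproduced outside Mercurial with the standard library alone. The following is a minimal sketch, not the code from this changeset: `recode_to_utf8` and `can_decode` are hypothetical helper names, and the ISO-8859-15 locale is passed in explicitly instead of being read from the environment. It shows why the locale-encoded bytes percent-encode to `%A4`, what the UTF-8 form looks like, and how an undecodable path is detected.

```python
from urllib.parse import quote


def recode_to_utf8(raw: bytes, fs_encoding: str) -> bytes:
    """Hypothetical helper: decode locale-encoded bytes, re-encode as UTF-8."""
    return raw.decode(fs_encoding).encode("utf-8")


def can_decode(raw: bytes, fs_encoding: str) -> bool:
    """Hypothetical helper: report whether raw is valid in fs_encoding."""
    try:
        raw.decode(fs_encoding)
    except UnicodeDecodeError:
        return False
    return True


# "/tmp/a" followed by the euro sign (U+20AC), as an ISO-8859-15 locale
# would hand it over: the euro sign is the single byte 0xA4.
native = "/tmp/a\u20ac".encode("iso-8859-15")
print(quote(native))  # /tmp/a%A4

# The same path recoded to UTF-8: the euro sign becomes three bytes.
utf8 = recode_to_utf8(native, "iso-8859-15")
print(quote(utf8))  # /tmp/a%E2%82%AC

# A byte that is invalid in the locale encoding cannot be recoded at all.
print(can_decode(b"/tmp/\xff", "ascii"))  # False
```

The last case corresponds to the situation in which the changeset warns and bails out: if the bytes cannot be decoded with the locale encoding, there is no Unicode form to pass on.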
hgext/convert/subversion.py
tests/test-convert-svn-encoding.t
--- a/hgext/convert/subversion.py	Mon Jun 29 15:03:36 2020 +0200
+++ b/hgext/convert/subversion.py	Tue Jun 30 05:30:47 2020 +0200
@@ -65,10 +65,10 @@
     svn = None
 
 
-# In Subversion, paths are Unicode (encoded as UTF-8), which Subversion
-# converts from / to native strings when interfacing with the OS. When passing
-# paths to Subversion, we have to recode them such that it roundstrips with
-# what Subversion is doing.
+# In Subversion, paths and URLs are Unicode (encoded as UTF-8), which
+# Subversion converts from / to native strings when interfacing with the OS.
+# When passing paths and URLs to Subversion, we have to recode them such that
+# it roundstrips with what Subversion is doing.
 
 fsencoding = None
 
@@ -141,7 +141,9 @@
 
 def geturl(path):
     try:
-        return svn.client.url_from_path(svn.core.svn_path_canonicalize(path))
+        return svn.client.url_from_path(
+            svn.core.svn_path_canonicalize(fs2svn(path))
+        )
     except svn.core.SubversionException:
         # svn.client.url_from_path() fails with local repositories
         pass
@@ -358,6 +360,19 @@
                 and path[2:6].lower() == b'%3a/'
             ):
                 path = path[:2] + b':/' + path[6:]
+            try:
+                path.decode(fsencoding)
+            except UnicodeDecodeError:
+                ui.warn(
+                    _(
+                        b'Subversion requires that file URLs can be converted '
+                        b'to Unicode using the current locale encoding (%s)\n'
+                    )
+                    % pycompat.sysbytes(fsencoding)
+                )
+                return False
+            # FIXME: The following reasoning and logic is wrong and will be
+            # fixed in a following changeset.
             # pycompat.fsdecode() / pycompat.fsencode() are used so that bytes
             # in the URL roundtrip correctly on Unix. urlreq.url2pathname() on
             # py3 will decode percent-encoded bytes using the utf-8 encoding
--- a/tests/test-convert-svn-encoding.t	Mon Jun 29 15:03:36 2020 +0200
+++ b/tests/test-convert-svn-encoding.t	Tue Jun 30 05:30:47 2020 +0200
@@ -182,6 +182,20 @@
   cannot find required "p4" tool
   abort: \xff: missing or unsupported repository (glob) (esc)
   [255]
+  $ hg convert file://$TESTTMP/$XFF test
+  initializing destination test repository
+  Subversion requires that file URLs can be converted to Unicode using the current locale encoding (ascii)
+  file:/*/$TESTTMP/\xff does not look like a CVS checkout (glob) (esc)
+  $TESTTMP/file:$TESTTMP/\xff does not look like a Git repository (esc)
+  file:/*/$TESTTMP/\xff does not look like a Subversion repository (glob) (esc)
+  file:/*/$TESTTMP/\xff is not a local Mercurial repository (glob) (esc)
+  file:/*/$TESTTMP/\xff does not look like a darcs repository (glob) (esc)
+  file:/*/$TESTTMP/\xff does not look like a monotone repository (glob) (esc)
+  file:/*/$TESTTMP/\xff does not look like a GNU Arch repository (glob) (esc)
+  file:/*/$TESTTMP/\xff does not look like a Bazaar repository (glob) (esc)
+  file:/*/$TESTTMP/\xff does not look like a P4 repository (glob) (esc)
+  abort: file:/*/$TESTTMP/\xff: missing or unsupported repository (glob) (esc)
+  [255]
 
 #if py3
 For now, on Python 3, we abort when encountering non-UTF-8 percent-encoded