hgext/win32mbcs.py
author Shun-ichi GOTO <shunichi.goto@gmail.com>
Thu, 09 Jul 2009 22:06:30 +0900
changeset 9100 623f96ae3a26
parent 9098 5e4654f5522d
child 9102 bbc78cb1bf15
child 9131 2bbb8419720d
permissions -rw-r--r--
win32mbcs: also wrap windows.pconvert()
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
6887
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
     1
# win32mbcs.py -- MBCS filename support for Mercurial
5846
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
     2
#
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
     3
# Copyright (c) 2008 Shun-ichi Goto <shunichi.goto@gmail.com>
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
     4
#
6887
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
     5
# Version: 0.2
5846
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
     6
# Author:  Shun-ichi Goto <shunichi.goto@gmail.com>
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
     7
#
8225
46293a0c7e9f updated license to be explicit about GPL version 2
Martin Geisler <mg@lazybytes.net>
parents: 8001
diff changeset
     8
# This software may be used and distributed according to the terms of the
46293a0c7e9f updated license to be explicit about GPL version 2
Martin Geisler <mg@lazybytes.net>
parents: 8001
diff changeset
     9
# GNU General Public License version 2, incorporated herein by reference.
5846
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
    10
#
8228
eee2319c5895 add blank line after copyright notices and after header
Martin Geisler <mg@lazybytes.net>
parents: 8225
diff changeset
    11
8932
f87884329419 extensions: fix up description lines some more
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 8894
diff changeset
    12
'''allow the use of MBCS paths with problematic encodings
5846
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
    13
8001
c0e3aca616de win32mbcs: word-wrap help texts at 70 characters
Martin Geisler <mg@daimi.au.dk>
parents: 7983
diff changeset
    14
Some MBCS encodings are not good for some path operations (i.e.
c0e3aca616de win32mbcs: word-wrap help texts at 70 characters
Martin Geisler <mg@daimi.au.dk>
parents: 7983
diff changeset
    15
splitting path, case conversion, etc.) with its encoded bytes. We call
c0e3aca616de win32mbcs: word-wrap help texts at 70 characters
Martin Geisler <mg@daimi.au.dk>
parents: 7983
diff changeset
    16
such a encoding (i.e. shift_jis and big5) as "problematic encoding".
c0e3aca616de win32mbcs: word-wrap help texts at 70 characters
Martin Geisler <mg@daimi.au.dk>
parents: 7983
diff changeset
    17
This extension can be used to fix the issue with those encodings by
8665
e4ad46f9a004 win32mbcs: capitalize Unicode
Martin Geisler <mg@lazybytes.net>
parents: 8491
diff changeset
    18
wrapping some functions to convert to Unicode string before path
8001
c0e3aca616de win32mbcs: word-wrap help texts at 70 characters
Martin Geisler <mg@daimi.au.dk>
parents: 7983
diff changeset
    19
operation.
5846
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
    20
8668
aea3a23151bd fixed typos found in translatable strings
Martin Geisler <mg@lazybytes.net>
parents: 8667
diff changeset
    21
This extension is useful for:
6887
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    22
 * Japanese Windows users using shift_jis encoding.
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    23
 * Chinese Windows users using big5 encoding.
8001
c0e3aca616de win32mbcs: word-wrap help texts at 70 characters
Martin Geisler <mg@daimi.au.dk>
parents: 7983
diff changeset
    24
 * All users who use a repository with one of problematic encodings on
c0e3aca616de win32mbcs: word-wrap help texts at 70 characters
Martin Geisler <mg@daimi.au.dk>
parents: 7983
diff changeset
    25
   case-insensitive file system.
6887
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    26
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    27
This extension is not needed for:
8667
594507755800 graphlog, win32mbcs: capitalize ASCII
Martin Geisler <mg@lazybytes.net>
parents: 8665
diff changeset
    28
 * Any user who use only ASCII chars in path.
6887
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    29
 * Any user who do not use any of problematic encodings.
5846
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
    30
6887
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    31
Note that there are some limitations on using this extension:
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    32
 * You should use single encoding in one repository.
8001
c0e3aca616de win32mbcs: word-wrap help texts at 70 characters
Martin Geisler <mg@daimi.au.dk>
parents: 7983
diff changeset
    33
 * You should set same encoding for the repository by locale or
c0e3aca616de win32mbcs: word-wrap help texts at 70 characters
Martin Geisler <mg@daimi.au.dk>
parents: 7983
diff changeset
    34
   HGENCODING.
6887
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    35
8665
e4ad46f9a004 win32mbcs: capitalize Unicode
Martin Geisler <mg@lazybytes.net>
parents: 8491
diff changeset
    36
Path encoding conversion are done between Unicode and
8760
bf17aeafb869 Spell Mercurial as a proper noun
timeless <timeless@gmail.com>
parents: 8714
diff changeset
    37
encoding.encoding which is decided by Mercurial from current locale
8001
c0e3aca616de win32mbcs: word-wrap help texts at 70 characters
Martin Geisler <mg@daimi.au.dk>
parents: 7983
diff changeset
    38
setting or HGENCODING.
8894
868670dbc237 extensions: improve the consistency of synopses
Cédric Duval <cedricduval@free.fr>
parents: 8866
diff changeset
    39
'''
5846
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
    40
9098
5e4654f5522d win32mbcs: look up modules using sys.modules (issue1729)
Brodie Rao <me+hg@dackz.net>
parents: 8932
diff changeset
    41
import os, sys
5846
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
    42
from mercurial.i18n import _
7948
de377b1a9a84 move encoding bits from util to encoding
Matt Mackall <mpm@selenic.com>
parents: 7877
diff changeset
    43
from mercurial import util, encoding
5846
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
    44
6887
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    45
def decode(arg):
7877
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    46
    if isinstance(arg, str):
7948
de377b1a9a84 move encoding bits from util to encoding
Matt Mackall <mpm@selenic.com>
parents: 7877
diff changeset
    47
        uarg = arg.decode(encoding.encoding)
de377b1a9a84 move encoding bits from util to encoding
Matt Mackall <mpm@selenic.com>
parents: 7877
diff changeset
    48
        if arg == uarg.encode(encoding.encoding):
7877
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    49
            return uarg
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    50
        raise UnicodeError("Not local encoding")
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    51
    elif isinstance(arg, tuple):
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    52
        return tuple(map(decode, arg))
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    53
    elif isinstance(arg, list):
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    54
        return map(decode, arg)
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    55
    return arg
6887
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    56
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    57
def encode(arg):
7877
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    58
    if isinstance(arg, unicode):
7948
de377b1a9a84 move encoding bits from util to encoding
Matt Mackall <mpm@selenic.com>
parents: 7877
diff changeset
    59
        return arg.encode(encoding.encoding)
7877
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    60
    elif isinstance(arg, tuple):
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    61
        return tuple(map(encode, arg))
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    62
    elif isinstance(arg, list):
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    63
        return map(encode, arg)
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    64
    return arg
6887
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    65
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    66
def wrapper(func, args):
7877
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    67
    # check argument is unicode, then call original
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    68
    for arg in args:
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    69
        if isinstance(arg, unicode):
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    70
            return func(*args)
5846
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
    71
7877
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    72
    try:
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    73
        # convert arguments to unicode, call func, then convert back
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    74
        return encode(func(*decode(args)))
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    75
    except UnicodeError:
7948
de377b1a9a84 move encoding bits from util to encoding
Matt Mackall <mpm@selenic.com>
parents: 7877
diff changeset
    76
        # If not encoded with encoding.encoding, report it then
7877
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    77
        # continue with calling original function.
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    78
        raise util.Abort(_("[win32mbcs] filename conversion fail with"
7948
de377b1a9a84 move encoding bits from util to encoding
Matt Mackall <mpm@selenic.com>
parents: 7877
diff changeset
    79
                         " %s encoding\n") % (encoding.encoding))
6887
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    80
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    81
def wrapname(name):
9098
5e4654f5522d win32mbcs: look up modules using sys.modules (issue1729)
Brodie Rao <me+hg@dackz.net>
parents: 8932
diff changeset
    82
    module, name = name.rsplit('.', 1)
5e4654f5522d win32mbcs: look up modules using sys.modules (issue1729)
Brodie Rao <me+hg@dackz.net>
parents: 8932
diff changeset
    83
    module = sys.modules[module]
7877
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    84
    func = getattr(module, name)
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    85
    def f(*args):
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    86
        return wrapper(func, args)
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    87
    try:
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    88
        f.__name__ = func.__name__                # fail with python23
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    89
    except Exception:
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    90
        pass
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
    91
    setattr(module, name, f)
6887
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    92
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    93
# List of functions to be wrapped.
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    94
# NOTE: os.path.dirname() and os.path.basename() are safe because
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    95
#       they use result of os.path.split()
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    96
funcs = '''os.path.join os.path.split os.path.splitext
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
    97
 os.path.splitunc os.path.normpath os.path.normcase os.makedirs
9098
5e4654f5522d win32mbcs: look up modules using sys.modules (issue1729)
Brodie Rao <me+hg@dackz.net>
parents: 8932
diff changeset
    98
 mercurial.util.endswithsep mercurial.util.splitpath mercurial.util.checkcase
9100
623f96ae3a26 win32mbcs: also wrap windows.pconvert()
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents: 9098
diff changeset
    99
 mercurial.util.fspath mercurial.windows.pconvert'''
5846
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
   100
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
   101
# codec and alias names of sjis and big5 to be faked.
6887
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
   102
problematic_encodings = '''big5 big5-tw csbig5 big5hkscs big5-hkscs
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
   103
 hkscs cp932 932 ms932 mskanji ms-kanji shift_jis csshiftjis shiftjis
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
   104
 sjis s_jis shift_jis_2004 shiftjis2004 sjis_2004 sjis2004
8714
505a96cbc923 Add cp950 as problematic encoding which is used in chinese windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents: 8668
diff changeset
   105
 shift_jisx0213 shiftjisx0213 sjisx0213 s_jisx0213 950 cp950 ms950 '''
5846
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
   106
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
   107
def reposetup(ui, repo):
7877
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
   108
    # TODO: decide use of config section for this extension
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
   109
    if not os.path.supports_unicode_filenames:
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
   110
        ui.warn(_("[win32mbcs] cannot activate on this platform.\n"))
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
   111
        return
5846
02884e56c217 New extension to support problematic MBCS on Windows.
Shun-ichi GOTO <shunichi.goto@gmail.com>
parents:
diff changeset
   112
7877
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
   113
    # fake is only for relevant environment.
7948
de377b1a9a84 move encoding bits from util to encoding
Matt Mackall <mpm@selenic.com>
parents: 7877
diff changeset
   114
    if encoding.encoding.lower() in problematic_encodings.split():
7877
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
   115
        for f in funcs.split():
eba7f12b0c51 cleanup: whitespace cleanup
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 7598
diff changeset
   116
            wrapname(f)
7948
de377b1a9a84 move encoding bits from util to encoding
Matt Mackall <mpm@selenic.com>
parents: 7877
diff changeset
   117
        ui.debug(_("[win32mbcs] activated with encoding: %s\n")
de377b1a9a84 move encoding bits from util to encoding
Matt Mackall <mpm@selenic.com>
parents: 7877
diff changeset
   118
                 % encoding.encoding)
6887
304484c7e0ba Update win32mbcs extension
Shun-ichi Goto <shunichi.goto@gmail.com>
parents: 6210
diff changeset
   119