contrib/hgfixes/fix_bytesmod.py
author Gregory Szorc <gregory.szorc@gmail.com>
Thu, 03 Dec 2015 21:37:01 -0800
changeset 27220 4374d819ccd5
parent 20701 d20817ac628a
permissions -rw-r--r--
mercurial: implement import hook for handling C/Python modules

There are a handful of modules that have both pure Python and C extension implementations. Currently, setup.py copies files from mercurial/pure/*.py to mercurial/ during the install process if C extensions are not available. This way, "import mercurial.X" will work whether C extensions are available or not.

This approach has a few drawbacks. First, there aren't run-time checks verifying the C extensions are loaded when they should be. This could lead to accidental use of the slower pure Python modules. Second, the C extensions aren't compatible with PyPy and running Mercurial with PyPy requires installing Mercurial - you can't run ./hg from a source checkout. This makes developing while running PyPy somewhat difficult.

This patch implements a PEP-302 import hook for finding and loading the modules with both C and Python implementations. When a module with dual implementations is requested for import, its import is handled by our import hook.

The importer has a mechanism that controls what types of modules we allow to load. We call this loading behavior the "module load policy." There are 3 settings:

* Only load C extensions
* Only load pure Python
* Try to load C and fall back to Python

An environment variable allows overriding this policy at run time. This is mainly useful for developers and for performing actions against the source checkout (such as installing), which require overriding the default (strict) policy about requiring C extensions.

The default mode for now is to allow both. This isn't proper and is technically backwards incompatible. However, it is necessary to implement a sane patch series that doesn't break the world during future bisections. The behavior will be corrected in a future patch.

We choose the main mercurial/__init__.py module for this code out of necessity: in a future world, if the custom module importer isn't registered, we'll fail to find/import certain modules when running from a pure installation. Without the magical import-time side-effects, *any* importer of mercurial.* modules would be required to call a function to register our importer.

I'm not a fan of import time side effects and I initially attempted to do this. However, I was foiled by our own test harness, which has numerous `python` invoked scripts that "import mercurial" and fail because the importer isn't registered. Realizing this problem is probably present in random Python scripts that have been written over the years, I decided that sacrificing purity for backwards compatibility is necessary. Plus, if you are programming Python, "import" should probably "just work."

It's worth noting that now that we have a custom module loader, it would be possible to hook up demand module proxies at this level instead of replacing __import__. We leave this work for another time, if it's even desired.

This patch breaks importing in environments where Mercurial modules are loaded from a zip file (such as py2exe distributions). This will be addressed in a subsequent patch.
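The three-way module load policy described in the commit message can be sketched as a small decision function. This is not the PEP-302 finder/loader the commit adds; it only illustrates the choice such a hook would make. The environment variable name `HGMODULEPOLICY`, the policy strings `'c'`, `'py'` and `'allow'`, and the helper names `module_policy` and `load_dual` are assumptions made for illustration.

```python
import os

# Hypothetical policy values for this sketch: 'c' (C extensions only),
# 'py' (pure Python only), 'allow' (try C, fall back to pure Python).
POLICIES = ('c', 'py', 'allow')


def module_policy(default='allow', environ=None):
    """Return the load policy, letting an environment variable override it."""
    environ = os.environ if environ is None else environ
    # The variable name is an assumption of this sketch.
    policy = environ.get('HGMODULEPOLICY', default)
    if policy not in POLICIES:
        raise ValueError('unknown module load policy: %r' % policy)
    return policy


def load_dual(load_c, load_py, policy):
    """Load a module that has both a C and a pure Python implementation."""
    if policy == 'py':
        return load_py()
    try:
        return load_c()
    except ImportError:
        if policy == 'c':
            raise  # strict policy: a missing C extension is an error
        return load_py()


if __name__ == '__main__':
    def missing_c():
        raise ImportError('C extension not built')

    # Default 'allow' policy falls back to the pure implementation.
    print(load_dual(missing_c, lambda: 'pure', module_policy(environ={})))
```

In a real import hook, `load_c` and `load_py` would locate the compiled extension and the `mercurial/pure/` source respectively; here they are plain callables so the policy logic can be exercised on its own.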

"""Fixer that rewrites ``bytes % whatever`` expressions into calls to
``bytesformatter``, a function that actually formats them."""

from lib2to3 import fixer_base
from lib2to3.fixer_util import is_tuple, Call, Comma, Name, touch_import

# XXX: Implementing a blacklist in 2to3 turned out to be more troublesome than
# blacklisting some modules inside the fixers. So, this is what I came up with.

blacklist = ['mercurial/demandimport.py',
             'mercurial/py3kcompat.py',
             'mercurial/i18n.py',
            ]

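def isnumberremainder(formatstr, data):
    """Return True if ``data`` is an integer literal (e.g. ``x % 2``).

    Such expressions are arithmetic remainders rather than string
    formatting, so they must not be rewritten.
    """
    try:
        return data.value.isdigit()
    except AttributeError:
        # ``data`` is an interior node (e.g. a tuple), not a leaf
        return False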

class FixBytesmod(fixer_base.BaseFix):
    # XXX: There's one case (I suppose) I can't handle: when a remainder
    # operation like foo % bar is performed, I can't know statically whether
    # foo is a format string or a number. I believe the best approach is to
    # "correct" the to-be-converted code anyway and let bytesformatter handle
    # that case at runtime.
    PATTERN = '''
              term< formatstr=STRING '%' data=STRING > |
              term< formatstr=STRING '%' data=atom > |
              term< formatstr=NAME '%' data=any > |
              term< formatstr=any '%' data=any >
              '''

    def transform(self, node, results):
        for bfn in blacklist:
            if self.filename.endswith(bfn):
                return

        formatstr = results['formatstr'].clone()
        data = results['data'].clone()
        formatstr.prefix = '' # remove spaces from start

        # Leave numeric remainders such as ``x % 2`` untouched, and only add
        # the py3kcompat import once a rewrite is actually performed.
        if isnumberremainder(formatstr, data):
            return

        if not self.filename.endswith('mercurial/py3kcompat.py'):
            touch_import('mercurial', 'py3kcompat', node=node)

        # We have two possibilities:
        # 1- An identifier or other expression is passed. It is a single leaf
        #    or node, so we just pass it as the second argument to the
        #    formatter;
        # 2- A tuple is explicitly passed. Its children, parentheses included,
        #    are copied into the argument list, so the formatter still
        #    receives the whole tuple as a single argument.
        # TODO: Check for normal strings. They don't need to be translated.

        if is_tuple(data):
            args = [formatstr, Comma().clone()] + \
                   [c.clone() for c in data.children[:]]
        else:
            args = [formatstr, Comma().clone(), data]

        # Reuse the replaced node's prefix so surrounding whitespace survives.
        return Call(Name('bytesformatter'), args, prefix=node.prefix)
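The fixer only performs the source rewrite: the generated code calls a `bytesformatter` helper that this file does not define, and the XXX note in `FixBytesmod` defers the ambiguous `foo % bar` case to it at runtime. The sketch below approximates both halves with the standard library so they can be tried without lib2to3 (removed in Python 3.13). `ModToBytesFormatter` is a hypothetical analogue of the fixer built on `ast.NodeTransformer` instead of lib2to3's pattern matcher, and `bytesformatter` here is a hypothetical helper that decodes bytes as latin-1, formats, and re-encodes. Neither is Mercurial's actual code; the latin-1 round trip is an assumption of this sketch.

```python
import ast


class ModToBytesFormatter(ast.NodeTransformer):
    """Hypothetical ast-based analogue of FixBytesmod (not lib2to3)."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        if not isinstance(node.op, ast.Mod):
            return node
        right = node.right
        # Like isnumberremainder(): an integer literal means arithmetic.
        if isinstance(right, ast.Constant) and isinstance(right.value, int):
            return node
        # Tuples are passed through intact as the second argument.
        call = ast.Call(func=ast.Name(id='bytesformatter', ctx=ast.Load()),
                        args=[node.left, right], keywords=[])
        return ast.copy_location(call, node)


def rewrite(source):
    """Return ``source`` with each ``a % b`` turned into a formatter call."""
    tree = ModToBytesFormatter().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))  # Python 3.9+


def bytesformatter(fmt, args):
    """Hypothetical runtime helper: %-format bytes via a latin-1 round trip."""
    if not isinstance(fmt, bytes):
        # str formats, and numbers in the ambiguous ``foo % bar`` case,
        # fall back to the ordinary % operator.
        return fmt % args

    def tostr(value):
        return value.decode('latin-1') if isinstance(value, bytes) else value

    if isinstance(args, tuple):
        args = tuple(tostr(a) for a in args)
    elif isinstance(args, dict):
        args = {tostr(k): tostr(v) for k, v in args.items()}
    else:
        args = tostr(args)
    # Assumes every str argument is representable in latin-1.
    return (fmt.decode('latin-1') % args).encode('latin-1')


if __name__ == '__main__':
    src = "msg = b'%s-%d' % (name, count)\nrem = n % 2"
    out = rewrite(src)
    print(out)
    ns = {'bytesformatter': bytesformatter, 'name': b'rev', 'count': 3, 'n': 7}
    exec(out, ns)
    print(ns['msg'], ns['rem'])  # b'rev-3' 1
```

Because `ast` discards comments and layout, a converter built this way would reformat every module it touches. lib2to3's concrete syntax tree, which the original fixer uses, keeps that information in each node's `prefix`, which is why `transform` adjusts prefixes explicitly.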