contrib/hgfixes/fix_leftover_imports.py
author Gregory Szorc <gregory.szorc@gmail.com>
Thu, 03 Dec 2015 21:37:01 -0800
changeset 27220 4374d819ccd5
parent 19378 9de689d20230
permissions -rw-r--r--
mercurial: implement import hook for handling C/Python modules

There are a handful of modules that have both pure Python and C extension implementations. Currently, setup.py copies files from mercurial/pure/*.py to mercurial/ during the install process if C extensions are not available. This way, "import mercurial.X" will work whether C extensions are available or not.

This approach has a few drawbacks. First, there aren't run-time checks verifying the C extensions are loaded when they should be. This could lead to accidental use of the slower pure Python modules. Second, the C extensions aren't compatible with PyPy and running Mercurial with PyPy requires installing Mercurial - you can't run ./hg from a source checkout. This makes developing while running PyPy somewhat difficult.

This patch implements a PEP-302 import hook for finding and loading the modules with both C and Python implementations. When a module with dual implementations is requested for import, its import is handled by our import hook.

The importer has a mechanism that controls what types of modules we allow to load. We call this loading behavior the "module load policy." There are 3 settings:

* Only load C extensions
* Only load pure Python
* Try to load C and fall back to Python

An environment variable allows overriding this policy at run time. This is mainly useful for developers and for performing actions against the source checkout (such as installing), which require overriding the default (strict) policy about requiring C extensions.

The default mode for now is to allow both. This isn't proper and is technically backwards incompatible. However, it is necessary to implement a sane patch series that doesn't break the world during future bisections. The behavior will be corrected in a future patch.

We choose the main mercurial/__init__.py module for this code out of necessity: in a future world, if the custom module importer isn't registered, we'll fail to find/import certain modules when running from a pure installation. Without the magical import-time side-effects, *any* importer of mercurial.* modules would be required to call a function to register our importer. I'm not a fan of import time side effects and I initially attempted to do this. However, I was foiled by our own test harness, which has numerous `python` invoked scripts that "import mercurial" and fail because the importer isn't registered. Realizing this problem is probably present in random Python scripts that have been written over the years, I decided that sacrificing purity for backwards compatibility is necessary. Plus, if you are programming Python, "import" should probably "just work."

It's worth noting that now that we have a custom module loader, it would be possible to hook up demand module proxies at this level instead of replacing __import__. We leave this work for another time, if it's even desired.

This patch breaks importing in environments where Mercurial modules are loaded from a zip file (such as py2exe distributions). This will be addressed in a subsequent patch.
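
The three-way "module load policy" described in the commit message can be illustrated with a minimal sketch. This is not the code from the patch: it shows only the policy decision, not the import hook or its registration. The environment variable name EXAMPLE_MODULE_POLICY, the policy strings "c", "py" and "allow", and the helper names resolve_policy and load_dual are all hypothetical, chosen for illustration.

```python
import importlib
import os

# Hypothetical policy names: C only, pure Python only, or C with fallback.
POLICIES = ("c", "py", "allow")


def resolve_policy(default="allow", environ=None):
    """Return the load policy, letting an environment variable override it.

    EXAMPLE_MODULE_POLICY is a made-up variable name for this sketch.
    """
    environ = os.environ if environ is None else environ
    policy = environ.get("EXAMPLE_MODULE_POLICY", default)
    if policy not in POLICIES:
        raise ValueError("unknown module load policy: %r" % policy)
    return policy


def load_dual(name, policy, c_package, py_package,
              importer=importlib.import_module):
    """Load a module that has both a C and a pure Python implementation.

    c_package and py_package are the (assumed) packages holding each
    implementation; importer is injectable so the logic can be tested.
    """
    if policy in ("c", "allow"):
        try:
            return importer("%s.%s" % (c_package, name))
        except ImportError:
            if policy == "c":
                # Strict policy: a missing C extension is an error.
                raise
    # Either the policy is "py", or it is "allow" and the C import failed.
    return importer("%s.%s" % (py_package, name))
```

Under the strict "c" policy a missing extension surfaces as an ImportError instead of silently falling back, which is the run-time check the commit message says the copy-at-install approach lacked.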

"Fixer that translates some APIs ignored by the default 2to3 fixers."

# FIXME: This fixer has some ugly hacks. Its main design is based on that of
# fix_imports, from lib2to3. Unfortunately, the fix_imports framework only
# changes module names "without dots", meaning it won't work for some changes
# in the email module/package. Thus this fixer was born. I believe that with a
# bit more thinking, a more generic fixer can be implemented, but I'll leave
# that as future work.

from lib2to3.fixer_util import Name
from lib2to3.fixes import fix_imports

# This maps the old names to the new names. Note that a drawback of the current
# design is that the dictionary keys MUST have EXACTLY one dot (.) in them,
# otherwise things will break. (If you don't need a module hierarchy, you're
# better off just inheriting from fix_imports and overriding the MAPPING dict.)

MAPPING = {'email.Utils': 'email.utils',
           'email.Errors': 'email.errors',
           'email.Header': 'email.header',
           'email.Parser': 'email.parser',
           'email.Encoders': 'email.encoders',
           'email.MIMEText': 'email.mime.text',
           'email.MIMEBase': 'email.mime.base',
           'email.Generator': 'email.generator',
           'email.MIMEMultipart': 'email.mime.multipart',
}

def alternates(members):
    return "(" + "|".join(map(repr, members)) + ")"

def build_pattern(mapping=MAPPING):
    packages = {}
    for key in mapping:
        # With dotted names the parse tree looks like
        # package_name <trailer '.' module>, so we group the keys of the
        # mapping by package to mirror that structure. For example,
        # mapping={'A.B': 'a.b', 'A.C': 'a.c'} produces the dictionary
        # {'A': ['B', 'C']}, which is then used to generate something like
        # "A <trailer '.' ('B' | 'C')>".
        name = key.split('.')
        prefix = name[0]
        if prefix in packages:
            packages[prefix].append(name[1])
        else:
            packages[prefix] = name[1:]

    mod_list = ' | '.join(["'%s' '.' ('%s')" %
        (key, "' | '".join(packages[key])) for key in packages])
    mod_list = '(' + mod_list + ' )'

    yield """name_import=import_name< 'import' module_name=dotted_name< %s > >
          """ % mod_list

    yield """name_import=import_name< 'import'
            multiple_imports=dotted_as_names< any*
            module_name=dotted_name< %s >
            any* >
            >""" % mod_list

    packs = ' | '.join(["'%s' trailer<'.' ('%s')>" % (key,
               "' | '".join(packages[key])) for key in packages])

    yield "power< package=(%s) trailer<'.' any > any* >" % packs

class FixLeftoverImports(fix_imports.FixImports):
    # We want to run this fixer after fix_imports has run (this shouldn't
    # matter for hg, though, as setup3k prefers to run the default fixers first).
    mapping = MAPPING

    def build_pattern(self):
        return "|".join(build_pattern(self.mapping))

    def transform(self, node, results):
        # Mostly copied from fix_imports.py
        import_mod = results.get("module_name")
        if import_mod:
            try:
                mod_name = import_mod.value
            except AttributeError:
                # XXX: A hack to remove whitespace prefixes and suffixes
                mod_name = str(import_mod).strip()
            new_name = self.mapping[mod_name]
            import_mod.replace(Name(new_name, prefix=import_mod.prefix))
            if "name_import" in results:
                # If it's not a "from x import x, y" or "import x as y" import,
                # mark its usage to be replaced.
                self.replace[mod_name] = new_name
            if "multiple_imports" in results:
                # This is a nasty hack to fix multiple imports on a line (e.g.,
                # "import StringIO, urlparse"). The problem is that I can't
                # figure out an easy way to make a pattern recognize the keys of
                # MAPPING randomly sprinkled in an import statement.
                results = self.match(node)
                if results:
                    self.transform(node, results)
        else:
            # Replace usage of the module.
            # Now this is, mostly, a hack
            bare_name = results["package"][0]
            bare_name_text = ''.join(map(str, results['package'])).strip()
            new_name = self.replace.get(bare_name_text)
            prefix = results['package'][0].prefix
            if new_name:
                bare_name.replace(Name(new_name, prefix=prefix))
                results["package"][1].replace(Name(''))
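
The grouping step that build_pattern performs on the mapping keys can be tried in isolation, without lib2to3. The following is a minimal sketch under stated assumptions: group_by_package and package_alternatives are hypothetical helper names, not part of this fixer, and the second helper reproduces only the format string used for the "packs" pattern.

```python
def group_by_package(mapping):
    """Group the dotted keys of a mapping by their package prefix.

    Each key must contain exactly one dot; the tuple unpacking below
    raises ValueError otherwise, which makes that constraint explicit.
    """
    packages = {}
    for key in mapping:
        package, module = key.split('.')
        packages.setdefault(package, []).append(module)
    return packages


def package_alternatives(packages):
    """Build the alternation over packages and their modules."""
    return ' | '.join(
        "'%s' trailer<'.' ('%s')>" % (package, "' | '".join(modules))
        for package, modules in packages.items())


groups = group_by_package({'A.B': 'a.b', 'A.C': 'a.c'})
print(groups)                        # {'A': ['B', 'C']}
print(package_alternatives(groups))  # 'A' trailer<'.' ('B' | 'C')>
```

Note that the grouped values are the old (key) module names, since those are what the pattern has to match in the source being converted.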