mercurial: contrib/undumprevlog@7a3f6490ef97

highlight: add option to prevent content-only based fallback When Mozilla enabled Pygments on hg.mozilla.org, we got a lot of weirdly colorized files. Upon further investigation, the hightlight extension is first attempting a filename+content based match then falling back to a purely content-driven detection mode in Pygments. Sounds good in theory. Unfortunately, Pygments' content-driven detection establishes no minimum threshold for returning a lexer. Furthermore, the detection code for a number of languages is very liberal. For example, ActionScript 3 will return a confidence of 0.3 (out of 1.0) if the first 1k of the file we pass in matches the regex "\w+\s*:\s*\w"! Python matches on "import ". It's no coincidence that a number of our extension-less files were getting highlighted improperly. This patch adds an option to have the highlighter not fall back to purely content-based detection when filename+content detection failed. This can be enabled to render unlighted text instead of taking the risk that unknown file types are highlighted incorrectly. The old behavior is still the default.


#!/usr/bin/env python
# Undump a dump from dumprevlog
# $ hg init
# $ undumprevlog < repo.dump

import sys
from mercurial import revlog, node, scmutil, util, transaction

for fp in (sys.stdin, sys.stdout, sys.stderr):
    util.setbinary(fp)

opener = scmutil.opener('.', False)
tr = transaction.transaction(sys.stderr.write, opener, {'store': opener},
                             "undump.journal")
while True:
    l = sys.stdin.readline()
    if not l:
        break
    if l.startswith("file:"):
        f = l[6:-1]
        r = revlog.revlog(opener, f)
        print f
    elif l.startswith("node:"):
        n = node.bin(l[6:-1])
    elif l.startswith("linkrev:"):
        lr = int(l[9:-1])
    elif l.startswith("parents:"):
        p = l[9:-1].split()
        p1 = node.bin(p[0])
        p2 = node.bin(p[1])
    elif l.startswith("length:"):
        length = int(l[8:-1])
        sys.stdin.readline() # start marker
        d = sys.stdin.read(length)
        sys.stdin.readline() # end marker
        r.addrevision(d, tr, lr, p1, p2)

tr.close()

author	Gregory Szorc <gregory.szorc@gmail.com>
	Wed, 14 Oct 2015 18:22:16 -0700
changeset 26680	7a3f6490ef97
parent 23310	5bd1f6572db0
child 29167	4f76c0c490b3
permissions	-rwxr-xr-x