highlight: add option to prevent content-only based fallback
When Mozilla enabled Pygments on hg.mozilla.org, we got a lot of weirdly
colorized files. Upon further investigation, the hightlight extension
is first attempting a filename+content based match then falling back to a
purely content-driven detection mode in Pygments. Sounds good in theory.
Unfortunately, Pygments' content-driven detection establishes no minimum
threshold for returning a lexer. Furthermore, the detection code for
a number of languages is very liberal. For example, ActionScript 3 will
return a confidence of 0.3 (out of 1.0) if the first 1k of the file
we pass in matches the regex "\w+\s*:\s*\w"! Python matches on
"import ". It's no coincidence that a number of our extension-less files
were getting highlighted improperly.
This patch adds an option to have the highlighter not fall back to
purely content-based detection when filename+content detection failed.
This can be enabled to render unlighted text instead of taking the risk
that unknown file types are highlighted incorrectly. The old behavior is
still the default.
#!/usr/bin/env python
# Undump a dump from dumprevlog
# $ hg init
# $ undumprevlog < repo.dump
import sys
from mercurial import revlog, node, scmutil, util, transaction
for fp in (sys.stdin, sys.stdout, sys.stderr):
util.setbinary(fp)
opener = scmutil.opener('.', False)
tr = transaction.transaction(sys.stderr.write, opener, {'store': opener},
"undump.journal")
while True:
l = sys.stdin.readline()
if not l:
break
if l.startswith("file:"):
f = l[6:-1]
r = revlog.revlog(opener, f)
print f
elif l.startswith("node:"):
n = node.bin(l[6:-1])
elif l.startswith("linkrev:"):
lr = int(l[9:-1])
elif l.startswith("parents:"):
p = l[9:-1].split()
p1 = node.bin(p[0])
p2 = node.bin(p[1])
elif l.startswith("length:"):
length = int(l[8:-1])
sys.stdin.readline() # start marker
d = sys.stdin.read(length)
sys.stdin.readline() # end marker
r.addrevision(d, tr, lr, p1, p2)
tr.close()