highlight: add option to prevent content-only based fallback
When Mozilla enabled Pygments on hg.mozilla.org, we got a lot of weirdly
colorized files. Upon further investigation, the hightlight extension
is first attempting a filename+content based match then falling back to a
purely content-driven detection mode in Pygments. Sounds good in theory.
Unfortunately, Pygments' content-driven detection establishes no minimum
threshold for returning a lexer. Furthermore, the detection code for
a number of languages is very liberal. For example, ActionScript 3 will
return a confidence of 0.3 (out of 1.0) if the first 1k of the file
we pass in matches the regex "\w+\s*:\s*\w"! Python matches on
"import ". It's no coincidence that a number of our extension-less files
were getting highlighted improperly.
This patch adds an option to have the highlighter not fall back to
purely content-based detection when filename+content detection failed.
This can be enabled to render unlighted text instead of taking the risk
that unknown file types are highlighted incorrectly. The old behavior is
still the default.
[This file is here for historical purposes, all recent contributors
should appear in the changelog directly]
Andrea Arcangeli <andrea at suse.de>
Thomas Arendsen Hein <thomas at intevation.de>
Goffredo Baroncelli <kreijack at libero.it>
Muli Ben-Yehuda <mulix at mulix.org>
Mikael Berthe <mikael at lilotux.net>
Benoit Boissinot <bboissin at gmail.com>
Brendan Cully <brendan at kublai.com>
Vincent Danjean <vdanjean.ml at free.fr>
Jake Edge <jake at edge2.net>
Michael Fetterman <michael.fetterman at intel.com>
Edouard Gomez <ed.gomez at free.fr>
Eric Hopper <hopper at omnifarious.org>
Alecs King <alecsk at gmail.com>
Volker Kleinfeld <Volker.Kleinfeld at gmx.de>
Vadim Lebedev <vadim at mbdsys.com>
Christopher Li <hg at chrisli.org>
Chris Mason <mason at suse.com>
Colin McMillen <mcmillen at cs.cmu.edu>
Wojciech Milkowski <wmilkowski at interia.pl>
Chad Netzer <chad.netzer at gmail.com>
Bryan O'Sullivan <bos at serpentine.com>
Vicent SeguĂ Pascual <vseguip at gmail.com>
Sean Perry <shaleh at speakeasy.net>
Nguyen Anh Quynh <aquynh at gmail.com>
Ollivier Robert <roberto at keltia.freenix.fr>
Alexander Schremmer <alex at alexanderweb.de>
Arun Sharma <arun at sharma-home.net>
Josef "Jeff" Sipek <jeffpc at optonline.net>
Kevin Smith <yarcs at qualitycode.com>
TK Soh <teekaysoh at yahoo.com>
Radoslaw Szkodzinski <astralstorm at gorzow.mm.pl>
Samuel Tardieu <sam at rfc1149.net>
K Thananchayan <thananck at yahoo.com>
Andrew Thompson <andrewkt at aktzero.com>
Michael S. Tsirkin <mst at mellanox.co.il>
Rafael Villar Burke <pachi at mmn-arquitectos.com>
Tristan Wibberley <tristan at wibberley.org>
Mark Williamson <mark.williamson at cl.cam.ac.uk>