highlight: add option to prevent content-only based fallback
When Mozilla enabled Pygments on hg.mozilla.org, we got a lot of weirdly
colorized files. Upon further investigation, the hightlight extension
is first attempting a filename+content based match then falling back to a
purely content-driven detection mode in Pygments. Sounds good in theory.
Unfortunately, Pygments' content-driven detection establishes no minimum
threshold for returning a lexer. Furthermore, the detection code for
a number of languages is very liberal. For example, ActionScript 3 will
return a confidence of 0.3 (out of 1.0) if the first 1k of the file
we pass in matches the regex "\w+\s*:\s*\w"! Python matches on
"import ". It's no coincidence that a number of our extension-less files
were getting highlighted improperly.
This patch adds an option to have the highlighter not fall back to
purely content-based detection when filename+content detection failed.
This can be enabled to render unlighted text instead of taking the risk
that unknown file types are highlighted incorrectly. The old behavior is
still the default.
#!/bin/bash -e
. $(dirname $0)/dockerlib.sh
BUILDDIR=$(dirname $0)
export ROOTDIR=$(cd $BUILDDIR/..; pwd)
checkdocker
PLATFORM="$1"
shift # extra params are passed to buildrpm
initcontainer $PLATFORM
RPMBUILDDIR=$ROOTDIR/packages/$PLATFORM
contrib/buildrpm --rpmbuilddir $RPMBUILDDIR --prepare $*
DSHARED=/mnt/shared
$DOCKER run -u $DBUILDUSER --rm -v $RPMBUILDDIR:$DSHARED $CONTAINER \
rpmbuild --define "_topdir $DSHARED" -ba $DSHARED/SPECS/mercurial.spec --clean
$DOCKER run -u $DBUILDUSER --rm -v $RPMBUILDDIR:$DSHARED $CONTAINER \
createrepo $DSHARED
cat << EOF > $RPMBUILDDIR/mercurial.repo
# Place this file in /etc/yum.repos.d/mercurial.repo
[mercurial]
name=Mercurial packages for $PLATFORM
# baseurl=file://$RPMBUILDDIR/
baseurl=http://hg.example.com/build/$PLATFORM/
skip_if_unavailable=True
gpgcheck=0
enabled=1
EOF
echo
echo "Build complete - results can be found in $RPMBUILDDIR"