hgext/highlight/__init__.py
author Gregory Szorc <gregory.szorc@gmail.com>
Wed, 14 Oct 2015 18:22:16 -0700
changeset 26680 7a3f6490ef97
parent 26679 0d93df4d1e44
child 29485 6a98f9408a50
permissions -rw-r--r--
highlight: add option to prevent content-only based fallback

When Mozilla enabled Pygments on hg.mozilla.org, we got a lot of weirdly colorized files. Upon further investigation, the highlight extension first attempts a filename+content based match, then falls back to a purely content-driven detection mode in Pygments. Sounds good in theory.

Unfortunately, Pygments' content-driven detection establishes no minimum threshold for returning a lexer. Furthermore, the detection code for a number of languages is very liberal. For example, ActionScript 3 will return a confidence of 0.3 (out of 1.0) if the first 1k of the file we pass in matches the regex "\w+\s*:\s*\w"! Python matches on "import ". It's no coincidence that a number of our extension-less files were getting highlighted improperly.

This patch adds an option to have the highlighter not fall back to purely content-based detection when filename+content detection fails. It can be enabled to render unhighlighted text instead of taking the risk that unknown file types are highlighted incorrectly. The old behavior is still the default.
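The failure mode described above can be reproduced without Pygments. The sketch below is a simplified stand-in, not Pygments code: `content_score`, `pick_lexer`, and the tiny extension table are hypothetical, and step 1 matches on the filename only, where the extension uses a filename+content match. The only details taken from the commit message are the ActionScript 3 regex, its 0.3 score, and the 1k window. It shows how a fallback with no minimum threshold colors a plain notes file as ActionScript 3, while a filename-only mode leaves it as plain text.

```python
import re

# Regex quoted in the commit message for ActionScript 3 detection.
AS3_HINT = re.compile(r"\w+\s*:\s*\w")


def content_score(text):
    """Hypothetical content-only detector: 0.3 on a match, else 0.0."""
    return 0.3 if AS3_HINT.search(text[:1000]) else 0.0


def pick_lexer(filename, text, filenameonly=False):
    """Hypothetical selection logic mirroring the behavior described above."""
    # Step 1: filename-based match (a tiny illustrative table).
    known = {".py": "python", ".as": "actionscript3"}
    for ext, name in known.items():
        if filename.endswith(ext):
            return name
    # Step 2: optional content-only fallback with no minimum threshold.
    if not filenameonly and content_score(text) > 0.0:
        return "actionscript3"
    # Nothing matched: render as plain, unhighlighted text.
    return "text"


notes = "TODO: call Bob about the release\n"
print(pick_lexer("NOTES", notes))                     # actionscript3
print(pick_lexer("NOTES", notes, filenameonly=True))  # text
```

With `filenameonly=True` the extension-less file is left unhighlighted, which is the trade-off the new `highlightonlymatchfilename` option exposes.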
8251
7fc30044b514 highlight: add copyright and license header
Martin Geisler <mg@lazybytes.net>
parents: 7216
diff changeset
     1
# highlight - syntax highlighting in hgweb, based on Pygments
     2
#
     3
#  Copyright 2008, 2009 Patrick Mezard <pmezard@gmail.com> and others
     4
#
     5
# This software may be used and distributed according to the terms of the
10263
25e572394f5c Update license to GPLv2+
Matt Mackall <mpm@selenic.com>
parents: 9409
diff changeset
     6
# GNU General Public License version 2 or any later version.
8251
7fc30044b514 highlight: add copyright and license header
Martin Geisler <mg@lazybytes.net>
parents: 7216
diff changeset
     7
#
     8
# The original module was split in an interface and an implementation
     9
# file to defer pygments loading and speedup extension setup.
    10
8932
f87884329419 extensions: fix up description lines some more
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 8894
diff changeset
    11
"""syntax highlighting for hgweb (requires Pygments)
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
    12
9262
917e1d5674d6 highlight: wrap docstrings at 70 characters
Martin Geisler <mg@lazybytes.net>
parents: 9210
diff changeset
    13
It depends on the Pygments syntax highlighting library:
    14
http://pygments.org/
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
    15
26680
7a3f6490ef97 highlight: add option to prevent content-only based fallback
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26679
diff changeset
    16
There are the following configuration options::
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
    17
9210
2667ca525b59 highlight: use reST syntax for literal block
Martin Geisler <mg@lazybytes.net>
parents: 9064
diff changeset
    18
  [web]
26249
3166bcc0c538 highlight: add highlightfiles config option which takes a fileset (issue3005)
Anton Shestakov <av6@dwimlabs.net>
parents: 25602
diff changeset
    19
  pygments_style = <style> (default: colorful)
    20
  highlightfiles = <fileset> (default: size('<5M'))
26680
7a3f6490ef97 highlight: add option to prevent content-only based fallback
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26679
diff changeset
    21
  highlightonlymatchfilename = <bool> (default False)
    22
    23
``highlightonlymatchfilename`` will only highlight files if their type could
    24
be identified by their filename. When this is not enabled (the default),
    25
Pygments will try very hard to identify the file type from content and any
    26
match (even matches with a low confidence score) will be used.
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
    27
"""
    28
    29
import highlight
    30
from mercurial.hgweb import webcommands, webutil, common
26249
3166bcc0c538 highlight: add highlightfiles config option which takes a fileset (issue3005)
Anton Shestakov <av6@dwimlabs.net>
parents: 25602
diff changeset
    31
from mercurial import extensions, encoding, fileset
25186
80c5b2666a96 extensions: document that `testedwith = 'internal'` is special
Augie Fackler <augie@google.com>
parents: 19872
diff changeset
    32
# Note for extension authors: ONLY specify testedwith = 'internal' for
    33
# extensions which SHIP WITH MERCURIAL. Non-mainline extensions should
    34
# be specifying the version(s) of Mercurial they are tested with, or
    35
# leave the attribute unspecified.
16743
38caf405d010 hgext: mark all first-party extensions as such
Augie Fackler <raf@durin42.com>
parents: 16683
diff changeset
    36
testedwith = 'internal'
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
    37
26679
0d93df4d1e44 highlight: inline checkfctx()
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26678
diff changeset
    38
def pygmentize(web, field, fctx, tmpl):
    39
    style = web.config('web', 'pygments_style', 'colorful')
    40
    expr = web.config('web', 'highlightfiles', "size('<5M')")
26680
7a3f6490ef97 highlight: add option to prevent content-only based fallback
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26679
diff changeset
    41
    filenameonly = web.configbool('web', 'highlightonlymatchfilename', False)
26679
0d93df4d1e44 highlight: inline checkfctx()
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26678
diff changeset
    42
26249
3166bcc0c538 highlight: add highlightfiles config option which takes a fileset (issue3005)
Anton Shestakov <av6@dwimlabs.net>
parents: 25602
diff changeset
    43
    ctx = fctx.changectx()
    44
    tree = fileset.parse(expr)
    45
    mctx = fileset.matchctx(ctx, subset=[fctx.path()], status=None)
26679
0d93df4d1e44 highlight: inline checkfctx()
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26678
diff changeset
    46
    if fctx.path() in fileset.getset(mctx, tree):
26680
7a3f6490ef97 highlight: add option to prevent content-only based fallback
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26679
diff changeset
    47
        highlight.pygmentize(field, fctx, style, tmpl,
    48
                guessfilenameonly=filenameonly)
26678
613d850cce53 highlight: consolidate duplicate code
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26295
diff changeset
    49
25602
85fb416f2fa7 hgweb: provide symrev (symbolic revision) property to the templates
Anton Shestakov <av6@dwimlabs.net>
parents: 25186
diff changeset
    50
def filerevision_highlight(orig, web, req, tmpl, fctx):
8874
74baf78202e8 highlight: was broken since 580a79dde2a3 (encoding)
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 8866
diff changeset
    51
    mt = ''.join(tmpl('mimetype', encoding=encoding.encoding))
6987
d09e813b21e3 highlight: only pygmentize for HTML mimetypes
Rocco Rutte <pdmef@gmx.net>
parents: 6938
diff changeset
    52
    # only pygmentize for mimetype containing 'html' so we both match
    53
    # 'text/html' and possibly 'application/xhtml+xml' in the future
    54
    # so that we don't have to touch the extension when the mimetype
    55
    # for a template changes; also hgweb optimizes the case that a
    56
    # raw file is sent using rawfile() and doesn't call us, so we
    57
    # can't clash with the file's content-type here in case we
    58
    # pygmentize a html file
    59
    if 'html' in mt:
26678
613d850cce53 highlight: consolidate duplicate code
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26295
diff changeset
    60
        pygmentize(web, 'fileline', fctx, tmpl)
    61
25602
85fb416f2fa7 hgweb: provide symrev (symbolic revision) property to the templates
Anton Shestakov <av6@dwimlabs.net>
parents: 25186
diff changeset
    62
    return orig(web, req, tmpl, fctx)
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
    63
7216
292fb2ad2846 extensions: use new wrapper functions
Matt Mackall <mpm@selenic.com>
parents: 7127
diff changeset
    64
def annotate_highlight(orig, web, req, tmpl):
8874
74baf78202e8 highlight: was broken since 580a79dde2a3 (encoding)
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 8866
diff changeset
    65
    mt = ''.join(tmpl('mimetype', encoding=encoding.encoding))
6987
d09e813b21e3 highlight: only pygmentize for HTML mimetypes
Rocco Rutte <pdmef@gmx.net>
parents: 6938
diff changeset
    66
    if 'html' in mt:
    67
        fctx = webutil.filectx(web.repo, req)
26678
613d850cce53 highlight: consolidate duplicate code
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26295
diff changeset
    68
        pygmentize(web, 'annotateline', fctx, tmpl)
    69
7216
292fb2ad2846 extensions: use new wrapper functions
Matt Mackall <mpm@selenic.com>
parents: 7127
diff changeset
    70
    return orig(web, req, tmpl)
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
    71
    72
def generate_css(web, req, tmpl):
    73
    pg_style = web.config('web', 'pygments_style', 'colorful')
19872
681f7b9213a4 check-code: check for spaces around = for named parameters
Mads Kiilerich <madski@unity3d.com>
parents: 16743
diff changeset
    74
    fmter = highlight.HtmlFormatter(style=pg_style)
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
    75
    req.respond(common.HTTP_OK, 'text/css')
16683
525fdb738975 cleanup: eradicate long lines
Brodie Rao <brodie@sf.io>
parents: 10263
diff changeset
    76
    return ['/* pygments_style = %s */\n\n' % pg_style,
    77
            fmter.get_style_defs('')]
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
    78
9409
57157a224037 highlight: move code from module top-level into extsetup
Martin Geisler <mg@lazybytes.net>
parents: 9262
diff changeset
    79
def extsetup():
    80
    # monkeypatch in the new version
16683
525fdb738975 cleanup: eradicate long lines
Brodie Rao <brodie@sf.io>
parents: 10263
diff changeset
    81
    extensions.wrapfunction(webcommands, '_filerevision',
    82
                            filerevision_highlight)
9409
57157a224037 highlight: move code from module top-level into extsetup
Martin Geisler <mg@lazybytes.net>
parents: 9262
diff changeset
    83
    extensions.wrapfunction(webcommands, 'annotate', annotate_highlight)
    84
    webcommands.highlightcss = generate_css
    85
    webcommands.__all__.append('highlightcss')
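
The wrappers registered in `extsetup()` all receive the original function as their first argument (`orig`) and decide when to delegate to it. The sketch below illustrates that calling convention with a simplified stand-in for `extensions.wrapfunction`. The `wrapfunction` defined here, `webcommands_stub`, and `annotate_logged` are hypothetical and far smaller than Mercurial's real implementation.

```python
import types


def wrapfunction(container, funcname, wrapper):
    """Simplified stand-in: replace container.funcname with a wrapped version."""
    orig = getattr(container, funcname)

    def wrapped(*args, **kwargs):
        # The wrapper decides when (and whether) to call the original.
        return wrapper(orig, *args, **kwargs)

    setattr(container, funcname, wrapped)
    return orig


# Hypothetical module-like object standing in for webcommands.
webcommands_stub = types.SimpleNamespace(annotate=lambda path: "annotate " + path)

calls = []


def annotate_logged(orig, path):
    # Do extra work first, then delegate, as annotate_highlight does.
    calls.append(path)
    return orig(path)


wrapfunction(webcommands_stub, "annotate", annotate_logged)
result = webcommands_stub.annotate("README")
print(result)  # annotate README
print(calls)   # ['README']
```

Because the wrapper always ends by calling `orig`, the wrapped command keeps its original behavior and the extension only adds a step in front of it.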