hgext/censor.py
author Gregory Szorc <gregory.szorc@gmail.com>
Tue, 18 Sep 2018 17:51:43 -0700
changeset 39778 a6b3c4c1019f
parent 39661 8bfbb25859f1
child 40293 c303d65d2e34
permissions -rw-r--r--
revlog: move censor logic out of censor extension The censor extension is doing very low-level things with revlogs. It is fundamentally impossible for this logic to remain in the censor extension while support multiple storage backends: we need each storage backend to implement censor in its own storage-specific way. This commit effectively moves the revlog-specific censoring code to be a method of revlogs themselves. We've defined a new API on the file storage interface for censoring an individual node. Even though the current censoring code doesn't use it, the API requires a transaction instance because it logically makes sense for storage backends to require an active transaction (which implies a held write lock) in order to rewrite storage. After this commit, the censor extension has been reduced to boilerplate precondition checking before invoking the generic storage API. I tried to keep the code as similar as possible. But some minor changes were made: * We use self._io instead of instantiating a new revlogio instance. * We compare self.version against REVLOGV0 instead of != REVLOGV1 because presumably all future revlog versions will support censoring. * We use self.opener instead of going through repo.svfs (we don't have a handle on the repo instance from a revlog). * "revlog" dropped * Replace "flog" with "self". Differential Revision: https://phab.mercurial-scm.org/D4656
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
24347
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
     1
# Copyright (C) 2015 - Mike Edgar <adgar@google.com>
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
     2
#
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
     3
# This extension enables removal of file content at a given revision,
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
     4
# rewriting the data/metadata of successive revisions to preserve revision log
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
     5
# integrity.
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
     6
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
     7
"""erase file content at a given revision
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
     8
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
     9
The censor command instructs Mercurial to erase all content of a file at a given
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    10
revision *without updating the changeset hash.* This allows existing history to
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    11
remain valid while preventing future clones/pulls from receiving the erased
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    12
data.
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    13
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    14
Typical uses for censor are due to security or legal requirements, including::
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    15
26781
1aee2ab0f902 spelling: trivial spell checking
Mads Kiilerich <madski@unity3d.com>
parents: 26587
diff changeset
    16
 * Passwords, private keys, cryptographic material
24347
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    17
 * Licensed data/code/libraries for which the license has expired
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    18
 * Personally Identifiable Information or other private data
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    19
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    20
Censored nodes can interrupt mercurial's typical operation whenever the excised
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    21
data needs to be materialized. Some commands, like ``hg cat``/``hg revert``,
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    22
simply fail when asked to produce censored data. Others, like ``hg verify`` and
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    23
``hg update``, must be capable of tolerating censored data to continue to
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    24
function in a meaningful way. Such commands only tolerate censored file
24890
cba84b06b702 censor: fix incorrect configuration name for ignoring error at censored file
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 24880
diff changeset
    25
revisions if they are allowed by the "censor.policy=ignore" config option.
24347
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    26
"""
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    27
28092
5166b7a84b72 censor: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 27290
diff changeset
    28
from __future__ import absolute_import
5166b7a84b72 censor: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 27290
diff changeset
    29
5166b7a84b72 censor: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 27290
diff changeset
    30
from mercurial.i18n import _
24347
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    31
from mercurial.node import short
28092
5166b7a84b72 censor: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 27290
diff changeset
    32
5166b7a84b72 censor: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 27290
diff changeset
    33
from mercurial import (
5166b7a84b72 censor: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 27290
diff changeset
    34
    error,
32337
46ba2cdda476 registrar: move cmdutil.command to registrar module (API)
Yuya Nishihara <yuya@tcha.org>
parents: 32315
diff changeset
    35
    registrar,
28092
5166b7a84b72 censor: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 27290
diff changeset
    36
    scmutil,
5166b7a84b72 censor: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 27290
diff changeset
    37
)
24347
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    38
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    39
cmdtable = {}
32337
46ba2cdda476 registrar: move cmdutil.command to registrar module (API)
Yuya Nishihara <yuya@tcha.org>
parents: 32315
diff changeset
    40
command = registrar.command(cmdtable)
29841
d5883fd055c6 extensions: change magic "shipped with hg" string
Augie Fackler <augie@google.com>
parents: 28092
diff changeset
    41
# Note for extension authors: ONLY specify testedwith = 'ships-with-hg-core' for
25186
80c5b2666a96 extensions: document that `testedwith = 'internal'` is special
Augie Fackler <augie@google.com>
parents: 24890
diff changeset
    42
# extensions which SHIP WITH MERCURIAL. Non-mainline extensions should
80c5b2666a96 extensions: document that `testedwith = 'internal'` is special
Augie Fackler <augie@google.com>
parents: 24890
diff changeset
    43
# be specifying the version(s) of Mercurial they are tested with, or
80c5b2666a96 extensions: document that `testedwith = 'internal'` is special
Augie Fackler <augie@google.com>
parents: 24890
diff changeset
    44
# leave the attribute unspecified.
29841
d5883fd055c6 extensions: change magic "shipped with hg" string
Augie Fackler <augie@google.com>
parents: 28092
diff changeset
    45
testedwith = 'ships-with-hg-core'
24347
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    46
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    47
@command('censor',
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    48
    [('r', 'rev', '', _('censor file from specified revision'), _('REV')),
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    49
     ('t', 'tombstone', '', _('replacement tombstone data'), _('TEXT'))],
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    50
    _('-r REV [-t TEXT] [FILE]'))
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    51
def censor(ui, repo, path, rev='', tombstone='', **opts):
38441
e219e355e088 censor: use context manager for lock management
Matt Harbison <matt_harbison@yahoo.com>
parents: 37442
diff changeset
    52
    with repo.wlock(), repo.lock():
27290
525d9b3f0a31 censor: make censor acquire locks before processing
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 26781
diff changeset
    53
        return _docensor(ui, repo, path, rev, tombstone, **opts)
525d9b3f0a31 censor: make censor acquire locks before processing
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 26781
diff changeset
    54
525d9b3f0a31 censor: make censor acquire locks before processing
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 26781
diff changeset
    55
def _docensor(ui, repo, path, rev='', tombstone='', **opts):
24347
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    56
    if not path:
26587
56b2bcea2529 error: get Abort from 'error' instead of 'util'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 25806
diff changeset
    57
        raise error.Abort(_('must specify file path to censor'))
24347
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    58
    if not rev:
26587
56b2bcea2529 error: get Abort from 'error' instead of 'util'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 25806
diff changeset
    59
        raise error.Abort(_('must specify revision to censor'))
24347
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    60
25806
5e18f6e39006 censor: make various path forms available like other Mercurial commands
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 25660
diff changeset
    61
    wctx = repo[None]
5e18f6e39006 censor: make various path forms available like other Mercurial commands
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 25660
diff changeset
    62
5e18f6e39006 censor: make various path forms available like other Mercurial commands
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 25660
diff changeset
    63
    m = scmutil.match(wctx, (path,))
5e18f6e39006 censor: make various path forms available like other Mercurial commands
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 25660
diff changeset
    64
    if m.anypats() or len(m.files()) != 1:
26587
56b2bcea2529 error: get Abort from 'error' instead of 'util'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 25806
diff changeset
    65
        raise error.Abort(_('can only specify an explicit filename'))
25806
5e18f6e39006 censor: make various path forms available like other Mercurial commands
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 25660
diff changeset
    66
    path = m.files()[0]
24347
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    67
    flog = repo.file(path)
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    68
    if not len(flog):
26587
56b2bcea2529 error: get Abort from 'error' instead of 'util'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 25806
diff changeset
    69
        raise error.Abort(_('cannot censor file with no history'))
24347
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    70
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    71
    rev = scmutil.revsingle(repo, rev, rev).rev()
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    72
    try:
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    73
        ctx = repo[rev]
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    74
    except KeyError:
26587
56b2bcea2529 error: get Abort from 'error' instead of 'util'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 25806
diff changeset
    75
        raise error.Abort(_('invalid revision identifier %s') % rev)
24347
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    76
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    77
    try:
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    78
        fctx = ctx.filectx(path)
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    79
    except error.LookupError:
26587
56b2bcea2529 error: get Abort from 'error' instead of 'util'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 25806
diff changeset
    80
        raise error.Abort(_('file does not exist at revision %s') % rev)
24347
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    81
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    82
    fnode = fctx.filenode()
39615
a658f97c1ce4 censor: use a reasonable amount of memory
Valentin Gatien-Baron <vgatien-baron@janestreet.com>
parents: 38783
diff changeset
    83
    heads = []
a658f97c1ce4 censor: use a reasonable amount of memory
Valentin Gatien-Baron <vgatien-baron@janestreet.com>
parents: 38783
diff changeset
    84
    for headnode in repo.heads():
39661
8bfbb25859f1 censor: rename loop variable to silence pyflakes warning
Yuya Nishihara <yuya@tcha.org>
parents: 39615
diff changeset
    85
        hc = repo[headnode]
8bfbb25859f1 censor: rename loop variable to silence pyflakes warning
Yuya Nishihara <yuya@tcha.org>
parents: 39615
diff changeset
    86
        if path in hc and hc.filenode(path) == fnode:
8bfbb25859f1 censor: rename loop variable to silence pyflakes warning
Yuya Nishihara <yuya@tcha.org>
parents: 39615
diff changeset
    87
            heads.append(hc)
24347
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    88
    if heads:
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    89
        headlist = ', '.join([short(c.node()) for c in heads])
26587
56b2bcea2529 error: get Abort from 'error' instead of 'util'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 25806
diff changeset
    90
        raise error.Abort(_('cannot censor file in heads (%s)') % headlist,
24347
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    91
            hint=_('clean/delete and commit first'))
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    92
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    93
    wp = wctx.parents()
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    94
    if ctx.node() in [p.node() for p in wp]:
26587
56b2bcea2529 error: get Abort from 'error' instead of 'util'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 25806
diff changeset
    95
        raise error.Abort(_('cannot censor working directory'),
24347
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    96
            hint=_('clean/delete/update first'))
1bcfecbbf569 censor: add censor command to hgext with basic client-side tests
Mike Edgar <adgar@google.com>
parents:
diff changeset
    97
39778
a6b3c4c1019f revlog: move censor logic out of censor extension
Gregory Szorc <gregory.szorc@gmail.com>
parents: 39661
diff changeset
    98
    with repo.transaction(b'censor') as tr:
a6b3c4c1019f revlog: move censor logic out of censor extension
Gregory Szorc <gregory.szorc@gmail.com>
parents: 39661
diff changeset
    99
        flog.censorrevision(tr, fnode, tombstone=tombstone)