rust/hg-core/src/operations/cat.rs
author Arseniy Alekseyev <aalekseyev@janestreet.com>
Tue, 05 Oct 2021 15:10:42 +0100
changeset 48225 0cc69017d47f
parent 48224 6b5773f89183
child 48234 1837663ac216
permissions -rw-r--r--
rhg: stop manifest traversal when no more files are needed

Stopping the traversal early can skip a significant part of the manifest, avoiding some of its cost.

The worst-case benchmarks are favorable as well. Running [hg cat] on the last file in the manifest of a large repo, I'm seeing a ~4ms improvement (150ms -> 146ms), so this time is now almost indistinguishable from the baseline ("brute force") implementation.

Running [hg cat] on ~220 files together with the last file of the repo is further improved by ~5ms or so. I suspect the raw performance improvements are caused by splitting the manifest search and the file data access into separate phases, instead of interleaving them.

Differential Revision: https://phab.mercurial-scm.org/D11616
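The early stop described in the commit message can be illustrated with a minimal, self-contained sketch. It is not the code from this changeset: it assumes `std::iter::Peekable` in place of `itertools::PutBack`, plain `&str` paths in place of `HgPath`, and a hypothetical `find_sorted` helper whose `visited` counter exists only to make the saving visible. Both the manifest and the requested paths are assumed to be sorted.

```rust
use std::cmp::Ordering;
use std::iter::Peekable;

/// Looks up each sorted needle in a sorted manifest in a single pass.
///
/// Returns the entries found, the needles that were missing, and the
/// number of manifest entries that were actually consumed.
/// (Hypothetical helper, for illustration only.)
fn find_sorted<'a, I>(
    manifest: I,
    needles: &[&'a str],
) -> (Vec<(&'a str, u32)>, Vec<&'a str>, usize)
where
    I: Iterator<Item = (&'a str, u32)>,
{
    let mut manifest: Peekable<I> = manifest.peekable();
    let mut found = Vec::new();
    let mut missing = Vec::new();
    let mut visited = 0;

    for &needle in needles {
        loop {
            // Peek rather than consume, so that an entry greater than the
            // needle stays available for the next needle.
            let ordering = match manifest.peek() {
                None => {
                    missing.push(needle);
                    break;
                }
                Some((path, _)) => needle.cmp(*path),
            };
            match ordering {
                // The manifest is already past the needle: it is missing.
                Ordering::Less => {
                    missing.push(needle);
                    break;
                }
                // The entry sorts before the needle: skip it.
                Ordering::Greater => {
                    manifest.next();
                    visited += 1;
                }
                Ordering::Equal => {
                    if let Some(entry) = manifest.next() {
                        found.push(entry);
                    }
                    visited += 1;
                    break;
                }
            }
        }
    }
    // Once the needles run out, the rest of the manifest is never read.
    (found, missing, visited)
}
```

With a five-entry manifest `a.txt` to `e.txt` and the needles `b.txt` and `bb.txt`, this sketch finds `b.txt`, reports `bb.txt` as missing, and consumes only two manifest entries: `c.txt` is peeked at but left in place, and the remaining entries are never visited.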
// cat.rs
//
// Copyright 2020 Antoine Cezar <antoine.cezar@octobus.net>
//
// This software may be used and distributed according to the terms of the
// GNU General Public License version 2 or any later version.

use crate::repo::Repo;
use crate::revlog::revlog::RevlogError;
use crate::revlog::Node;

use crate::utils::hg_path::HgPath;
use crate::utils::hg_path::HgPathBuf;

use itertools::put_back;
use itertools::PutBack;
use std::cmp::Ordering;

pub struct CatOutput {
    /// Whether any file in the manifest matched the paths given as CLI
    /// arguments
    pub found_any: bool,
    /// The contents of matching files, in manifest order
    pub concatenated: Vec<u8>,
    /// Which of the CLI arguments did not match any manifest file
    pub missing: Vec<HgPathBuf>,
    /// The node ID that the given revset was resolved to
    pub node: Node,
}

/// Find an item in an iterator over a sorted collection.
///
/// Entries that sort before `needle` are consumed. An entry that sorts after
/// `needle` is put back, so that it stays available to the next lookup.
fn find_item<'a, 'b, D, I: Iterator<Item = (&'a HgPath, D)>>(
    i: &mut PutBack<I>,
    needle: &'b HgPath,
) -> Option<I::Item> {
    loop {
        match i.next() {
            None => return None,
            Some(val) => match needle.as_bytes().cmp(val.0.as_bytes()) {
                Ordering::Less => {
                    i.put_back(val);
                    return None;
                }
                Ordering::Greater => continue,
                Ordering::Equal => return Some(val),
            },
        }
    }
}

/// Look up each of `files` in `manifest`, in a single pass over the manifest.
///
/// Both iterators must yield paths in sorted order. Returns the manifest
/// entries that were found and the requested files that were not.
fn find_files_in_manifest<
    'a,
    'b,
    D,
    I: Iterator<Item = (&'a HgPath, D)>,
    J: Iterator<Item = &'b HgPath>,
>(
    manifest: I,
    files: J,
) -> (Vec<(&'a HgPath, D)>, Vec<&'b HgPath>) {
    let mut manifest = put_back(manifest);
    let mut res = vec![];
    let mut missing = vec![];

    for file in files {
        match find_item(&mut manifest, file) {
            None => missing.push(file),
            Some(item) => res.push(item),
        }
    }
    (res, missing)
}

/// Output the given revision of files
///
/// * `repo`: The repository to read from.
/// * `revset`: The revision to cat the files from.
/// * `files`: The files to output.
pub fn cat(
    repo: &Repo,
    revset: &str,
    mut files: Vec<HgPathBuf>,
) -> Result<CatOutput, RevlogError> {
    let rev = crate::revset::resolve_single(revset, repo)?;
    let manifest = repo.manifest_for_rev(rev)?;
    let node = *repo
        .changelog()?
        .node_from_rev(rev)
        .expect("should succeed when repo.manifest did");
    let mut bytes: Vec<u8> = vec![];
    let mut found_any = false;

    // Sort the requested files so that they can be matched against the
    // manifest in a single pass.
    files.sort_unstable();

    let (found, missing) = find_files_in_manifest(
        manifest.files_with_nodes(),
        files.iter().map(|f| f.as_ref()),
    );

    for (manifest_file, node_bytes) in found {
        found_any = true;
        let file_log = repo.filelog(manifest_file)?;
        let file_node = Node::from_hex_for_repo(node_bytes)?;
        bytes.extend(file_log.data_for_node(file_node)?.data()?);
    }

    let missing: Vec<HgPathBuf> = missing
        .iter()
        .map(|file| (*file).to_owned())
        .collect();
    Ok(CatOutput {
        found_any,
        concatenated: bytes,
        missing,
        node,
    })
}
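
The lookup in this file depends on being able to hand an entry back to the iterator after reading it. The sketch below shows that mechanism in isolation, using only the standard library. `OneSlotPutBack` and `find_sorted_item` are hypothetical names introduced here for illustration; this is not how `itertools::PutBack` is implemented, only a stand-in with the same observable behavior for a single put-back item, and it uses plain integers instead of manifest entries.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a put-back adaptor: wraps an iterator and
/// holds at most one item that was handed back.
struct OneSlotPutBack<I: Iterator> {
    slot: Option<I::Item>,
    inner: I,
}

impl<I: Iterator> OneSlotPutBack<I> {
    fn new(inner: I) -> Self {
        Self { slot: None, inner }
    }

    /// Stores `item` so that the next call to `next` yields it again.
    fn put_back(&mut self, item: I::Item) {
        self.slot = Some(item);
    }
}

impl<I: Iterator> Iterator for OneSlotPutBack<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        // Yield the stored item first, if any; otherwise advance the
        // wrapped iterator.
        self.slot.take().or_else(|| self.inner.next())
    }
}

/// Same shape as the sorted lookup above, over sorted integers.
fn find_sorted_item<I: Iterator<Item = u32>>(
    iter: &mut OneSlotPutBack<I>,
    needle: u32,
) -> Option<u32> {
    loop {
        let val = iter.next()?;
        match needle.cmp(&val) {
            Ordering::Less => {
                // Went past the needle: keep `val` for the next lookup.
                iter.put_back(val);
                return None;
            }
            Ordering::Greater => continue,
            Ordering::Equal => return Some(val),
        }
    }
}
```

Over the sorted input `1, 3, 5, 7`, looking up `3` succeeds, looking up `4` fails but puts `5` back, so a following lookup of `5` still succeeds and `7` remains unread in the iterator.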