rust/hg-core/src/dirstate_tree/dirstate_map.rs
author Simon Sapin <simon.sapin@octobus.net>
Mon, 27 Sep 2021 12:09:15 +0200
changeset 48068 bf8837e3d7ce
parent 48061 060cd909439f
child 48069 3d0a9c6e614d
permissions -rw-r--r--
dirstate: Remove the flat Rust DirstateMap implementation Before this changeset we had two Rust implementations of `DirstateMap`. This removes the "flat" DirstateMap so that the "tree" DirstateMap is always used when Rust enabled. This simplifies the code a lot, and will enable (in the next changeset) further removal of a trait abstraction. This is a performance regression when: * Rust is enabled, and * The repository uses the legacy dirstate-v1 file format, and * For `hg status`, unknown files are not listed (such as with `-mard`) The regression is about 100 milliseconds for `hg status -mard` on a semi-large repository (mozilla-central), from ~320ms to ~420ms. We deem this to be small enough to be worth it. The new dirstate-v2 is still experimental at this point, but we aim to stabilize it (though not yet enable it by default for new repositories) in Mercurial 6.0. Eventually, upgrating repositories to dirsate-v2 will eliminate this regression (and enable other performance improvements). # Background The flat DirstateMap was introduced with the first Rust implementation of the status algorithm. It works similarly to the previous Python + C one, with a single `HashMap` that associates file paths to a `DirstateEntry` (where Python has a dict). We later added the tree DirstateMap where the root of the tree contains nodes for files and directories that are directly at the root of the repository, and nodes for directories can contain child nodes representing the files and directly that *they* contain directly. The shape of this tree mirrors that of the working directory in the filesystem. This enables the status algorithm to traverse this tree in tandem with traversing the filesystem tree, which in turns enables a more efficient algorithm. Furthermore, the new dirstate-v2 file format is also based on a tree of the same shape. The tree DirstateMap can access a dirstate-v2 file without parsing it: binary data in a single large (possibly memory-mapped) bytes buffer is traversed on demand. This allows `DirstateMap` creation to take `O(1)` time. (Mutation works by creating new in-memory nodes with copy-on-write semantics, and serialization is append-mostly.) The tradeoff is that for "legacy" repositories that use the dirstate-v1 file format, parsing that file into a tree DirstateMap takes more time. Profiling shows that this time is dominated by `HashMap`. For a dirstate containing `F` files with an average `D` directory depth, the flat DirstateMap does parsing in `O(F)` number of HashMap operations but the tree DirstateMap in `O(F × D)` operations, since each node has its own HashMap containing its child nodes. This slower costs ~140ms on an old snapshot of mozilla-central, and ~80ms on an old snapshot of the Netbeans repository. The status algorithm is faster, but with `-mard` (when not listing unknown files) it is typically not faster *enough* to compensate the slower parsing. Both Rust implementations are always faster than the Python + C implementation # Benchmark results All benchmarks are run on changeset 98c0408324e6, with repositories that use the dirstate-v1 file format, on a server with 4 CPU cores and 4 CPU threads (no HyperThreading). `hg status` benchmarks show wall clock times of the entire command as the average and standard deviation of serveral runs, collected by https://github.com/sharkdp/hyperfine and reformated. Parsing benchmarks are wall clock time of the Rust function that converts a bytes buffer of the dirstate file into the `DirstateMap` data structure as used by the status algorithm. A single run each, collected by running `hg status` this environment variable: RUST_LOG=hg::dirstate::dirstate_map=trace,hg::dirstate_tree::dirstate_map=trace Benchmark 1: Rust flat DirstateMap → Rust tree DirstateMap hg status mozilla-clean 562.3 ms ± 2.0 ms → 462.5 ms ± 0.6 ms 1.22 ± 0.00 times faster mozilla-dirty 859.6 ms ± 2.2 ms → 719.5 ms ± 3.2 ms 1.19 ± 0.01 times faster mozilla-ignored 558.2 ms ± 3.0 ms → 457.9 ms ± 2.9 ms 1.22 ± 0.01 times faster mozilla-unknowns 859.4 ms ± 5.7 ms → 716.0 ms ± 4.7 ms 1.20 ± 0.01 times faster netbeans-clean 336.5 ms ± 0.9 ms → 339.5 ms ± 0.4 ms 0.99 ± 0.00 times faster netbeans-dirty 491.4 ms ± 1.6 ms → 475.1 ms ± 1.2 ms 1.03 ± 0.00 times faster netbeans-ignored 343.7 ms ± 1.0 ms → 347.8 ms ± 0.4 ms 0.99 ± 0.00 times faster netbeans-unknowns 484.3 ms ± 1.0 ms → 466.0 ms ± 1.2 ms 1.04 ± 0.00 times faster hg status -mard mozilla-clean 317.3 ms ± 0.6 ms → 422.5 ms ± 1.2 ms 0.75 ± 0.00 times faster mozilla-dirty 315.4 ms ± 0.6 ms → 417.7 ms ± 1.1 ms 0.76 ± 0.00 times faster mozilla-ignored 314.6 ms ± 0.6 ms → 417.4 ms ± 1.0 ms 0.75 ± 0.00 times faster mozilla-unknowns 312.9 ms ± 0.9 ms → 417.3 ms ± 1.6 ms 0.75 ± 0.00 times faster netbeans-clean 212.0 ms ± 0.6 ms → 283.6 ms ± 0.8 ms 0.75 ± 0.00 times faster netbeans-dirty 211.4 ms ± 1.0 ms → 283.4 ms ± 1.6 ms 0.75 ± 0.01 times faster netbeans-ignored 211.4 ms ± 0.9 ms → 283.9 ms ± 0.8 ms 0.74 ± 0.01 times faster netbeans-unknowns 211.1 ms ± 0.6 ms → 283.4 ms ± 1.0 ms 0.74 ± 0.00 times faster Parsing mozilla-clean 38.4ms → 177.6ms mozilla-dirty 38.8ms → 177.0ms mozilla-ignored 38.8ms → 178.0ms mozilla-unknowns 38.7ms → 176.9ms netbeans-clean 16.5ms → 97.3ms netbeans-dirty 16.5ms → 98.4ms netbeans-ignored 16.9ms → 97.4ms netbeans-unknowns 16.9ms → 96.3ms Benchmark 2: Python + C dirstatemap → Rust tree DirstateMap hg status mozilla-clean 1261.0 ms ± 3.6 ms → 461.1 ms ± 0.5 ms 2.73 ± 0.00 times faster mozilla-dirty 2293.4 ms ± 9.1 ms → 719.6 ms ± 3.6 ms 3.19 ± 0.01 times faster mozilla-ignored 1240.4 ms ± 2.3 ms → 457.7 ms ± 1.9 ms 2.71 ± 0.00 times faster mozilla-unknowns 2283.3 ms ± 9.0 ms → 719.7 ms ± 3.8 ms 3.17 ± 0.01 times faster netbeans-clean 879.7 ms ± 3.5 ms → 339.9 ms ± 0.5 ms 2.59 ± 0.00 times faster netbeans-dirty 1257.3 ms ± 4.7 ms → 474.6 ms ± 1.6 ms 2.65 ± 0.01 times faster netbeans-ignored 943.9 ms ± 1.9 ms → 347.3 ms ± 1.1 ms 2.72 ± 0.00 times faster netbeans-unknowns 1188.1 ms ± 5.0 ms → 465.2 ms ± 2.3 ms 2.55 ± 0.01 times faster hg status -mard mozilla-clean 903.2 ms ± 3.6 ms → 423.4 ms ± 2.2 ms 2.13 ± 0.01 times faster mozilla-dirty 884.6 ms ± 4.5 ms → 417.3 ms ± 1.4 ms 2.12 ± 0.01 times faster mozilla-ignored 881.9 ms ± 1.3 ms → 417.3 ms ± 0.8 ms 2.11 ± 0.00 times faster mozilla-unknowns 878.5 ms ± 1.9 ms → 416.4 ms ± 0.9 ms 2.11 ± 0.00 times faster netbeans-clean 434.9 ms ± 1.8 ms → 284.0 ms ± 0.8 ms 1.53 ± 0.01 times faster netbeans-dirty 434.1 ms ± 0.8 ms → 283.1 ms ± 0.8 ms 1.53 ± 0.00 times faster netbeans-ignored 431.7 ms ± 1.1 ms → 283.6 ms ± 1.8 ms 1.52 ± 0.01 times faster netbeans-unknowns 433.0 ms ± 1.3 ms → 283.5 ms ± 0.7 ms 1.53 ± 0.00 times faster Differential Revision: https://phab.mercurial-scm.org/D11516

use bytes_cast::BytesCast;
use micro_timer::timed;
use std::borrow::Cow;
use std::convert::TryInto;
use std::path::PathBuf;

use super::on_disk;
use super::on_disk::DirstateV2ParseError;
use super::path_with_basename::WithBasename;
use crate::dirstate::parsers::pack_entry;
use crate::dirstate::parsers::packed_entry_size;
use crate::dirstate::parsers::parse_dirstate_entries;
use crate::dirstate::parsers::Timestamp;
use crate::dirstate::CopyMapIter;
use crate::dirstate::StateMapIter;
use crate::dirstate::SIZE_FROM_OTHER_PARENT;
use crate::dirstate::SIZE_NON_NORMAL;
use crate::matchers::Matcher;
use crate::utils::hg_path::{HgPath, HgPathBuf};
use crate::DirstateEntry;
use crate::DirstateError;
use crate::DirstateParents;
use crate::DirstateStatus;
use crate::EntryState;
use crate::FastHashMap;
use crate::PatternFileWarning;
use crate::StatusError;
use crate::StatusOptions;

/// Append to an existing data file if the amount of unreachable data (not used
/// anymore) is less than this fraction of the total amount of existing data.
const ACCEPTABLE_UNREACHABLE_BYTES_RATIO: f32 = 0.5;

pub struct DirstateMap<'on_disk> {
    /// Contents of the `.hg/dirstate` file
    pub(super) on_disk: &'on_disk [u8],

    pub(super) root: ChildNodes<'on_disk>,

    /// Number of nodes anywhere in the tree that have `.entry.is_some()`.
    pub(super) nodes_with_entry_count: u32,

    /// Number of nodes anywhere in the tree that have
    /// `.copy_source.is_some()`.
    pub(super) nodes_with_copy_source_count: u32,

    /// See on_disk::Header
    pub(super) ignore_patterns_hash: on_disk::IgnorePatternsHash,

    /// How many bytes of `on_disk` are not used anymore
    pub(super) unreachable_bytes: u32,
}

/// Using a plain `HgPathBuf` of the full path from the repository root as a
/// map key would also work: all paths in a given map have the same parent
/// path, so comparing full paths gives the same result as comparing base
/// names. However `HashMap` would waste time always re-hashing the same
/// string prefix.
pub(super) type NodeKey<'on_disk> = WithBasename<Cow<'on_disk, HgPath>>;

/// Similar to `&'tree Cow<'on_disk, HgPath>`, but can also be returned
/// for on-disk nodes that don’t actually have a `Cow` to borrow.
pub(super) enum BorrowedPath<'tree, 'on_disk> {
    InMemory(&'tree HgPathBuf),
    OnDisk(&'on_disk HgPath),
}

pub(super) enum ChildNodes<'on_disk> {
    InMemory(FastHashMap<NodeKey<'on_disk>, Node<'on_disk>>),
    OnDisk(&'on_disk [on_disk::Node]),
}

pub(super) enum ChildNodesRef<'tree, 'on_disk> {
    InMemory(&'tree FastHashMap<NodeKey<'on_disk>, Node<'on_disk>>),
    OnDisk(&'on_disk [on_disk::Node]),
}

pub(super) enum NodeRef<'tree, 'on_disk> {
    InMemory(&'tree NodeKey<'on_disk>, &'tree Node<'on_disk>),
    OnDisk(&'on_disk on_disk::Node),
}

impl<'tree, 'on_disk> BorrowedPath<'tree, 'on_disk> {
    pub fn detach_from_tree(&self) -> Cow<'on_disk, HgPath> {
        match *self {
            BorrowedPath::InMemory(in_memory) => Cow::Owned(in_memory.clone()),
            BorrowedPath::OnDisk(on_disk) => Cow::Borrowed(on_disk),
        }
    }
}

impl<'tree, 'on_disk> std::ops::Deref for BorrowedPath<'tree, 'on_disk> {
    type Target = HgPath;

    fn deref(&self) -> &HgPath {
        match *self {
            BorrowedPath::InMemory(in_memory) => in_memory,
            BorrowedPath::OnDisk(on_disk) => on_disk,
        }
    }
}

impl Default for ChildNodes<'_> {
    fn default() -> Self {
        ChildNodes::InMemory(Default::default())
    }
}

impl<'on_disk> ChildNodes<'on_disk> {
    pub(super) fn as_ref<'tree>(
        &'tree self,
    ) -> ChildNodesRef<'tree, 'on_disk> {
        match self {
            ChildNodes::InMemory(nodes) => ChildNodesRef::InMemory(nodes),
            ChildNodes::OnDisk(nodes) => ChildNodesRef::OnDisk(nodes),
        }
    }

    pub(super) fn is_empty(&self) -> bool {
        match self {
            ChildNodes::InMemory(nodes) => nodes.is_empty(),
            ChildNodes::OnDisk(nodes) => nodes.is_empty(),
        }
    }

    fn make_mut(
        &mut self,
        on_disk: &'on_disk [u8],
        unreachable_bytes: &mut u32,
    ) -> Result<
        &mut FastHashMap<NodeKey<'on_disk>, Node<'on_disk>>,
        DirstateV2ParseError,
    > {
        match self {
            ChildNodes::InMemory(nodes) => Ok(nodes),
            ChildNodes::OnDisk(nodes) => {
                *unreachable_bytes +=
                    std::mem::size_of_val::<[on_disk::Node]>(nodes) as u32;
                let nodes = nodes
                    .iter()
                    .map(|node| {
                        Ok((
                            node.path(on_disk)?,
                            node.to_in_memory_node(on_disk)?,
                        ))
                    })
                    .collect::<Result<_, _>>()?;
                *self = ChildNodes::InMemory(nodes);
                match self {
                    ChildNodes::InMemory(nodes) => Ok(nodes),
                    ChildNodes::OnDisk(_) => unreachable!(),
                }
            }
        }
    }
}

impl<'tree, 'on_disk> ChildNodesRef<'tree, 'on_disk> {
    pub(super) fn get(
        &self,
        base_name: &HgPath,
        on_disk: &'on_disk [u8],
    ) -> Result<Option<NodeRef<'tree, 'on_disk>>, DirstateV2ParseError> {
        match self {
            ChildNodesRef::InMemory(nodes) => Ok(nodes
                .get_key_value(base_name)
                .map(|(k, v)| NodeRef::InMemory(k, v))),
            ChildNodesRef::OnDisk(nodes) => {
                let mut parse_result = Ok(());
                let search_result = nodes.binary_search_by(|node| {
                    match node.base_name(on_disk) {
                        Ok(node_base_name) => node_base_name.cmp(base_name),
                        Err(e) => {
                            parse_result = Err(e);
                            // Dummy comparison result, `search_result` won’t
                            // be used since `parse_result` is an error
                            std::cmp::Ordering::Equal
                        }
                    }
                });
                parse_result.map(|()| {
                    search_result.ok().map(|i| NodeRef::OnDisk(&nodes[i]))
                })
            }
        }
    }

    /// Iterate in undefined order
    pub(super) fn iter(
        &self,
    ) -> impl Iterator<Item = NodeRef<'tree, 'on_disk>> {
        match self {
            ChildNodesRef::InMemory(nodes) => itertools::Either::Left(
                nodes.iter().map(|(k, v)| NodeRef::InMemory(k, v)),
            ),
            ChildNodesRef::OnDisk(nodes) => {
                itertools::Either::Right(nodes.iter().map(NodeRef::OnDisk))
            }
        }
    }

    /// Iterate in parallel in undefined order
    pub(super) fn par_iter(
        &self,
    ) -> impl rayon::iter::ParallelIterator<Item = NodeRef<'tree, 'on_disk>>
    {
        use rayon::prelude::*;
        match self {
            ChildNodesRef::InMemory(nodes) => rayon::iter::Either::Left(
                nodes.par_iter().map(|(k, v)| NodeRef::InMemory(k, v)),
            ),
            ChildNodesRef::OnDisk(nodes) => rayon::iter::Either::Right(
                nodes.par_iter().map(NodeRef::OnDisk),
            ),
        }
    }

    pub(super) fn sorted(&self) -> Vec<NodeRef<'tree, 'on_disk>> {
        match self {
            ChildNodesRef::InMemory(nodes) => {
                let mut vec: Vec<_> = nodes
                    .iter()
                    .map(|(k, v)| NodeRef::InMemory(k, v))
                    .collect();
                fn sort_key<'a>(node: &'a NodeRef) -> &'a HgPath {
                    match node {
                        NodeRef::InMemory(path, _node) => path.base_name(),
                        NodeRef::OnDisk(_) => unreachable!(),
                    }
                }
                // `sort_unstable_by_key` doesn’t allow keys borrowing from the
                // value: https://github.com/rust-lang/rust/issues/34162
                vec.sort_unstable_by(|a, b| sort_key(a).cmp(sort_key(b)));
                vec
            }
            ChildNodesRef::OnDisk(nodes) => {
                // Nodes on disk are already sorted
                nodes.iter().map(NodeRef::OnDisk).collect()
            }
        }
    }
}

impl<'tree, 'on_disk> NodeRef<'tree, 'on_disk> {
    pub(super) fn full_path(
        &self,
        on_disk: &'on_disk [u8],
    ) -> Result<&'tree HgPath, DirstateV2ParseError> {
        match self {
            NodeRef::InMemory(path, _node) => Ok(path.full_path()),
            NodeRef::OnDisk(node) => node.full_path(on_disk),
        }
    }

    /// Returns a `BorrowedPath`, which can be turned into a `Cow<'on_disk,
    /// HgPath>` detached from `'tree`
    pub(super) fn full_path_borrowed(
        &self,
        on_disk: &'on_disk [u8],
    ) -> Result<BorrowedPath<'tree, 'on_disk>, DirstateV2ParseError> {
        match self {
            NodeRef::InMemory(path, _node) => match path.full_path() {
                Cow::Borrowed(on_disk) => Ok(BorrowedPath::OnDisk(on_disk)),
                Cow::Owned(in_memory) => Ok(BorrowedPath::InMemory(in_memory)),
            },
            NodeRef::OnDisk(node) => {
                Ok(BorrowedPath::OnDisk(node.full_path(on_disk)?))
            }
        }
    }

    pub(super) fn base_name(
        &self,
        on_disk: &'on_disk [u8],
    ) -> Result<&'tree HgPath, DirstateV2ParseError> {
        match self {
            NodeRef::InMemory(path, _node) => Ok(path.base_name()),
            NodeRef::OnDisk(node) => node.base_name(on_disk),
        }
    }

    pub(super) fn children(
        &self,
        on_disk: &'on_disk [u8],
    ) -> Result<ChildNodesRef<'tree, 'on_disk>, DirstateV2ParseError> {
        match self {
            NodeRef::InMemory(_path, node) => Ok(node.children.as_ref()),
            NodeRef::OnDisk(node) => {
                Ok(ChildNodesRef::OnDisk(node.children(on_disk)?))
            }
        }
    }

    pub(super) fn has_copy_source(&self) -> bool {
        match self {
            NodeRef::InMemory(_path, node) => node.copy_source.is_some(),
            NodeRef::OnDisk(node) => node.has_copy_source(),
        }
    }

    pub(super) fn copy_source(
        &self,
        on_disk: &'on_disk [u8],
    ) -> Result<Option<&'tree HgPath>, DirstateV2ParseError> {
        match self {
            NodeRef::InMemory(_path, node) => {
                Ok(node.copy_source.as_ref().map(|s| &**s))
            }
            NodeRef::OnDisk(node) => node.copy_source(on_disk),
        }
    }

    pub(super) fn entry(
        &self,
    ) -> Result<Option<DirstateEntry>, DirstateV2ParseError> {
        match self {
            NodeRef::InMemory(_path, node) => {
                Ok(node.data.as_entry().copied())
            }
            NodeRef::OnDisk(node) => node.entry(),
        }
    }

    pub(super) fn state(
        &self,
    ) -> Result<Option<EntryState>, DirstateV2ParseError> {
        match self {
            NodeRef::InMemory(_path, node) => {
                Ok(node.data.as_entry().map(|entry| entry.state()))
            }
            NodeRef::OnDisk(node) => node.state(),
        }
    }

    pub(super) fn cached_directory_mtime(
        &self,
    ) -> Option<&'tree on_disk::Timestamp> {
        match self {
            NodeRef::InMemory(_path, node) => match &node.data {
                NodeData::CachedDirectory { mtime } => Some(mtime),
                _ => None,
            },
            NodeRef::OnDisk(node) => node.cached_directory_mtime(),
        }
    }

    pub(super) fn descendants_with_entry_count(&self) -> u32 {
        match self {
            NodeRef::InMemory(_path, node) => {
                node.descendants_with_entry_count
            }
            NodeRef::OnDisk(node) => node.descendants_with_entry_count.get(),
        }
    }

    pub(super) fn tracked_descendants_count(&self) -> u32 {
        match self {
            NodeRef::InMemory(_path, node) => node.tracked_descendants_count,
            NodeRef::OnDisk(node) => node.tracked_descendants_count.get(),
        }
    }
}

/// Represents a file or a directory
#[derive(Default)]
pub(super) struct Node<'on_disk> {
    pub(super) data: NodeData,

    pub(super) copy_source: Option<Cow<'on_disk, HgPath>>,

    pub(super) children: ChildNodes<'on_disk>,

    /// How many (non-inclusive) descendants of this node have an entry.
    pub(super) descendants_with_entry_count: u32,

    /// How many (non-inclusive) descendants of this node have an entry whose
    /// state is "tracked".
    pub(super) tracked_descendants_count: u32,
}

pub(super) enum NodeData {
    Entry(DirstateEntry),
    CachedDirectory { mtime: on_disk::Timestamp },
    None,
}

impl Default for NodeData {
    fn default() -> Self {
        NodeData::None
    }
}

impl NodeData {
    fn has_entry(&self) -> bool {
        match self {
            NodeData::Entry(_) => true,
            _ => false,
        }
    }

    fn as_entry(&self) -> Option<&DirstateEntry> {
        match self {
            NodeData::Entry(entry) => Some(entry),
            _ => None,
        }
    }
}

impl<'on_disk> DirstateMap<'on_disk> {
    pub(super) fn empty(on_disk: &'on_disk [u8]) -> Self {
        Self {
            on_disk,
            root: ChildNodes::default(),
            nodes_with_entry_count: 0,
            nodes_with_copy_source_count: 0,
            ignore_patterns_hash: [0; on_disk::IGNORE_PATTERNS_HASH_LEN],
            unreachable_bytes: 0,
        }
    }

    #[timed]
    pub fn new_v2(
        on_disk: &'on_disk [u8],
        data_size: usize,
        metadata: &[u8],
    ) -> Result<Self, DirstateError> {
        if let Some(data) = on_disk.get(..data_size) {
            Ok(on_disk::read(data, metadata)?)
        } else {
            Err(DirstateV2ParseError.into())
        }
    }

    #[timed]
    pub fn new_v1(
        on_disk: &'on_disk [u8],
    ) -> Result<(Self, Option<DirstateParents>), DirstateError> {
        let mut map = Self::empty(on_disk);
        if map.on_disk.is_empty() {
            return Ok((map, None));
        }

        let parents = parse_dirstate_entries(
            map.on_disk,
            |path, entry, copy_source| {
                let tracked = entry.state().is_tracked();
                let node = Self::get_or_insert_node(
                    map.on_disk,
                    &mut map.unreachable_bytes,
                    &mut map.root,
                    path,
                    WithBasename::to_cow_borrowed,
                    |ancestor| {
                        if tracked {
                            ancestor.tracked_descendants_count += 1
                        }
                        ancestor.descendants_with_entry_count += 1
                    },
                )?;
                assert!(
                    !node.data.has_entry(),
                    "duplicate dirstate entry in read"
                );
                assert!(
                    node.copy_source.is_none(),
                    "duplicate dirstate entry in read"
                );
                node.data = NodeData::Entry(*entry);
                node.copy_source = copy_source.map(Cow::Borrowed);
                map.nodes_with_entry_count += 1;
                if copy_source.is_some() {
                    map.nodes_with_copy_source_count += 1
                }
                Ok(())
            },
        )?;
        let parents = Some(parents.clone());

        Ok((map, parents))
    }

    /// Assuming dirstate-v2 format, returns whether the next write should
    /// append to the existing data file that contains `self.on_disk` (true),
    /// or create a new data file from scratch (false).
    pub(super) fn write_should_append(&self) -> bool {
        let ratio = self.unreachable_bytes as f32 / self.on_disk.len() as f32;
        ratio < ACCEPTABLE_UNREACHABLE_BYTES_RATIO
    }

    fn get_node<'tree>(
        &'tree self,
        path: &HgPath,
    ) -> Result<Option<NodeRef<'tree, 'on_disk>>, DirstateV2ParseError> {
        let mut children = self.root.as_ref();
        let mut components = path.components();
        let mut component =
            components.next().expect("expected at least one components");
        loop {
            if let Some(child) = children.get(component, self.on_disk)? {
                if let Some(next_component) = components.next() {
                    component = next_component;
                    children = child.children(self.on_disk)?;
                } else {
                    return Ok(Some(child));
                }
            } else {
                return Ok(None);
            }
        }
    }

    /// Returns a mutable reference to the node at `path` if it exists
    ///
    /// This takes `root` instead of `&mut self` so that callers can mutate
    /// other fields while the returned borrow is still valid
    fn get_node_mut<'tree>(
        on_disk: &'on_disk [u8],
        unreachable_bytes: &mut u32,
        root: &'tree mut ChildNodes<'on_disk>,
        path: &HgPath,
    ) -> Result<Option<&'tree mut Node<'on_disk>>, DirstateV2ParseError> {
        let mut children = root;
        let mut components = path.components();
        let mut component =
            components.next().expect("expected at least one components");
        loop {
            if let Some(child) = children
                .make_mut(on_disk, unreachable_bytes)?
                .get_mut(component)
            {
                if let Some(next_component) = components.next() {
                    component = next_component;
                    children = &mut child.children;
                } else {
                    return Ok(Some(child));
                }
            } else {
                return Ok(None);
            }
        }
    }

    pub(super) fn get_or_insert<'tree, 'path>(
        &'tree mut self,
        path: &HgPath,
    ) -> Result<&'tree mut Node<'on_disk>, DirstateV2ParseError> {
        Self::get_or_insert_node(
            self.on_disk,
            &mut self.unreachable_bytes,
            &mut self.root,
            path,
            WithBasename::to_cow_owned,
            |_| {},
        )
    }

    fn get_or_insert_node<'tree, 'path>(
        on_disk: &'on_disk [u8],
        unreachable_bytes: &mut u32,
        root: &'tree mut ChildNodes<'on_disk>,
        path: &'path HgPath,
        to_cow: impl Fn(
            WithBasename<&'path HgPath>,
        ) -> WithBasename<Cow<'on_disk, HgPath>>,
        mut each_ancestor: impl FnMut(&mut Node),
    ) -> Result<&'tree mut Node<'on_disk>, DirstateV2ParseError> {
        let mut child_nodes = root;
        let mut inclusive_ancestor_paths =
            WithBasename::inclusive_ancestors_of(path);
        let mut ancestor_path = inclusive_ancestor_paths
            .next()
            .expect("expected at least one inclusive ancestor");
        loop {
            // TODO: can we avoid allocating an owned key in cases where the
            // map already contains that key, without introducing double
            // lookup?
            let child_node = child_nodes
                .make_mut(on_disk, unreachable_bytes)?
                .entry(to_cow(ancestor_path))
                .or_default();
            if let Some(next) = inclusive_ancestor_paths.next() {
                each_ancestor(child_node);
                ancestor_path = next;
                child_nodes = &mut child_node.children;
            } else {
                return Ok(child_node);
            }
        }
    }

    fn add_or_remove_file(
        &mut self,
        path: &HgPath,
        old_state: Option<EntryState>,
        new_entry: DirstateEntry,
    ) -> Result<(), DirstateV2ParseError> {
        let had_entry = old_state.is_some();
        let was_tracked = old_state.map_or(false, |s| s.is_tracked());
        let tracked_count_increment =
            match (was_tracked, new_entry.state().is_tracked()) {
                (false, true) => 1,
                (true, false) => -1,
                _ => 0,
            };

        let node = Self::get_or_insert_node(
            self.on_disk,
            &mut self.unreachable_bytes,
            &mut self.root,
            path,
            WithBasename::to_cow_owned,
            |ancestor| {
                if !had_entry {
                    ancestor.descendants_with_entry_count += 1;
                }

                // We can’t use `+= increment` because the counter is unsigned,
                // and we want debug builds to detect accidental underflow
                // through zero
                match tracked_count_increment {
                    1 => ancestor.tracked_descendants_count += 1,
                    -1 => ancestor.tracked_descendants_count -= 1,
                    _ => {}
                }
            },
        )?;
        if !had_entry {
            self.nodes_with_entry_count += 1
        }
        node.data = NodeData::Entry(new_entry);
        Ok(())
    }

    fn iter_nodes<'tree>(
        &'tree self,
    ) -> impl Iterator<
        Item = Result<NodeRef<'tree, 'on_disk>, DirstateV2ParseError>,
    > + 'tree {
        // Depth first tree traversal.
        //
        // If we could afford internal iteration and recursion,
        // this would look like:
        //
        // ```
        // fn traverse_children(
        //     children: &ChildNodes,
        //     each: &mut impl FnMut(&Node),
        // ) {
        //     for child in children.values() {
        //         traverse_children(&child.children, each);
        //         each(child);
        //     }
        // }
        // ```
        //
        // However we want an external iterator and therefore can’t use the
        // call stack. Use an explicit stack instead:
        let mut stack = Vec::new();
        let mut iter = self.root.as_ref().iter();
        std::iter::from_fn(move || {
            while let Some(child_node) = iter.next() {
                let children = match child_node.children(self.on_disk) {
                    Ok(children) => children,
                    Err(error) => return Some(Err(error)),
                };
                // Pseudo-recursion
                let new_iter = children.iter();
                let old_iter = std::mem::replace(&mut iter, new_iter);
                stack.push((child_node, old_iter));
            }
            // Found the end of a `children.iter()` iterator.
            if let Some((child_node, next_iter)) = stack.pop() {
                // "Return" from pseudo-recursion by restoring state from the
                // explicit stack
                iter = next_iter;

                Some(Ok(child_node))
            } else {
                // Reached the bottom of the stack, we’re done
                None
            }
        })
    }

    fn clear_known_ambiguous_mtimes(
        &mut self,
        paths: &[impl AsRef<HgPath>],
    ) -> Result<(), DirstateV2ParseError> {
        for path in paths {
            if let Some(node) = Self::get_node_mut(
                self.on_disk,
                &mut self.unreachable_bytes,
                &mut self.root,
                path.as_ref(),
            )? {
                if let NodeData::Entry(entry) = &mut node.data {
                    entry.clear_mtime();
                }
            }
        }
        Ok(())
    }

    fn count_dropped_path(unreachable_bytes: &mut u32, path: &Cow<HgPath>) {
        if let Cow::Borrowed(path) = path {
            *unreachable_bytes += path.len() as u32
        }
    }
}

/// Like `Iterator::filter_map`, but over a fallible iterator of `Result`s.
///
/// The callback is only called for incoming `Ok` values. Errors are passed
/// through as-is. In order to let it use the `?` operator the callback is
/// expected to return a `Result` of `Option`, instead of an `Option` of
/// `Result`.
fn filter_map_results<'a, I, F, A, B, E>(
    iter: I,
    f: F,
) -> impl Iterator<Item = Result<B, E>> + 'a
where
    I: Iterator<Item = Result<A, E>> + 'a,
    F: Fn(A) -> Result<Option<B>, E> + 'a,
{
    iter.filter_map(move |result| match result {
        Ok(node) => f(node).transpose(),
        Err(e) => Some(Err(e)),
    })
}

impl<'on_disk> super::dispatch::DirstateMapMethods for DirstateMap<'on_disk> {
    fn clear(&mut self) {
        self.root = Default::default();
        self.nodes_with_entry_count = 0;
        self.nodes_with_copy_source_count = 0;
    }

    fn set_entry(
        &mut self,
        filename: &HgPath,
        entry: DirstateEntry,
    ) -> Result<(), DirstateV2ParseError> {
        self.get_or_insert(&filename)?.data = NodeData::Entry(entry);
        Ok(())
    }

    fn add_file(
        &mut self,
        filename: &HgPath,
        entry: DirstateEntry,
    ) -> Result<(), DirstateError> {
        let old_state = self.get(filename)?.map(|e| e.state());
        Ok(self.add_or_remove_file(filename, old_state, entry)?)
    }

    fn remove_file(
        &mut self,
        filename: &HgPath,
        in_merge: bool,
    ) -> Result<(), DirstateError> {
        let old_entry_opt = self.get(filename)?;
        let old_state = old_entry_opt.map(|e| e.state());
        let mut size = 0;
        if in_merge {
            // XXX we should not be able to have 'm' state and 'FROM_P2' if not
            // during a merge. So I (marmoute) am not sure we need the
            // conditionnal at all. Adding double checking this with assert
            // would be nice.
            if let Some(old_entry) = old_entry_opt {
                // backup the previous state
                if old_entry.state() == EntryState::Merged {
                    size = SIZE_NON_NORMAL;
                } else if old_entry.state() == EntryState::Normal
                    && old_entry.size() == SIZE_FROM_OTHER_PARENT
                {
                    // other parent
                    size = SIZE_FROM_OTHER_PARENT;
                }
            }
        }
        if size == 0 {
            self.copy_map_remove(filename)?;
        }
        let entry = DirstateEntry::new_removed(size);
        Ok(self.add_or_remove_file(filename, old_state, entry)?)
    }

    fn drop_entry_and_copy_source(
        &mut self,
        filename: &HgPath,
    ) -> Result<(), DirstateError> {
        let was_tracked = self
            .get(filename)?
            .map_or(false, |e| e.state().is_tracked());
        struct Dropped {
            was_tracked: bool,
            had_entry: bool,
            had_copy_source: bool,
        }

        /// If this returns `Ok(Some((dropped, removed)))`, then
        ///
        /// * `dropped` is about the leaf node that was at `filename`
        /// * `removed` is whether this particular level of recursion just
        ///   removed a node in `nodes`.
        fn recur<'on_disk>(
            on_disk: &'on_disk [u8],
            unreachable_bytes: &mut u32,
            nodes: &mut ChildNodes<'on_disk>,
            path: &HgPath,
        ) -> Result<Option<(Dropped, bool)>, DirstateV2ParseError> {
            let (first_path_component, rest_of_path) =
                path.split_first_component();
            let nodes = nodes.make_mut(on_disk, unreachable_bytes)?;
            let node = if let Some(node) = nodes.get_mut(first_path_component)
            {
                node
            } else {
                return Ok(None);
            };
            let dropped;
            if let Some(rest) = rest_of_path {
                if let Some((d, removed)) = recur(
                    on_disk,
                    unreachable_bytes,
                    &mut node.children,
                    rest,
                )? {
                    dropped = d;
                    if dropped.had_entry {
                        node.descendants_with_entry_count -= 1;
                    }
                    if dropped.was_tracked {
                        node.tracked_descendants_count -= 1;
                    }

                    // Directory caches must be invalidated when removing a
                    // child node
                    if removed {
                        if let NodeData::CachedDirectory { .. } = &node.data {
                            node.data = NodeData::None
                        }
                    }
                } else {
                    return Ok(None);
                }
            } else {
                let had_entry = node.data.has_entry();
                if had_entry {
                    node.data = NodeData::None
                }
                if let Some(source) = &node.copy_source {
                    DirstateMap::count_dropped_path(unreachable_bytes, source);
                    node.copy_source = None
                }
                dropped = Dropped {
                    was_tracked: node
                        .data
                        .as_entry()
                        .map_or(false, |entry| entry.state().is_tracked()),
                    had_entry,
                    had_copy_source: node.copy_source.take().is_some(),
                };
            }
            // After recursion, for both leaf (rest_of_path is None) nodes and
            // parent nodes, remove a node if it just became empty.
            let remove = !node.data.has_entry()
                && node.copy_source.is_none()
                && node.children.is_empty();
            if remove {
                let (key, _) =
                    nodes.remove_entry(first_path_component).unwrap();
                DirstateMap::count_dropped_path(
                    unreachable_bytes,
                    key.full_path(),
                )
            }
            Ok(Some((dropped, remove)))
        }

        if let Some((dropped, _removed)) = recur(
            self.on_disk,
            &mut self.unreachable_bytes,
            &mut self.root,
            filename,
        )? {
            if dropped.had_entry {
                self.nodes_with_entry_count -= 1
            }
            if dropped.had_copy_source {
                self.nodes_with_copy_source_count -= 1
            }
        } else {
            debug_assert!(!was_tracked);
        }
        Ok(())
    }

    fn has_tracked_dir(
        &mut self,
        directory: &HgPath,
    ) -> Result<bool, DirstateError> {
        if let Some(node) = self.get_node(directory)? {
            // A node without a `DirstateEntry` was created to hold child
            // nodes, and is therefore a directory.
            let state = node.state()?;
            Ok(state.is_none() && node.tracked_descendants_count() > 0)
        } else {
            Ok(false)
        }
    }

    fn has_dir(&mut self, directory: &HgPath) -> Result<bool, DirstateError> {
        if let Some(node) = self.get_node(directory)? {
            // A node without a `DirstateEntry` was created to hold child
            // nodes, and is therefore a directory.
            let state = node.state()?;
            Ok(state.is_none() && node.descendants_with_entry_count() > 0)
        } else {
            Ok(false)
        }
    }

    #[timed]
    fn pack_v1(
        &mut self,
        parents: DirstateParents,
        now: Timestamp,
    ) -> Result<Vec<u8>, DirstateError> {
        let now: i32 = now.0.try_into().expect("time overflow");
        let mut ambiguous_mtimes = Vec::new();
        // Optizimation (to be measured?): pre-compute size to avoid `Vec`
        // reallocations
        let mut size = parents.as_bytes().len();
        for node in self.iter_nodes() {
            let node = node?;
            if let Some(entry) = node.entry()? {
                size += packed_entry_size(
                    node.full_path(self.on_disk)?,
                    node.copy_source(self.on_disk)?,
                );
                if entry.mtime_is_ambiguous(now) {
                    ambiguous_mtimes.push(
                        node.full_path_borrowed(self.on_disk)?
                            .detach_from_tree(),
                    )
                }
            }
        }
        self.clear_known_ambiguous_mtimes(&ambiguous_mtimes)?;

        let mut packed = Vec::with_capacity(size);
        packed.extend(parents.as_bytes());

        for node in self.iter_nodes() {
            let node = node?;
            if let Some(entry) = node.entry()? {
                pack_entry(
                    node.full_path(self.on_disk)?,
                    &entry,
                    node.copy_source(self.on_disk)?,
                    &mut packed,
                );
            }
        }
        Ok(packed)
    }

    /// Returns new data and metadata together with whether that data should be
    /// appended to the existing data file whose content is at
    /// `self.on_disk` (true), instead of written to a new data file
    /// (false).
    #[timed]
    fn pack_v2(
        &mut self,
        now: Timestamp,
        can_append: bool,
    ) -> Result<(Vec<u8>, Vec<u8>, bool), DirstateError> {
        // TODO: how do we want to handle this in 2038?
        let now: i32 = now.0.try_into().expect("time overflow");
        let mut paths = Vec::new();
        for node in self.iter_nodes() {
            let node = node?;
            if let Some(entry) = node.entry()? {
                if entry.mtime_is_ambiguous(now) {
                    paths.push(
                        node.full_path_borrowed(self.on_disk)?
                            .detach_from_tree(),
                    )
                }
            }
        }
        // Borrow of `self` ends here since we collect cloned paths

        self.clear_known_ambiguous_mtimes(&paths)?;

        on_disk::write(self, can_append)
    }

    fn status<'a>(
        &'a mut self,
        matcher: &'a (dyn Matcher + Sync),
        root_dir: PathBuf,
        ignore_files: Vec<PathBuf>,
        options: StatusOptions,
    ) -> Result<(DirstateStatus<'a>, Vec<PatternFileWarning>), StatusError>
    {
        super::status::status(self, matcher, root_dir, ignore_files, options)
    }

    fn copy_map_len(&self) -> usize {
        self.nodes_with_copy_source_count as usize
    }

    fn copy_map_iter(&self) -> CopyMapIter<'_> {
        Box::new(filter_map_results(self.iter_nodes(), move |node| {
            Ok(if let Some(source) = node.copy_source(self.on_disk)? {
                Some((node.full_path(self.on_disk)?, source))
            } else {
                None
            })
        }))
    }

    fn copy_map_contains_key(
        &self,
        key: &HgPath,
    ) -> Result<bool, DirstateV2ParseError> {
        Ok(if let Some(node) = self.get_node(key)? {
            node.has_copy_source()
        } else {
            false
        })
    }

    fn copy_map_get(
        &self,
        key: &HgPath,
    ) -> Result<Option<&HgPath>, DirstateV2ParseError> {
        if let Some(node) = self.get_node(key)? {
            if let Some(source) = node.copy_source(self.on_disk)? {
                return Ok(Some(source));
            }
        }
        Ok(None)
    }

    fn copy_map_remove(
        &mut self,
        key: &HgPath,
    ) -> Result<Option<HgPathBuf>, DirstateV2ParseError> {
        let count = &mut self.nodes_with_copy_source_count;
        let unreachable_bytes = &mut self.unreachable_bytes;
        Ok(Self::get_node_mut(
            self.on_disk,
            unreachable_bytes,
            &mut self.root,
            key,
        )?
        .and_then(|node| {
            if let Some(source) = &node.copy_source {
                *count -= 1;
                Self::count_dropped_path(unreachable_bytes, source);
            }
            node.copy_source.take().map(Cow::into_owned)
        }))
    }

    fn copy_map_insert(
        &mut self,
        key: HgPathBuf,
        value: HgPathBuf,
    ) -> Result<Option<HgPathBuf>, DirstateV2ParseError> {
        let node = Self::get_or_insert_node(
            self.on_disk,
            &mut self.unreachable_bytes,
            &mut self.root,
            &key,
            WithBasename::to_cow_owned,
            |_ancestor| {},
        )?;
        if node.copy_source.is_none() {
            self.nodes_with_copy_source_count += 1
        }
        Ok(node.copy_source.replace(value.into()).map(Cow::into_owned))
    }

    fn len(&self) -> usize {
        self.nodes_with_entry_count as usize
    }

    fn contains_key(
        &self,
        key: &HgPath,
    ) -> Result<bool, DirstateV2ParseError> {
        Ok(self.get(key)?.is_some())
    }

    fn get(
        &self,
        key: &HgPath,
    ) -> Result<Option<DirstateEntry>, DirstateV2ParseError> {
        Ok(if let Some(node) = self.get_node(key)? {
            node.entry()?
        } else {
            None
        })
    }

    fn iter(&self) -> StateMapIter<'_> {
        Box::new(filter_map_results(self.iter_nodes(), move |node| {
            Ok(if let Some(entry) = node.entry()? {
                Some((node.full_path(self.on_disk)?, entry))
            } else {
                None
            })
        }))
    }

    fn iter_tracked_dirs(
        &mut self,
    ) -> Result<
        Box<
            dyn Iterator<Item = Result<&HgPath, DirstateV2ParseError>>
                + Send
                + '_,
        >,
        DirstateError,
    > {
        let on_disk = self.on_disk;
        Ok(Box::new(filter_map_results(
            self.iter_nodes(),
            move |node| {
                Ok(if node.tracked_descendants_count() > 0 {
                    Some(node.full_path(on_disk)?)
                } else {
                    None
                })
            },
        )))
    }

    fn debug_iter(
        &self,
        all: bool,
    ) -> Box<
        dyn Iterator<
                Item = Result<
                    (&HgPath, (u8, i32, i32, i32)),
                    DirstateV2ParseError,
                >,
            > + Send
            + '_,
    > {
        Box::new(filter_map_results(self.iter_nodes(), move |node| {
            let debug_tuple = if let Some(entry) = node.entry()? {
                entry.debug_tuple()
            } else if !all {
                return Ok(None);
            } else if let Some(mtime) = node.cached_directory_mtime() {
                (b' ', 0, -1, mtime.seconds() as i32)
            } else {
                (b' ', 0, -1, -1)
            };
            Ok(Some((node.full_path(self.on_disk)?, debug_tuple)))
        }))
    }
}