mercurial/help/internals/changegroups.txt
changeset 43632 2e017696181f
parent 43631 d3c4368099ed
child 43633 0b7733719d21
--- a/mercurial/help/internals/changegroups.txt	Thu Nov 14 11:33:05 2019 +0100
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,207 +0,0 @@
-Changegroups are representations of repository revlog data, specifically
-the changelog data, root/flat manifest data, treemanifest data, and
-filelogs.
-
-There are 3 versions of changegroups: ``1``, ``2``, and ``3``. From a
-high-level, versions ``1`` and ``2`` are almost exactly the same, with the
-only difference being an additional item in the *delta header*. Version
-``3`` adds support for storage flags in the *delta header* and optionally
-exchanging treemanifests (enabled by setting an option on the
-``changegroup`` part in the bundle2).
-
-Changegroups when not exchanging treemanifests consist of 3 logical
-segments::
-
-   +---------------------------------+
-   |           |          |          |
-   | changeset | manifest | filelogs |
-   |           |          |          |
-   |           |          |          |
-   +---------------------------------+
-
-When exchanging treemanifests, there are 4 logical segments::
-
-   +-------------------------------------------------+
-   |           |          |               |          |
-   | changeset |   root   | treemanifests | filelogs |
-   |           | manifest |               |          |
-   |           |          |               |          |
-   +-------------------------------------------------+
-
-The principle building block of each segment is a *chunk*. A *chunk*
-is a framed piece of data::
-
-   +---------------------------------------+
-   |           |                           |
-   |  length   |           data            |
-   | (4 bytes) |   (<length - 4> bytes)    |
-   |           |                           |
-   +---------------------------------------+
-
-All integers are big-endian signed integers. Each chunk starts with a 32-bit
-integer indicating the length of the entire chunk (including the length field
-itself).
-
-There is a special case chunk that has a value of 0 for the length
-(``0x00000000``). We call this an *empty chunk*.
-
-Delta Groups
-============
-
-A *delta group* expresses the content of a revlog as a series of deltas,
-or patches against previous revisions.
-
-Delta groups consist of 0 or more *chunks* followed by the *empty chunk*
-to signal the end of the delta group::
-
-  +------------------------------------------------------------------------+
-  |                |             |               |             |           |
-  | chunk0 length  | chunk0 data | chunk1 length | chunk1 data |    0x0    |
-  |   (4 bytes)    |  (various)  |   (4 bytes)   |  (various)  | (4 bytes) |
-  |                |             |               |             |           |
-  +------------------------------------------------------------------------+
-
-Each *chunk*'s data consists of the following::
-
-  +---------------------------------------+
-  |                        |              |
-  |     delta header       |  delta data  |
-  |  (various by version)  |  (various)   |
-  |                        |              |
-  +---------------------------------------+
-
-The *delta data* is a series of *delta*s that describe a diff from an existing
-entry (either that the recipient already has, or previously specified in the
-bundle/changegroup).
-
-The *delta header* is different between versions ``1``, ``2``, and
-``3`` of the changegroup format.
-
-Version 1 (headerlen=80)::
-
-   +------------------------------------------------------+
-   |            |             |             |             |
-   |    node    |   p1 node   |   p2 node   |  link node  |
-   | (20 bytes) |  (20 bytes) |  (20 bytes) |  (20 bytes) |
-   |            |             |             |             |
-   +------------------------------------------------------+
-
-Version 2 (headerlen=100)::
-
-   +------------------------------------------------------------------+
-   |            |             |             |            |            |
-   |    node    |   p1 node   |   p2 node   | base node  | link node  |
-   | (20 bytes) |  (20 bytes) |  (20 bytes) | (20 bytes) | (20 bytes) |
-   |            |             |             |            |            |
-   +------------------------------------------------------------------+
-
-Version 3 (headerlen=102)::
-
-   +------------------------------------------------------------------------------+
-   |            |             |             |            |            |           |
-   |    node    |   p1 node   |   p2 node   | base node  | link node  |   flags   |
-   | (20 bytes) |  (20 bytes) |  (20 bytes) | (20 bytes) | (20 bytes) | (2 bytes) |
-   |            |             |             |            |            |           |
-   +------------------------------------------------------------------------------+
-
-The *delta data* consists of ``chunklen - 4 - headerlen`` bytes, which contain a
-series of *delta*s, densely packed (no separators). These deltas describe a diff
-from an existing entry (either that the recipient already has, or previously
-specified in the bundle/changegroup). The format is described more fully in
-``hg help internals.bdiff``, but briefly::
-
-   +---------------------------------------------------------------+
-   |              |            |            |                      |
-   | start offset | end offset | new length |        content       |
-   |  (4 bytes)   |  (4 bytes) |  (4 bytes) | (<new length> bytes) |
-   |              |            |            |                      |
-   +---------------------------------------------------------------+
-
-Please note that the length field in the delta data does *not* include itself.
-
-In version 1, the delta is always applied against the previous node from
-the changegroup or the first parent if this is the first entry in the
-changegroup.
-
-In version 2 and up, the delta base node is encoded in the entry in the
-changegroup. This allows the delta to be expressed against any parent,
-which can result in smaller deltas and more efficient encoding of data.
-
-The *flags* field holds bitwise flags affecting the processing of revision
-data. The following flags are defined:
-
-32768
-   Censored revision. The revision's fulltext has been replaced by censor
-   metadata. May only occur on file revisions.
-16384
-   Ellipsis revision. Revision hash does not match data (likely due to rewritten
-   parents).
-8192
-   Externally stored. The revision fulltext contains ``key:value`` ``\n``
-   delimited metadata defining an object stored elsewhere. Used by the LFS
-   extension.
-
-For historical reasons, the integer values are identical to revlog version 1
-per-revision storage flags and correspond to bits being set in this 2-byte
-field. Bits were allocated starting from the most-significant bit, hence the
-reverse ordering and allocation of these flags.
-
-Changeset Segment
-=================
-
-The *changeset segment* consists of a single *delta group* holding
-changelog data. The *empty chunk* at the end of the *delta group* denotes
-the boundary to the *manifest segment*.
-
-Manifest Segment
-================
-
-The *manifest segment* consists of a single *delta group* holding manifest
-data. If treemanifests are in use, it contains only the manifest for the
-root directory of the repository. Otherwise, it contains the entire
-manifest data. The *empty chunk* at the end of the *delta group* denotes
-the boundary to the next segment (either the *treemanifests segment* or the
-*filelogs segment*, depending on version and the request options).
-
-Treemanifests Segment
----------------------
-
-The *treemanifests segment* only exists in changegroup version ``3``, and
-only if the 'treemanifest' param is part of the bundle2 changegroup part
-(it is not possible to use changegroup version 3 outside of bundle2).
-Aside from the filenames in the *treemanifests segment* containing a
-trailing ``/`` character, it behaves identically to the *filelogs segment*
-(see below). The final sub-segment is followed by an *empty chunk* (logically,
-a sub-segment with filename size 0). This denotes the boundary to the
-*filelogs segment*.
-
-Filelogs Segment
-================
-
-The *filelogs segment* consists of multiple sub-segments, each
-corresponding to an individual file whose data is being described::
-
-   +--------------------------------------------------+
-   |          |          |          |     |           |
-   | filelog0 | filelog1 | filelog2 | ... |    0x0    |
-   |          |          |          |     | (4 bytes) |
-   |          |          |          |     |           |
-   +--------------------------------------------------+
-
-The final filelog sub-segment is followed by an *empty chunk* (logically,
-a sub-segment with filename size 0). This denotes the end of the segment
-and of the overall changegroup.
-
-Each filelog sub-segment consists of the following::
-
-   +------------------------------------------------------+
-   |                 |                      |             |
-   | filename length |       filename       | delta group |
-   |    (4 bytes)    | (<length - 4> bytes) |  (various)  |
-   |                 |                      |             |
-   +------------------------------------------------------+
-
-That is, a *chunk* consisting of the filename (not terminated or padded)
-followed by N chunks constituting the *delta group* for this file. The
-*empty chunk* at the end of each *delta group* denotes the boundary to the
-next filelog sub-segment.