mercurial/help/internals/wireprotocolv2.txt
author Gregory Szorc <gregory.szorc@gmail.com>
Wed, 12 Sep 2018 10:01:16 -0700
changeset 39630 9c2c77c73f23
parent 39461 7df9ae38c75c
child 39632 c1aacb0d76ff
wireprotov2: define and implement "changesetdata" command

This commit introduces the "changesetdata" wire protocol command. The role of the command is to expose data associated with changelog revisions, including the raw revision data itself.

This command is the first piece of a new clone/pull strategy that is built on top of domain-specific commands for data retrieval. Instead of a monolithic "getbundle" command that transfers all of the things, we'll be introducing commands for fetching specific pieces of data.

Since the changeset is the fundamental unit from which we derive pointers to other data (manifests, file nodes, etc), it makes sense to start reimplementing pull with this data.

The command accepts as arguments a set of root and head revisions defining the changesets that should be fetched as well as an explicit list of nodes. By default, the command returns only the node values: the client must explicitly request additional fields be added to the response. Current supported fields are the list of parent nodes and the revision fulltext.

My plan is to eventually add support for transferring other data associated with changesets, including phases, bookmarks, obsolescence markers, etc. Since the response format is CBOR, we'll be able to add this data into the response object relatively easily (it should be as simple as adding a key in a map).

The documentation captures a number of TODO items. Some of these may require BC breaking changes. That's fine: wire protocol v2 is still highly experimental.

Differential Revision: https://phab.mercurial-scm.org/D4481

**Experimental and under active development**

This section documents the wire protocol commands exposed to transports
using the frame-based protocol. The set of commands exposed through
these transports is distinct from the set of commands exposed to legacy
transports.

The frame-based protocol uses CBOR to encode command execution requests.
All command arguments must map to a specific CBOR data type or to a set of
CBOR data types.
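To make the mapping from argument values to CBOR concrete, here is a minimal sketch of an encoder for the subset of CBOR types this document mentions (unsigned integers, bytestrings, arrays, maps and booleans). The ``encode`` helper is hypothetical and written only for illustration; it is not Mercurial's encoder, and a real client should use a complete CBOR implementation.

```python
import struct


def _head(major, length):
    """Encode a CBOR initial byte (major type + argument) per RFC 7049."""
    if length < 24:
        return bytes([(major << 5) | length])
    if length < 0x100:
        return bytes([(major << 5) | 24]) + struct.pack(">B", length)
    if length < 0x10000:
        return bytes([(major << 5) | 25]) + struct.pack(">H", length)
    if length < 0x100000000:
        return bytes([(major << 5) | 26]) + struct.pack(">I", length)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", length)


def encode(value):
    """Encode a small subset of Python values as CBOR (sketch only)."""
    # Check booleans before integers: bool is a subclass of int in Python.
    if value is True:
        return b"\xf5"
    if value is False:
        return b"\xf4"
    if isinstance(value, int):
        if value < 0:
            raise ValueError("negative integers are not handled by this sketch")
        return _head(0, value)  # major type 0: unsigned integer
    if isinstance(value, bytes):
        return _head(2, len(value)) + value  # major type 2: bytestring
    if isinstance(value, list):
        # major type 4: array, followed by each encoded element
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        # major type 5: map, followed by alternating encoded keys and values
        out = _head(5, len(value))
        for k, v in value.items():
            out += encode(k) + encode(v)
        return out
    raise TypeError("unsupported type: %r" % type(value))
```

For example, ``encode({b"nodes": [b"\x00" * 20]})`` yields a one-entry map whose value is an array holding a single 20-byte bytestring.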

The response to many commands is also CBOR. There is no common response
format: each command defines its own response format.

TODOs
=====

* Add "node namespace" support to each command. In order to support
  SHA-1 hash transition, we want servers to be able to expose different
  "node namespaces" for the same data. Every command operating on nodes
  should specify which "node namespace" it is operating on and responses
  should encode the "node namespace" accordingly.

Commands
========

The sections below detail all commands available to wire protocol version
2.

branchmap
---------

Obtain heads in named branches.

Receives no arguments.

The response is a map whose bytestring keys are branch names. Each value
is an array of bytestrings holding the raw changeset nodes of that branch's
heads.
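Once the CBOR response has been decoded into Python objects, a client might render it for display as in the sketch below. The ``branch_heads_hex`` helper is hypothetical, and decoding branch names as UTF-8 is an assumption made for display purposes.

```python
def branch_heads_hex(branchmap):
    """Convert a decoded branchmap response to {branch name: [hex node, ...]}.

    Assumes branch names are UTF-8 (undecodable bytes are replaced) and
    that each node is a raw binary bytestring.
    """
    return {
        name.decode("utf-8", "replace"): [node.hex() for node in nodes]
        for name, nodes in branchmap.items()
    }
```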

capabilities
------------

Obtain the server's capabilities.

Receives no arguments.

This command is typically called only as part of the handshake during
initial connection establishment.

The response is a map with bytestring keys defining server information.

The defined keys are:

commands
   A map defining available wire protocol commands on this server.

   Keys in the map are the names of commands that can be invoked. Values
   are maps defining information about that command. The bytestring keys
   are:

      args
         A map of argument names and their expected types.

         Each type is expressed as a representative value of that type. For
         example, an argument expecting a boolean has its value set to true,
         and an argument expecting an integer has its value set to 42. The
         actual values are arbitrary and may not have meaning.

      permissions
         An array of permissions required to execute this command.

compression
   An array of maps defining available compression format support.

   The array is sorted from most preferred to least preferred.

   Each entry has the following bytestring keys:

      name
         Name of the compression engine. e.g. ``zstd`` or ``zlib``.

framingmediatypes
   An array of bytestrings defining the supported framing protocol
   media types. Servers will not accept media types not in this list.

rawrepoformats
   An array of storage formats the repository is using. This set of
   requirements can be used to determine whether a client can read a
   *raw* copy of the available file data.
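A client can use the decoded capabilities map to negotiate compression and to inspect a command's arguments. The sketch below assumes the response has already been decoded into Python dicts with bytestring keys; ``choose_compression`` and ``argument_types`` are hypothetical helper names.

```python
def choose_compression(capabilities, client_engines):
    """Return the most server-preferred engine the client supports, or None."""
    # ``compression`` is sorted from most to least preferred by the server.
    for entry in capabilities.get(b"compression", []):
        if entry[b"name"] in client_engines:
            return entry[b"name"]
    return None


def argument_types(capabilities, command):
    """Map each argument of ``command`` to the Python type of its example value."""
    info = capabilities[b"commands"].get(command)
    if info is None:
        raise KeyError("server does not expose command %r" % command)
    # Argument values are representative examples (e.g. true, 42), so only
    # their types are meaningful.
    return {name: type(example) for name, example in info[b"args"].items()}
```

Walking the server's list in order, rather than the client's, honors the server's stated preference whenever both sides support more than one engine.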

changesetdata
-------------

Obtain various data related to changesets.

The command accepts the following arguments:

noderange
   (array of arrays of bytestrings) An array of 2 elements, each being an
   array of node bytestrings. The first array denotes the changelog revisions
   that are already known to the client. The second array denotes the changelog
   revision DAG heads to fetch. The argument essentially defines a DAG range
   bounded by root and head nodes to fetch.

   The roots array may be empty. The heads array must be defined.

nodes
   (array of bytestrings) Changelog revisions to request explicitly.

fields
   (set of bytestrings) Which data associated with changelog revisions to
   fetch. The following values are recognized:

   parents
      Parent revisions.

   revision
      The raw revision data for the changelog entry. The revision's node
      value is the SHA-1 hash of this data combined with the revision's
      parent nodes.
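Mercurial computes a node as the SHA-1 digest of the two parent nodes, smaller one first, followed by the raw revision text, where a missing parent is the null node (20 zero bytes). A client that requested both ``revision`` and ``parents`` can therefore verify what it received. A minimal sketch, with hypothetical function names:

```python
import hashlib

NULLID = b"\x00" * 20  # Mercurial's null node


def revision_node(text, p1, p2=NULLID):
    """Compute a Mercurial-style SHA-1 node from revision text and parents."""
    a, b = sorted([p1, p2])  # parents are hashed in sorted order
    h = hashlib.sha1(a)
    h.update(b)
    h.update(text)
    return h.digest()


def verify(node, text, parents):
    """Return True if ``node`` matches the received text and parent nodes."""
    # Pad with null parents in case fewer than two were sent (an assumption).
    p1, p2 = (list(parents) + [NULLID, NULLID])[:2]
    return revision_node(text, p1, p2) == node
```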

The server resolves the set of revisions relevant to the request by taking
the union of the ``noderange`` and ``nodes`` arguments. At least one of these
arguments must be defined.
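The argument rules above can be checked before a request is sent. The sketch below is hypothetical: it operates on an already-decoded argument dict and interprets "the heads array must be defined" as "must be non-empty".

```python
KNOWN_FIELDS = {b"parents", b"revision"}


def changesetdata_args_error(args):
    """Return an error message if ``args`` breaks the documented rules, else None."""
    noderange = args.get(b"noderange")
    nodes = args.get(b"nodes")
    if noderange is None and nodes is None:
        return "at least one of noderange or nodes must be defined"
    if noderange is not None:
        if len(noderange) != 2:
            return "noderange must contain exactly 2 arrays"
        _roots, heads = noderange  # the roots array may be empty
        if not heads:
            return "noderange heads must not be empty"
    unknown = set(args.get(b"fields", ())) - KNOWN_FIELDS
    if unknown:
        return "unrecognized fields: %r" % sorted(unknown)
    return None
```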

The response bytestream starts with a CBOR map describing the data that follows.
This map has the following bytestring keys:

totalitems
   (unsigned integer) Total number of changelog revisions whose data is being
   transferred.

Following the map header is a series of 0 or more CBOR values. If values
are present, the first value will always be a map describing a single changeset
revision. If revision data is requested, the raw revision data (encoded as
a CBOR bytestring) will follow the map describing it. Otherwise, the next
value is the CBOR map describing the next changeset revision.

Each map has the following bytestring keys:

node
   (bytestring) The node value for this revision. This is the SHA-1 hash of
   the revision's parent nodes and raw revision data.

parents (optional)
   (array of bytestrings) The nodes representing the parent revisions of this
   revision. Only present if ``parents`` data is being requested.

revisionsize (optional)
   (unsigned integer) Indicates the size of raw revision data that follows this
   map. The following data contains a serialized form of the changeset data,
   including the author, date, commit message, set of changed files, manifest
   node, and other metadata.

   Only present if ``revision`` data was requested and the data follows this
   map.
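The stream layout described above can be consumed with a simple loop that pulls an extra value whenever a map carries ``revisionsize``. This sketch assumes the CBOR values have already been decoded into a Python sequence in stream order; ``parse_changesetdata`` is a hypothetical name.

```python
def parse_changesetdata(values):
    """Group decoded changesetdata response values into one dict per revision."""
    it = iter(values)
    header = next(it)  # the leading map carrying ``totalitems``
    revisions = []
    for entry in it:
        if not isinstance(entry, dict):
            raise ValueError("expected a revision map, got %r" % type(entry))
        rev = {b"node": entry[b"node"]}
        if b"parents" in entry:
            rev[b"parents"] = entry[b"parents"]
        if b"revisionsize" in entry:
            # The raw revision data follows its describing map as a bytestring.
            data = next(it, None)
            if not isinstance(data, bytes) or len(data) != entry[b"revisionsize"]:
                raise ValueError("missing or truncated revision data")
            rev[b"revision"] = data
        revisions.append(rev)
    if len(revisions) != header[b"totalitems"]:
        raise ValueError("totalitems does not match the number of revisions")
    return revisions
```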

If nodes are requested via ``noderange``, they will be emitted in DAG order,
parents always before children.

If nodes are requested via ``nodes``, they will be emitted in requested order.

Nodes from ``nodes`` are emitted before nodes from ``noderange``.
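The ordering rules can be modeled on a toy DAG, given as a dict mapping each node to its list of parents. In this sketch the ``noderange`` set is the heads and their ancestors minus the roots and their ancestors, emitted parents first; a node appearing in both arguments is emitted once at its first position, which is an assumption about how the union is formed. The function names are hypothetical.

```python
def ancestors(dag, nodes):
    """Return ``nodes`` plus all of their ancestors in ``dag``."""
    seen = set()
    stack = list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(dag[n])
    return seen


def resolve_order(dag, nodes=(), noderange=None):
    """Order revisions as described for changesetdata (sketch)."""
    out, emitted = [], set()
    for n in nodes:  # explicit nodes first, in requested order
        if n not in emitted:
            emitted.add(n)
            out.append(n)
    if noderange is not None:
        roots, heads = noderange
        known = ancestors(dag, roots)  # already on the client, never sent
        visited = set()
        for head in heads:
            # Iterative post-order DFS: a node is emitted after its parents.
            stack = [(head, False)]
            while stack:
                n, expanded = stack.pop()
                if n in known:
                    continue
                if expanded:
                    if n not in emitted:
                        emitted.add(n)
                        out.append(n)
                    continue
                if n in visited:
                    continue
                visited.add(n)
                stack.append((n, True))
                for p in reversed(dag[n]):
                    stack.append((p, False))
    return out
```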

TODO support different revision selection mechanisms (e.g. non-public, specific
revisions)
TODO support different hash "namespaces" for revisions (e.g. sha-1 versus other)
TODO support emitting phases data
TODO support emitting bookmarks data
TODO support emitting obsolescence data
TODO support filtering based on relevant paths (narrow clone)
TODO support depth limiting
TODO support hgtagsfnodes cache / tags data
TODO support branch heads cache

heads
-----

Obtain DAG heads in the repository.

The command accepts the following arguments:

publiconly (optional)
   (boolean) If set, operate on the DAG for public phase changesets only.
   Non-public (i.e. draft) phase DAG heads will not be returned.

The response is a CBOR array of bytestrings defining changeset nodes
of DAG heads. The array can be empty if the repository is empty or no
changesets satisfied the request.

TODO consider exposing phase of heads in response
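The effect of ``publiconly`` can be illustrated on a toy DAG. Because Mercurial never lets a public changeset descend from a non-public one, the public changesets form a sub-DAG, and its heads are the public nodes with no public children. This is a hypothetical sketch; the sorted output is for determinism and is not a documented response order.

```python
PUBLIC, DRAFT, SECRET = 0, 1, 2  # Mercurial's numeric phase values


def dag_heads(dag, phases, publiconly=False):
    """Return DAG heads, optionally restricted to public changesets.

    ``dag`` maps node -> list of parent nodes; ``phases`` maps node -> phase.
    """
    if publiconly:
        members = {n for n in dag if phases[n] == PUBLIC}
    else:
        members = set(dag)
    # Any member that is a parent of another member cannot be a head.
    has_child = {p for n in members for p in dag[n] if p in members}
    return sorted(members - has_child)
```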

known
-----

Determine whether a series of changeset nodes is known to the server.

The command accepts the following arguments:

nodes
   (array of bytestrings) List of changeset nodes whose presence to
   query.

The response is a bytestring with one byte per requested node. The byte at
each index is 0 or 1, indicating whether the node at the same index in the
request is unknown or known to the server.

TODO use a bit array for even more compact response
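A client can turn the response into one boolean per requested node. Because this text does not say whether each byte is the ASCII character ``0``/``1`` or a raw ``0x00``/``0x01`` value, the hypothetical decoder below accepts both. The ``pack_bits`` function illustrates one possible encoding for the bit-array TODO (8 flags per byte, least significant bit first); it is not a defined format.

```python
def decode_known(response, count):
    """Turn a ``known`` response into a list of booleans, one per node."""
    if len(response) != count:
        raise ValueError("expected %d bytes, got %d" % (count, len(response)))
    # Accept ASCII b"1" or raw 0x01 as "known" (an assumption; see above).
    return [b in (0x01, ord("1")) for b in response]


def pack_bits(flags):
    """Hypothetical bit-array encoding: 8 flags per byte, LSB first."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)
```

A bit array would shrink the response roughly eightfold, at the cost of the client needing the request length to know how many trailing bits are padding.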

listkeys
--------

List values in a specified ``pushkey`` namespace.

The command receives the following arguments:

namespace
   (bytestring) Pushkey namespace to query.

The response is a map with bytestring keys and values.

TODO consider using binary to represent nodes in certain pushkey namespaces.
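As an example of the conversion this TODO alludes to, Mercurial's ``bookmarks`` pushkey namespace maps bookmark names to 40-character hexadecimal nodes, which a client may want in binary form. The helper name is hypothetical and assumes every value is a valid hex node.

```python
def bookmarks_to_binary(listing):
    """Convert a decoded ``bookmarks`` listing to {name: 20-byte binary node}."""
    return {
        name: bytes.fromhex(value.decode("ascii"))
        for name, value in listing.items()
    }
```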

lookup
------

Try to resolve a value to a changeset revision.

Unlike ``known``, which operates on full changeset nodes, ``lookup``
operates on node fragments and other names that a user may use.

The command receives the following arguments:

key
   (bytestring) Value to try to resolve.

On success, returns a bytestring containing the resolved node.
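The difference from ``known`` is easiest to see with node fragments. The sketch below resolves a key by exact symbolic name first and then by unique hexadecimal prefix; this resolution order and the ``lookup`` helper are simplifying assumptions, and real servers accept many more forms of identifiers.

```python
def lookup(key, nodes, names):
    """Resolve ``key`` to a binary node (simplified, hypothetical sketch).

    ``nodes`` is an iterable of binary nodes; ``names`` maps symbolic
    names (e.g. bookmarks) to binary nodes.
    """
    if key in names:
        return names[key]
    try:
        prefix = key.decode("ascii").lower()
    except UnicodeDecodeError:
        raise LookupError("unknown revision %r" % key) from None
    matches = [n for n in nodes if n.hex().startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if matches:
        raise LookupError("ambiguous identifier %r" % key)
    raise LookupError("unknown revision %r" % key)
```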

pushkey
-------

Set a value using the ``pushkey`` protocol.

The command receives the following arguments:

namespace
   (bytestring) Pushkey namespace to operate on.
key
   (bytestring) The pushkey key to set.
old
   (bytestring) Old value for this key.
new
   (bytestring) New value for this key.

TODO consider using binary to represent nodes in certain pushkey namespaces.
TODO better define response type and meaning.
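The ``old`` argument gives ``pushkey`` compare-and-set semantics: an update should only apply if the key still holds the value the client last saw. The in-memory class below is a hypothetical sketch loosely modeled on Mercurial's ``bookmarks`` namespace, where an empty bytestring means "absent", an empty ``new`` deletes the key, and re-sending an update that already took effect also succeeds.

```python
class PushkeyNamespace:
    """In-memory compare-and-set store illustrating pushkey semantics."""

    def __init__(self):
        self._values = {}

    def listkeys(self):
        return dict(self._values)

    def pushkey(self, key, old, new):
        current = self._values.get(key, b"")  # b"" stands for "absent"
        if current != old and current != new:
            return False  # key changed since the client read it; reject
        if new:
            self._values[key] = new
        else:
            self._values.pop(key, None)  # empty new value deletes the key
        return True
```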