notes.txt
author mpm@selenic.com
Wed, 18 May 2005 16:29:39 -0800
changeset 94 7daef883134f
parent 59 2bff7c0ea1d3
child 260 d7ce76d82876
permissions -rw-r--r--
Refactor merge code Delete old code Fix calculation of newer nodes on server Fix branch recursion on client Fix manifest merge problems Add more debugging and note messages to merge
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
     1
Some notes about Mercurial's design
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
     2
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
     3
Revlogs:
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
     4
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
     5
The fundamental storage type in Mercurial is a "revlog". A revlog is
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
     6
the set of all revisions to a file. Each revision is either stored
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
     7
compressed in its entirety or as a compressed binary delta against the
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
     8
previous version. The decision of when to store a full version is made
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
     9
based on how much data would be needed to reconstruct the file. This
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    10
lets us ensure that we never need to read huge amounts of data to
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    11
reconstruct a file, regardless of how many revisions of it we store.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    12
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    13
In fact, we should always be able to do it with a single read,
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    14
provided we know when and where to read. This is where the index comes
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    15
in. Each revlog has an index containing a special hash (nodeid) of the
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    16
text, hashes for its parents, and where and how much of the revlog
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    17
data we need to read to reconstruct it. Thus, with one read of the
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    18
index and one read of the data, we can reconstruct any version in time
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    19
proportional to the file size.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    20
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    21
Similarly, revlogs and their indices are append-only. This means that
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    22
adding a new version is also O(1) seeks.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    23
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    24
Generally revlogs are used to represent revisions of files, but they
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    25
also are used to represent manifests and changesets.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    26
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    27
Manifests:
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    28
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    29
A manifest is simply a list of all files in a given revision of a
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    30
project along with the nodeids of the corresponding file revisions. So
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    31
grabbing a given version of the project means simply looking up its
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    32
manifest and reconstruction all the file revisions pointed to by it.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    33
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    34
Changesets:
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    35
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    36
A changeset is a list of all files changed in a check-in along with a
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    37
change description and some metadata like user and date. It also
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    38
contains a nodeid to the relevent revision of the manifest. Changesets
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    39
and manifests are one-to-one, but contain different data for
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    40
convenience.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    41
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    42
Nodeids:
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    43
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    44
Nodeids are unique ids that are used to represent the contents of a
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    45
file AND its position in the project history. That is, if you change a
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    46
file and then change it back, the result will have a different nodeid
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    47
because it has different history. This is accomplished by including
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    48
the parents of a given revision's nodeids with the revision's text
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    49
when calculating the hash.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    50
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    51
Graph merging:
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    52
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    53
Nodeids are implemented as they are to simplify merging. Merging a
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    54
pair of directed acyclic graphs (aka "the family tree" of the file
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    55
history) requires some method of determining if nodes in different
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    56
graphs correspond. Simply comparing the contents of the node (by
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    57
comparing text of given revisions or their hashes) can get confused by
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    58
identical revisions in the tree.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    59
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    60
The nodeid approach makes it trivial - the hash uniquely describes a
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    61
revision's contents and its graph position relative to the root, so
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    62
merge is simply checking whether each nodeid in graph A is in the hash
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    63
table of graph B. If not, we pull them in, adding them sequentially to
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    64
the revlog.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    65
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    66
Graph resolving:
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    67
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    68
Mercurial does branching by copying (or COWing) a repository and thus
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    69
keeps everything nice and linear within a repository. However, when a
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    70
merge of repositories (a "pull") is done, we may often have two head
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    71
revisions in a given graph. To keep things simple, Mercurial forces
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    72
the head revisions to be merged.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    73
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    74
It first finds the closest common ancestor of the two heads. If one is
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    75
a child of the other, it becomes the new head. Otherwise, we call out
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    76
to a user-specified 3-way merge tool.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    77
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    78
Merging files, manifests, and changesets:
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    79
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    80
We begin by comparing changeset DAGs, pulling all nodes we don't have
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    81
in our DAG from the other repository. As we do so, we collect a list
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    82
of changed files to merge.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    83
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    84
Then for each file, we perform a graph merge and resolve as above.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    85
It's important to merge files using per-file DAGs rather than just
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    86
changeset level DAGs as this diagram illustrates: 
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    87
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    88
M   M1   M2
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    89
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    90
AB
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    91
 |`-------v     M2 clones M
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    92
aB       AB     file A is change in mainline
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    93
 |`---v  AB'    file B is changed in M2
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    94
 |   aB / |     M1 clones M
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    95
 |   ab/  |     M1 changes B
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    96
 |   ab'  |     M1 merges from M2, changes to B conflict
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    97
 |    |  A'B'   M2 changes A
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    98
  `---+--.|
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    99
      |  a'B'   M2 merges from mainline, changes to A conflict
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   100
      `--.|
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   101
         ???    depending on which ancestor we choose, we will have
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   102
	        to redo A hand-merge, B hand-merge, or both
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   103
                but if we look at the files independently, everything
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   104
		is fine
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   105
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   106
After we've merged files, we merge the manifest log DAG and resolve
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   107
additions and deletions. Then we are ready to resolve the changeset
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   108
DAG - if our merge required any changes (the new head is not a
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   109
decendent of our tip), we must create a new changeset describing all
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   110
of the changes needed to merge it into the tip.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   111
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   112
Merge performance:
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   113
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   114
The I/O operations for performing a merge are O(changed files), not
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   115
O(total changes) and in many cases, we needn't even unpack the deltas
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   116
to add them to our repository (though this optimization isn't
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   117
necessary).
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   118
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   119
Rollback:
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   120
47
45cc818f2445 Rollback is implemented
mpm@selenic.com
parents: 0
diff changeset
   121
When performing a commit or a merge, we order things so that the
45cc818f2445 Rollback is implemented
mpm@selenic.com
parents: 0
diff changeset
   122
changeset entry gets added last. We keep a transaction log of the name
45cc818f2445 Rollback is implemented
mpm@selenic.com
parents: 0
diff changeset
   123
of each file touched and its length prior to the transaction. On
45cc818f2445 Rollback is implemented
mpm@selenic.com
parents: 0
diff changeset
   124
abort, we simply truncate each file to its prior length. This is one
45cc818f2445 Rollback is implemented
mpm@selenic.com
parents: 0
diff changeset
   125
of the nice properties of the append-only structure of the revlogs.
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   126
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   127
Remote access:
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   128
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   129
Mercurial currently supports pulling from "serverless" repositories.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   130
Simply making the repo directory accessibly via the web and pointing
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   131
hg at it can accomplish a pull. This is relatively bandwidth efficient
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   132
but no effort has been spent on pipelining, so it won't work
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   133
especially well over LAN yet.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   134
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   135
It's also quite amenable to rsync, if you don't mind keeping an intact
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   136
copy of the master around locally.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   137
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   138
Also note the append-only and ordering properties of the commit
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   139
guarantee that readers will always see a repository in a consistent
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   140
state and no special locking is necessary. As there is generally only
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   141
one writer to an hg repository, there is in fact no exclusion
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   142
implemented yet. 
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   143
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   144
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   145
Some comparisons to git:
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   146
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   147
Most notably, Mercurial uses delta compression and repositories
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   148
created with it will grow much more slowly over time. This also allows
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   149
it to be much more bandwidth efficient. I expect repos sizes and sync
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   150
speeds to be similar to or better than BK, given the use of binary diffs.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   151
59
2bff7c0ea1d3 Minor corrections
mpm@selenic.com
parents: 47
diff changeset
   152
Mercurial is roughly the same performance as git in some areas and is
2bff7c0ea1d3 Minor corrections
mpm@selenic.com
parents: 47
diff changeset
   153
faster in others as it keeps around more metadata. One example is
2bff7c0ea1d3 Minor corrections
mpm@selenic.com
parents: 47
diff changeset
   154
listing and retrieving past versions of a file, which it can do
2bff7c0ea1d3 Minor corrections
mpm@selenic.com
parents: 47
diff changeset
   155
without reading all the changesets. This metadata will also allow it
2bff7c0ea1d3 Minor corrections
mpm@selenic.com
parents: 47
diff changeset
   156
to perform better merges as described above.
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   157
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
   158