largefiles: don't verify largefile hashes on servers when processing statlfile
author Mads Kiilerich <madski@unity3d.com>
Mon, 28 Jan 2013 15:19:44 +0100
branch stable
changeset 18488 a977b42df8b3
parent 18487 7aacc114d4f8
child 18489 f1700480bef7
When changesets referencing largefiles are pushed, the corresponding largefiles are pushed too, unless the target already has them. The client uses statlfile to make sure it only sends the largefiles that the target doesn't have.

However, on every statlfile the server would check that the content of the largefile had the expected hash. An operation that should be cheap thus became an expensive one that thrashed the disk and the cache.

Largefile hashes are already checked by putlfile before the files are stored on the server. A server should therefore be able to keep its largefile store free of errors, even more so than it can keep its revlogs free of errors. Verification should happen when running 'hg verify' locally on the server; rehashing every largefile on every remote stat is too expensive.

Clients also stat largefiles before downloading them. Because the server verified the hash in stat, it had to read each file twice to serve it.

With this change the server assumes its own hashes are OK and no longer checks them on every statlfile.

Some consequences of this change:
- in case of server-side corruption, the problem will be detected by the existing check on the client side, not on the server side
- clients that could upload an uncorrupted largefile when pushing will no longer magically heal the server (and break hardlinks)
- a client will now only upload its uncorrupted copy after the corrupted file has been removed on the server side
- client-side verify will no longer report corruption in files it doesn't have

(Issue3123 discussed related problems and how they have been fixed.)
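The cost difference described above can be sketched in plain Python 3. This is a hypothetical model, not Mercurial's code: the store is assumed to be a flat directory of files named by their SHA-1 hex digest, `findfile` is an illustrative stand-in for `lfutil.findfile`, `hexsha1` is written in the spirit of the `lfutil.hexsha1` call in the removed lines, and `statlfile_old`/`statlfile_new` model the before and after versions of `statlfile`. The old version streams the whole file through SHA-1 on every call; the new one only checks that the file exists.

```python
import hashlib
import os
import tempfile


def hexsha1(fileobj, chunksize=128 * 1024):
    """Stream a binary file object through SHA-1 and return the hex digest."""
    h = hashlib.sha1()
    for chunk in iter(lambda: fileobj.read(chunksize), b''):
        h.update(chunk)
    return h.hexdigest()


def findfile(storedir, sha):
    """Hypothetical stand-in for lfutil.findfile: path if stored, else None."""
    path = os.path.join(storedir, sha)
    return path if os.path.isfile(path) else None


def statlfile_old(storedir, sha):
    """Before the change: read and rehash the whole file on every stat."""
    filename = findfile(storedir, sha)
    if not filename:
        return '2\n'  # missing
    with open(filename, 'rb') as fd:
        return '0\n' if hexsha1(fd) == sha else '1\n'  # ok / bad checksum


def statlfile_new(storedir, sha):
    """After the change: an existence check only; '1' is never returned."""
    if not findfile(storedir, sha):
        return '2\n'  # missing
    return '0\n'  # present; content is trusted, not rehashed


if __name__ == '__main__':
    good = b'c2\n'
    sha = hashlib.sha1(good).hexdigest()
    with tempfile.TemporaryDirectory() as store:
        # Simulate server-side corruption stored under the expected name.
        with open(os.path.join(store, sha), 'wb') as f:
            f.write(b'server side corruption\n')
        print(repr(statlfile_old(store, sha)))  # '1\n'
        print(repr(statlfile_new(store, sha)))  # '0\n'
```

With a corrupted file stored under the right name, the old check answers '1\n' while the new one answers '0\n'. In this model, catching that case is left to 'hg verify' on the server and to the client's own hash check after download.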
hgext/largefiles/proto.py
tests/test-largefiles.t
--- a/hgext/largefiles/proto.py	Mon Jan 28 15:19:44 2013 +0100
+++ b/hgext/largefiles/proto.py	Mon Jan 28 15:19:44 2013 +0100
@@ -63,18 +63,16 @@
     return wireproto.streamres(generator())
 
 def statlfile(repo, proto, sha):
-    '''Return '2\n' if the largefile is missing, '1\n' if it has a
-    mismatched checksum, or '0\n' if it is in good condition'''
+    '''Return '2\n' if the largefile is missing, '0\n' if it seems to be in
+    good condition.
+
+    The value 1 is reserved for mismatched checksum, but that is too expensive
+    to be verified on every stat and must be caught by running 'hg verify'
+    server side.'''
     filename = lfutil.findfile(repo, sha)
     if not filename:
         return '2\n'
-    fd = None
-    try:
-        fd = open(filename, 'rb')
-        return lfutil.hexsha1(fd) == sha and '0\n' or '1\n'
-    finally:
-        if fd:
-            fd.close()
+    return '0\n'
 
 def wirereposetup(ui, repo):
     class lfileswirerepository(repo.__class__):
--- a/tests/test-largefiles.t	Mon Jan 28 15:19:44 2013 +0100
+++ b/tests/test-largefiles.t	Mon Jan 28 15:19:44 2013 +0100
@@ -1593,7 +1593,7 @@
   abort: remotestore: could not put $TESTTMP/r7/.hg/largefiles/4cdac4d8b084d0b599525cf732437fb337d422a8 to remote store http://localhost:$HGPORT1/ (glob)
   [255]
   $ mv 4cdac4d8b084d0b599525cf732437fb337d422a8 r7/.hg/largefiles/4cdac4d8b084d0b599525cf732437fb337d422a8
-Push of file that exists on server but is corrupted - magic healing is nice ... but too magic
+Push of file that exists on server but is corrupted - magic healing would be nice ... but too magic
   $ echo "server side corruption" > empty/.hg/largefiles/4cdac4d8b084d0b599525cf732437fb337d422a8
   $ hg push -R r7 http://localhost:$HGPORT1
   pushing to http://localhost:$HGPORT1/
@@ -1604,7 +1604,7 @@
   remote: adding file changes
   remote: added 2 changesets with 2 changes to 2 files
   $ cat empty/.hg/largefiles/4cdac4d8b084d0b599525cf732437fb337d422a8
-  c2
+  server side corruption
   $ rm -rf empty
 
 Push a largefiles repository to a served empty repository
@@ -1670,8 +1670,9 @@
   $ echo corruption > empty/.hg/largefiles/02a439e5c31c526465ab1a0ca1f431f76b827b90
   $ hg -R http-clone up --config largefiles.usercache=http-clone-usercache
   getting changed largefiles
-  abort: remotestore: largefile 02a439e5c31c526465ab1a0ca1f431f76b827b90 is invalid
-  [255]
+  f1: data corruption (expected 02a439e5c31c526465ab1a0ca1f431f76b827b90, got 6a7bb2556144babe3899b25e5428123735bb1e27)
+  0 largefiles updated, 0 removed
+  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
   $ hg -R http-clone st
   ! f1
   $ [ ! -f http-clone/.hg/largefiles/02a439e5c31c526465ab1a0ca1f431f76b827b90 ]
@@ -1684,9 +1685,7 @@
   checking files
   1 files, 1 changesets, 1 total revisions
   searching 1 changesets for largefiles
-  changeset 0:cf03e5bb9936: f1: contents differ
   verified contents of 1 revisions of 1 largefiles
-  [1]
   $ hg -R http-clone up -Cqr null
 
 largefiles pulled on update - no server side problems:
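The updated test output shows the client now catching server-side corruption itself ("f1: data corruption (expected ..., got ...)") and leaving the largefile out of its local store. A minimal sketch of such a download-side check follows, assuming the downloaded content is already in memory and the store is a flat directory keyed by SHA-1 hex digest. `store_verified` is a hypothetical helper and its message format merely mirrors the test output; it is not Mercurial's implementation.

```python
import hashlib
import os
import tempfile


def store_verified(storedir, name, expected_sha, data):
    """Hypothetical client-side check: store downloaded largefile content
    only if its SHA-1 matches the hash the changeset refers to.

    Returns None on success, or a warning string on mismatch, in which
    case nothing is written and the file stays missing locally.
    """
    got = hashlib.sha1(data).hexdigest()
    if got != expected_sha:
        return '%s: data corruption (expected %s, got %s)' % (
            name, expected_sha, got)
    dest = os.path.join(storedir, expected_sha)
    tmp = dest + '.tmp'
    # Write under a temporary name first so a partial write never appears
    # under the final content-addressed name.
    with open(tmp, 'wb') as f:
        f.write(data)
    os.replace(tmp, dest)
    return None


if __name__ == '__main__':
    good = b'c2\n'
    sha = hashlib.sha1(good).hexdigest()
    with tempfile.TemporaryDirectory() as store:
        print(store_verified(store, 'f1', sha, b'corruption\n'))
        print(os.path.exists(os.path.join(store, sha)))  # False
        print(store_verified(store, 'f1', sha, good))    # None
        print(os.path.exists(os.path.join(store, sha)))  # True
```

Hashing on the receiving side means a corrupted server copy costs the client one wasted download, but it is never stored under a name it doesn't match.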