bdiff: adjust criteria for getting optimal longest match in the A side middle
author Mads Kiilerich <madski@unity3d.com>
Tue, 08 Nov 2016 18:37:33 +0100
changeset 30429 38ed54888617
parent 30428 3743e5dbb824
child 30430 5c4e2636c1a9
We prefer matches closer to the middle to balance recursion, as introduced in f1ca249696ed.

For ranges with odd length, matches starting exactly in the middle should have preference. That is optimal for matches of length 1. We therefore accept equality in the half check.

For ranges with even length, half was rounded up when calculated, but the 'less than half' check still gave us the preference for low matches. To get the same result as before now that we also accept equality, round half down instead. Without that, test-annotate.t would show some different (still correct but less optimal) results.

This changes the heuristics. Tests show slightly different output, and bundles are sometimes slightly smaller.

The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from 22804885 to 22803824 bytes, a 0.005% reduction.
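
The effect of the two changes on the tie-break can be illustrated with a small Python model. This is only a sketch, not the bdiff.c implementation: it assumes every line in the A range [a1, a2) is a candidate match of length 1 (as in the 'a\n' * 3 versus 'a\n' test), and the names `pick_match_start` and `new_rule` are hypothetical.

```python
def pick_match_start(a1, a2, new_rule=True):
    """Model only the tie-break among equal-length candidates.

    Assumption: every index i in [a1, a2) is a candidate match of
    length 1, so the choice depends solely on the 'half' check.
    """
    if new_rule:
        half = (a1 + a2 - 1) // 2  # floor of the middle index
    else:
        half = (a1 + a2) // 2      # rounds up for even-length ranges

    mi, mk = a1, 0  # best start and best length seen so far
    for i in range(a1, a2):
        k = 1  # assumed: each candidate has match length 1
        # old rule: strictly before half; new rule: equality accepted
        near_middle = (i <= half) if new_rule else (i < half)
        if k > mk or (k == mk and (i <= mi or near_middle)):
            mi, mk = i, k
    return mi


for n in (3, 4, 5):
    print(n, pick_match_start(0, n, new_rule=False), pick_match_start(0, n))
```

Under these assumptions, odd-length ranges move from a low match to the exact middle (index 0 to 1 for three lines, 1 to 2 for five lines), while the even-length range of four lines keeps index 1 under both rules.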
mercurial/bdiff.c
tests/test-bdiff.py
tests/test-bdiff.py.out
--- a/mercurial/bdiff.c	Tue Nov 08 18:37:33 2016 +0100
+++ b/mercurial/bdiff.c	Tue Nov 08 18:37:33 2016 +0100
@@ -151,7 +151,7 @@
 	if (a2 - a1 > 30000)
 		a1 = a2 - 30000;
 
-	half = (a1 + a2) / 2;
+	half = (a1 + a2 - 1) / 2;
 
 	for (i = a1; i < a2; i++) {
 		/* skip all lines in b after the current block */
@@ -177,7 +177,7 @@
 
 			/* best match so far? we prefer matches closer
 			   to the middle to balance recursion */
-			if (k > mk || (k == mk && (i <= mi || i < half))) {
+			if (k > mk || (k == mk && (i <= mi || i <= half))) {
 				mi = i;
 				mj = j;
 				mk = k;
--- a/tests/test-bdiff.py	Tue Nov 08 18:37:33 2016 +0100
+++ b/tests/test-bdiff.py	Tue Nov 08 18:37:33 2016 +0100
@@ -88,7 +88,7 @@
 showdiff('a\n', 'a\n' * 3)
 print("Diff 1 to 5 lines - preference for adding / removing at the end of sequences:")
 showdiff('a\n', 'a\n' * 5)
-print("Diff 3 to 1 lines - preference for adding / removing at the end of sequences:")
+print("Diff 3 to 1 lines - preference for balanced recursion:")
 showdiff('a\n' * 3, 'a\n')
-print("Diff 5 to 1 lines - this diff seems weird:")
+print("Diff 5 to 1 lines - preference for balanced recursion:")
 showdiff('a\n' * 5, 'a\n')
--- a/tests/test-bdiff.py.out	Tue Nov 08 18:37:33 2016 +0100
+++ b/tests/test-bdiff.py.out	Tue Nov 08 18:37:33 2016 +0100
@@ -67,16 +67,17 @@
   'a\na\na\na\na\n'):
  'a\n'
  2 2 '' -> 'a\na\na\na\n'
-Diff 3 to 1 lines - preference for adding / removing at the end of sequences:
+Diff 3 to 1 lines - preference for balanced recursion:
 showdiff(
   'a\na\na\n',
   'a\n'):
+ 0 2 'a\n' -> ''
  'a\n'
- 2 6 'a\na\n' -> ''
-Diff 5 to 1 lines - this diff seems weird:
+ 4 6 'a\n' -> ''
+Diff 5 to 1 lines - preference for balanced recursion:
 showdiff(
   'a\na\na\na\na\n',
   'a\n'):
- 0 2 'a\n' -> ''
+ 0 4 'a\na\n' -> ''
  'a\n'
- 4 10 'a\na\na\n' -> ''
+ 6 10 'a\na\n' -> ''