cext: fix memory leak in phases computation
Without this a buffer whose size in bytes is the number of
changesets in the repository is leaked each time the repository is
opened and changeset phases are computed.
Impact: the current code in hgwebdir creates a new `localrepository`
instance for each HTTP request. Since any pull or push is made of several
requests, a team of 100 people can easily produce thousands of such
requests per day.
Being a low-level malloc, this leak can't be seen with the gc module and
tools relying on that, but was spotted by valgrind immediately.
Reproduction
------------
for i in range(cl_args.iterations):
repo = hg.repository(baseui, repo_path)
rev = repo.revs(rev).first()
ctx = repo[rev]
del ctx
del repo
# avoid any pollution by other type of leak
# (that should be fixed in 5.8)
repoview._filteredrepotypes.clear()
gc.collect()
Measurements
------------
Resident Set Size (RSS), taken on a clone of
mozilla-central for performance analysis (440 000
changesets).
before:
5.8+hg19.5ac0f2a8ba72 1000 iterations: 1606MB
5.8+hg19.5ac0f2a8ba72 10000 iterations: 5723MB
after:
5.8+hg20.e2084d39e145 1000 iterations: 555MB
5.8+hg20.e2084d39e145 10000 iterations: 555MB
(double checked, not a copy/paste error)
(e2084d39e14 is the present changeset, before amendment
of the message to add the measurements)
#!/usr/bin/env python3
# Filters traceback lines from stdin.
from __future__ import absolute_import, print_function
import io
import sys
if sys.version_info[0] >= 3:
# Prevent \r from being inserted on Windows.
sys.stdout = io.TextIOWrapper(
sys.stdout.buffer,
sys.stdout.encoding,
sys.stdout.errors,
newline="\n",
line_buffering=sys.stdout.line_buffering,
)
state = 'none'
for line in sys.stdin:
if state == 'none':
if line.startswith('Traceback '):
state = 'tb'
elif state == 'tb':
if line.startswith(' File '):
state = 'file'
continue
elif not line.startswith(' '):
state = 'none'
elif state == 'file':
# Ignore lines after " File "
state = 'tb'
continue
print(line, end='')