i18n/posplit
author Pierre-Yves David <pierre-yves.david@octobus.net>
Tue, 09 Apr 2024 22:36:35 +0200
changeset 51594 e3a5ec2d236a
parent 48875 6000f5b25c9b
permissions -rwxr-xr-x
outgoing: rework the handling of the `missingroots` case to be faster The previous implementation was slow, to the point it was taking a significant amount of `hg bundle --type none-streamv2` call. We rework the code to compute the same value much faster, making the operation disappear from the `hg bundle --type none-streamv2` profile. Someone would remark that producing a streamclone does not requires an `outgoing` object. However that is a matter for another day. There is other user of `missingroots` (non stream `hg bundle` call for example), and they will also benefit from this rework. We implement an old TODO in the process, directly computing the missing and common attribute as we have most element at hand already. ### benchmark.name = hg.command.bundle # bin-env-vars.hg.flavor = default # bin-env-vars.hg.py-re2-module = default # benchmark.variants.revs = all # benchmark.variants.type = none-streamv2 ## data-env-vars.name = heptapod-public-2024-03-25-zstd-sparse-revlog before: 7.750458 after: 6.665565 (-14.00%, -1.08) ## data-env-vars.name = mercurial-public-2024-03-22-zstd-sparse-revlog before: 0.700229 after: 0.496050 (-29.16%, -0.20) ## data-env-vars.name = mozilla-try-2023-03-22-zstd-sparse-revlog before: 346.508952 after: 316.749699 (-8.59%, -29.76) ## data-env-vars.name = pypy-2024-03-22-zstd-sparse-revlog before: 3.401700 after: 2.915810 (-14.28%, -0.49) ## data-env-vars.name = tryton-public-2024-03-22-zstd-sparse-revlog before: 1.870798 after: 1.461583 (-21.87%, -0.41) note: this whole `missingroots` of outgoing has a limited number of callers and could likely be replace by something simpler (like taking an explicit "missing_revs" set for example). However this is a wider change and we focus on a small impact, quick rework that does not change the API for now.

#!/usr/bin/env python3
#
# posplit - split messages in paragraphs on .po/.pot files
#
# license: MIT/X11/Expat
#


import polib
import re
import sys


def addentry(po, entry, cache):
    e = cache.get(entry.msgid)
    if e:
        e.occurrences.extend(entry.occurrences)

        # merge comments from entry
        for comment in entry.comment.split('\n'):
            if comment and comment not in e.comment:
                if not e.comment:
                    e.comment = comment
                else:
                    e.comment += '\n' + comment
    else:
        po.append(entry)
        cache[entry.msgid] = entry


def mkentry(orig, delta, msgid, msgstr):
    entry = polib.POEntry()
    entry.merge(orig)
    entry.msgid = msgid or orig.msgid
    entry.msgstr = msgstr or orig.msgstr
    entry.occurrences = [(p, int(l) + delta) for (p, l) in orig.occurrences]
    return entry


if __name__ == "__main__":
    po = polib.pofile(sys.argv[1])

    cache = {}
    entries = po[:]
    po[:] = []
    findd = re.compile(r' *\.\. (\w+)::')  # for finding directives
    for entry in entries:
        msgids = entry.msgid.split(u'\n\n')
        if entry.msgstr:
            msgstrs = entry.msgstr.split(u'\n\n')
        else:
            msgstrs = [u''] * len(msgids)

        if len(msgids) != len(msgstrs):
            # places the whole existing translation as a fuzzy
            # translation for each paragraph, to give the
            # translator a chance to recover part of the old
            # translation - erasing extra paragraphs is
            # probably better than retranslating all from start
            if 'fuzzy' not in entry.flags:
                entry.flags.append('fuzzy')
            msgstrs = [entry.msgstr] * len(msgids)

        delta = 0
        for msgid, msgstr in zip(msgids, msgstrs):
            if msgid and msgid != '::':
                newentry = mkentry(entry, delta, msgid, msgstr)
                mdirective = findd.match(msgid)
                if mdirective:
                    if not msgid[mdirective.end() :].rstrip():
                        # only directive, nothing to translate here
                        delta += 2
                        continue
                    directive = mdirective.group(1)
                    if directive in ('container', 'include'):
                        if msgid.rstrip('\n').count('\n') == 0:
                            # only rst syntax, nothing to translate
                            delta += 2
                            continue
                        else:
                            # lines following directly, unexpected
                            print(
                                'Warning: text follows line with directive'
                                ' %s' % directive
                            )
                    comment = 'do not translate: .. %s::' % directive
                    if not newentry.comment:
                        newentry.comment = comment
                    elif comment not in newentry.comment:
                        newentry.comment += '\n' + comment
                addentry(po, newentry, cache)
            delta += 2 + msgid.count('\n')
    po.save()