wireprotoserver: move all wire protocol handling logic out of hgweb
author Gregory Szorc <gregory.szorc@gmail.com>
Thu, 08 Mar 2018 15:58:52 -0800
changeset 36812 158d4ecc03c8
parent 36811 cfb9ef24968c
child 36813 5a3c83412f79
Previous patches from several days ago worked to isolate processing of HTTP wire protocol requests to wireprotoserver. We still had a little logic in hgweb. It feels like the right time to finish the job.

This commit moves WSGI request servicing from hgweb to wireprotoserver. The ugly dict holding the parsed request is no more. I think the new code is cleaner. As part of this, we now process wire protocol requests before the block to obtain the "query" variable. This makes it clear that this wonky "query" variable is not used by the wire protocol.

The wonkiest part about this code is the HTTP 404. I'm actually not sure what all is going on here. It looks like the code is trying to prevent URLs with path components that specify a command from working. That part I grok. What I don't grok is why we need to send a 404. I would think it would be OK to no-op and let another handler try to service the request. But if we do this, we get some subrepo test failures. So it looks like something is expecting the HTTP 404 and reacting to it in a specific way. It /might/ be possible to change the behavior here. But it isn't something I'm comfortable doing because I don't understand the problem space.

Differential Revision: https://phab.mercurial-scm.org/D2740
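
For readers skimming the diff, the new calling convention can be sketched roughly as follows. This is a standalone illustration, not the code in this changeset: KNOWN_COMMANDS, handle_wire_request, application and the (status, body) response tuples are hypothetical stand-ins, and only the (handled, response) contract and the 404 on non-root paths are taken from the description above.

```python
# Hypothetical stand-in for the set of registered wire protocol commands.
KNOWN_COMMANDS = {'capabilities', 'heads'}


def handle_wire_request(querystringdict, dispatchpath):
    """Return (handled, response), mirroring the 2-tuple contract.

    The response here is a simplified (status, body) pair, which is an
    assumption made for the sake of a runnable example.
    """
    # No "cmd" parameter: not a wire protocol request, let others try.
    if 'cmd' not in querystringdict:
        return False, None

    cmd = querystringdict['cmd'][0]

    # Unknown commands are also left to other handlers.
    if cmd not in KNOWN_COMMANDS:
        return False, None

    # A known command on a non-root path is claimed and answered with 404.
    if dispatchpath:
        return True, ('404 Not Found', [b'not found'])

    return True, ('200 OK', [b'ran ' + cmd.encode('ascii')])


def application(querystringdict, dispatchpath):
    """Caller side: try the wire protocol first, then fall through."""
    handled, res = handle_wire_request(querystringdict, dispatchpath)
    if handled:
        return res

    # Stand-in for the rest of the web interface dispatch.
    return ('200 OK', [b'web ui'])
```

The point of returning a flag alongside the response is that the caller no longer needs to know how a wire protocol request is dispatched or how its errors are rendered; it only checks whether the request was claimed.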
mercurial/hgweb/hgweb_mod.py
mercurial/wireprotoserver.py
--- a/mercurial/hgweb/hgweb_mod.py	Thu Mar 08 15:37:05 2018 -0800
+++ b/mercurial/hgweb/hgweb_mod.py	Thu Mar 08 15:58:52 2018 -0800
@@ -318,25 +318,16 @@
                                if h[0] != 'Content-Security-Policy']
             wsgireq.headers.append(('Content-Security-Policy', rctx.csp))
 
+        handled, res = wireprotoserver.handlewsgirequest(
+            rctx, wsgireq, req, self.check_perm)
+        if handled:
+            return res
+
         if req.havepathinfo:
             query = req.dispatchpath
         else:
             query = req.querystring.partition('&')[0].partition(';')[0]
 
-        # Route it to a wire protocol handler if it looks like a wire protocol
-        # request.
-        protohandler = wireprotoserver.parsehttprequest(rctx, wsgireq, req,
-                                                        self.check_perm)
-
-        if protohandler:
-            try:
-                if query:
-                    raise ErrorResponse(HTTP_NOT_FOUND)
-
-                return protohandler['dispatch']()
-            except ErrorResponse as inst:
-                return protohandler['handleerror'](inst)
-
         # translate user-visible url structure to internal structure
 
         args = query.split('/', 2)
--- a/mercurial/wireprotoserver.py	Thu Mar 08 15:37:05 2018 -0800
+++ b/mercurial/wireprotoserver.py	Thu Mar 08 15:58:52 2018 -0800
@@ -150,24 +150,29 @@
 def iscmd(cmd):
     return cmd in wireproto.commands
 
-def parsehttprequest(rctx, wsgireq, req, checkperm):
-    """Parse the HTTP request for a wire protocol request.
+def handlewsgirequest(rctx, wsgireq, req, checkperm):
+    """Possibly process a wire protocol request.
 
-    If the current request appears to be a wire protocol request, this
-    function returns a dict with details about that request, including
-    an ``abstractprotocolserver`` instance suitable for handling the
-    request. Otherwise, ``None`` is returned.
+    If the current request is a wire protocol request, the request is
+    processed by this function.
 
     ``wsgireq`` is a ``wsgirequest`` instance.
     ``req`` is a ``parsedrequest`` instance.
+
+    Returns a 2-tuple of (bool, response) where the 1st element indicates
+    whether the request was handled and the 2nd element is a return
+    value for a WSGI application (often a generator of bytes).
     """
+    # Avoid cycle involving hg module.
+    from .hgweb import common as hgwebcommon
+
     repo = rctx.repo
 
     # HTTP version 1 wire protocol requests are denoted by a "cmd" query
     # string parameter. If it isn't present, this isn't a wire protocol
     # request.
     if 'cmd' not in req.querystringdict:
-        return None
+        return False, None
 
     cmd = req.querystringdict['cmd'][0]
 
@@ -179,17 +184,32 @@
     # known wire protocol commands and it is less confusing for machine
     # clients.
     if not iscmd(cmd):
-        return None
+        return False, None
+
+    # The "cmd" query string argument is only valid on the root path of the
+    # repo. e.g. ``/?cmd=foo``, ``/repo?cmd=foo``. URL paths within the repo
+    # like ``/blah?cmd=foo`` are not allowed. So don't recognize the request
+    # in this case. We send an HTTP 404 for backwards compatibility reasons.
+    if req.dispatchpath:
+        res = _handlehttperror(
+            hgwebcommon.ErrorResponse(hgwebcommon.HTTP_NOT_FOUND), wsgireq,
+            cmd)
+
+        return True, res
 
     proto = httpv1protocolhandler(wsgireq, repo.ui,
                                   lambda perm: checkperm(rctx, wsgireq, perm))
 
-    return {
-        'cmd': cmd,
-        'proto': proto,
-        'dispatch': lambda: _callhttp(repo, wsgireq, proto, cmd),
-        'handleerror': lambda ex: _handlehttperror(ex, wsgireq, cmd),
-    }
+    # The permissions checker should be the only thing that can raise an
+    # ErrorResponse. It is kind of a layer violation to catch an hgweb
+    # exception here. So consider refactoring into an exception type that
+    # is associated with the wire protocol.
+    try:
+        res = _callhttp(repo, wsgireq, proto, cmd)
+    except hgwebcommon.ErrorResponse as e:
+        res = _handlehttperror(e, wsgireq, cmd)
+
+    return True, res
 
 def _httpresponsetype(ui, wsgireq, prefer_uncompressed):
     """Determine the appropriate response type and compression settings.