wireproto: define permissions-based routing of HTTPv2 wire protocol
authorGregory Szorc <gregory.szorc@gmail.com>
Mon, 19 Mar 2018 16:43:47 -0700
changeset 37047 fddcb51b5084
parent 37046 1cfef5693203
child 37048 fc5e261915b9
wireproto: define permissions-based routing of HTTPv2 wire protocol Now that we have a scaffolding for serving version 2 of the HTTP protocol, let's start implementing it. A good place to start is URL routing and basic request processing semantics. We can focus on content types, capabilities detect, etc later. Version 2 of the HTTP wire protocol encodes the needed permissions of the request in the URL path. The reasons for this are documented in the added documentation. In short, a) it makes it really easy and fail proof for server administrators to implement path-based authentication and b) it will enable clients to realize very early in a server exchange that authentication will be required to complete the operation. This latter point avoids all kinds of complexity and problems, like dealing with Expect: 100-continue and clients finding out later during `hg push` that they need to provide authentication. This will avoid the current badness where clients send a full bundle, get an HTTP 403, provide authentication, then retransmit the bundle. In order to implement command checking, we needed to implement a protocol handler for the new wire protocol. Our handler is just small enough to run the code we've implemented. Tests for the defined functionality have been added. I very much want to refactor the permissions checking code and define a better response format. But this can be done later. Nothing is covered by backwards compatibility at this point. Differential Revision: https://phab.mercurial-scm.org/D2836
mercurial/debugcommands.py
mercurial/help/internals/wireprotocol.txt
mercurial/wireprotoserver.py
tests/test-http-api-httpv2.t
--- a/mercurial/debugcommands.py	Tue Mar 13 16:53:21 2018 -0700
+++ b/mercurial/debugcommands.py	Mon Mar 19 16:43:47 2018 -0700
@@ -2970,6 +2970,11 @@
             url = path + httppath
             req = urlmod.urlreq.request(pycompat.strurl(url), body, headers)
 
+            # urllib.Request insists on using has_data() as a proxy for
+            # determining the request method. Override that to use our
+            # explicitly requested method.
+            req.get_method = lambda: method
+
             try:
                 opener.open(req).read()
             except util.urlerr.urlerror as e:
--- a/mercurial/help/internals/wireprotocol.txt	Tue Mar 13 16:53:21 2018 -0700
+++ b/mercurial/help/internals/wireprotocol.txt	Mon Mar 19 16:43:47 2018 -0700
@@ -144,6 +144,46 @@
 ``application/mercurial-0.*`` media type and the HTTP response is typically
 using *chunked transfer* (``Transfer-Encoding: chunked``).
 
+HTTP Version 2 Transport
+------------------------
+
+**Experimental - feature under active development**
+
+Version 2 of the HTTP protocol is exposed under the ``/api/*`` URL space.
+It's final API name is not yet formalized.
+
+Commands are triggered by sending HTTP requests against URLs of the
+form ``<permission>/<command>``, where ``<permission>`` is ``ro`` or
+``rw``, meaning read-only and read-write, respectively and ``<command>``
+is a named wire protocol command.
+
+Commands that modify repository state in meaningful ways MUST NOT be
+exposed under the ``ro`` URL prefix. All available commands MUST be
+available under the ``rw`` URL prefix.
+
+Server adminstrators MAY implement blanket HTTP authentication keyed
+off the URL prefix. For example, a server may require authentication
+for all ``rw/*`` URLs and let unauthenticated requests to ``ro/*``
+URL proceed. A server MAY issue an HTTP 401, 403, or 407 response
+in accordance with RFC 7235. Clients SHOULD recognize the HTTP Basic
+(RFC 7617) and Digest (RFC 7616) authentication schemes. Clients SHOULD
+make an attempt to recognize unknown schemes using the
+``WWW-Authenticate`` response header on a 401 response, as defined by
+RFC 7235.
+
+Read-only commands are accessible under ``rw/*`` URLs so clients can
+signal the intent of the operation very early in the connection
+lifecycle. For example, a ``push`` operation - which consists of
+various read-only commands mixed with at least one read-write command -
+can perform all commands against ``rw/*`` URLs so that any server-side
+authentication requirements are discovered upon attempting the first
+command - not potentially several commands into the exchange. This
+allows clients to fail faster or prompt for credentials as soon as the
+exchange takes place. This provides a better end-user experience.
+
+Requests to unknown commands or URLS result in an HTTP 404.
+TODO formally define response type, how error is communicated, etc.
+
 SSH Protocol
 ============
 
--- a/mercurial/wireprotoserver.py	Tue Mar 13 16:53:21 2018 -0700
+++ b/mercurial/wireprotoserver.py	Mon Mar 19 16:43:47 2018 -0700
@@ -272,6 +272,64 @@
                                    req.dispatchparts[2:])
 
 def _handlehttpv2request(rctx, req, res, checkperm, urlparts):
+    from .hgweb import common as hgwebcommon
+
+    # URL space looks like: <permissions>/<command>, where <permission> can
+    # be ``ro`` or ``rw`` to signal read-only or read-write, respectively.
+
+    # Root URL does nothing meaningful... yet.
+    if not urlparts:
+        res.status = b'200 OK'
+        res.headers[b'Content-Type'] = b'text/plain'
+        res.setbodybytes(_('HTTP version 2 API handler'))
+        return
+
+    if len(urlparts) == 1:
+        res.status = b'404 Not Found'
+        res.headers[b'Content-Type'] = b'text/plain'
+        res.setbodybytes(_('do not know how to process %s\n') %
+                         req.dispatchpath)
+        return
+
+    permission, command = urlparts[0:2]
+
+    if permission not in (b'ro', b'rw'):
+        res.status = b'404 Not Found'
+        res.headers[b'Content-Type'] = b'text/plain'
+        res.setbodybytes(_('unknown permission: %s') % permission)
+        return
+
+    # At some point we'll want to use our own API instead of recycling the
+    # behavior of version 1 of the wire protocol...
+    # TODO return reasonable responses - not responses that overload the
+    # HTTP status line message for error reporting.
+    try:
+        checkperm(rctx, req, 'pull' if permission == b'ro' else 'push')
+    except hgwebcommon.ErrorResponse as e:
+        res.status = hgwebcommon.statusmessage(e.code, pycompat.bytestr(e))
+        for k, v in e.headers:
+            res.headers[k] = v
+        res.setbodybytes('permission denied')
+        return
+
+    if command not in wireproto.commands:
+        res.status = b'404 Not Found'
+        res.headers[b'Content-Type'] = b'text/plain'
+        res.setbodybytes(_('unknown wire protocol command: %s\n') % command)
+        return
+
+    repo = rctx.repo
+    ui = repo.ui
+
+    proto = httpv2protocolhandler(req, ui)
+
+    if not wireproto.commands.commandavailable(command, proto):
+        res.status = b'404 Not Found'
+        res.headers[b'Content-Type'] = b'text/plain'
+        res.setbodybytes(_('invalid wire protocol command: %s') % command)
+        return
+
+    # We don't do anything meaningful yet.
     res.status = b'200 OK'
     res.headers[b'Content-Type'] = b'text/plain'
     res.setbodybytes(b'/'.join(urlparts) + b'\n')
@@ -284,6 +342,34 @@
     },
 }
 
+class httpv2protocolhandler(wireprototypes.baseprotocolhandler):
+    def __init__(self, req, ui):
+        self._req = req
+        self._ui = ui
+
+    @property
+    def name(self):
+        return HTTPV2
+
+    def getargs(self, args):
+        raise NotImplementedError
+
+    def forwardpayload(self, fp):
+        raise NotImplementedError
+
+    @contextlib.contextmanager
+    def mayberedirectstdio(self):
+        raise NotImplementedError
+
+    def client(self):
+        raise NotImplementedError
+
+    def addcapabilities(self, repo, caps):
+        raise NotImplementedError
+
+    def checkperm(self, perm):
+        raise NotImplementedError
+
 def _httpresponsetype(ui, req, prefer_uncompressed):
     """Determine the appropriate response type and compression settings.
 
--- a/tests/test-http-api-httpv2.t	Tue Mar 13 16:53:21 2018 -0700
+++ b/tests/test-http-api-httpv2.t	Mon Mar 19 16:43:47 2018 -0700
@@ -1,7 +1,24 @@
+  $ HTTPV2=exp-http-v2-0001
+
   $ send() {
   >   hg --verbose debugwireproto --peer raw http://$LOCALIP:$HGPORT/
   > }
 
+  $ cat > dummycommands.py << EOF
+  > from mercurial import wireprototypes, wireproto
+  > @wireproto.wireprotocommand('customreadonly', permission='pull')
+  > def customreadonly(repo, proto):
+  >     return wireprototypes.bytesresponse(b'customreadonly bytes response')
+  > @wireproto.wireprotocommand('customreadwrite', permission='push')
+  > def customreadwrite(repo, proto):
+  >     return wireprototypes.bytesresponse(b'customreadwrite bytes response')
+  > EOF
+
+  $ cat >> $HGRCPATH << EOF
+  > [extensions]
+  > dummycommands = $TESTTMP/dummycommands.py
+  > EOF
+
   $ hg init server
   $ cat > server/.hg/hgrc << EOF
   > [experimental]
@@ -13,7 +30,7 @@
 HTTP v2 protocol not enabled by default
 
   $ send << EOF
-  > httprequest GET api/exp-http-v2-0001
+  > httprequest GET api/$HTTPV2
   >     user-agent: test
   > EOF
   using raw connection to peer
@@ -43,14 +60,14 @@
   $ hg -R server serve -p $HGPORT -d --pid-file hg.pid
   $ cat hg.pid > $DAEMON_PIDS
 
-Requests simply echo their path (for now)
+Request to read-only command works out of the box
 
   $ send << EOF
-  > httprequest GET api/exp-http-v2-0001/path1/path2
+  > httprequest GET api/$HTTPV2/ro/customreadonly
   >     user-agent: test
   > EOF
   using raw connection to peer
-  s>     GET /api/exp-http-v2-0001/path1/path2 HTTP/1.1\r\n
+  s>     GET /api/exp-http-v2-0001/ro/customreadonly HTTP/1.1\r\n
   s>     Accept-Encoding: identity\r\n
   s>     user-agent: test\r\n
   s>     host: $LOCALIP:$HGPORT\r\n (glob)
@@ -60,6 +77,178 @@
   s>     Server: testing stub value\r\n
   s>     Date: $HTTP_DATE$\r\n
   s>     Content-Type: text/plain\r\n
-  s>     Content-Length: 12\r\n
+  s>     Content-Length: 18\r\n
+  s>     \r\n
+  s>     ro/customreadonly\n
+
+Request to unknown command yields 404
+
+  $ send << EOF
+  > httprequest GET api/$HTTPV2/ro/badcommand
+  >     user-agent: test
+  > EOF
+  using raw connection to peer
+  s>     GET /api/exp-http-v2-0001/ro/badcommand HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     user-agent: test\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 404 Not Found\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Type: text/plain\r\n
+  s>     Content-Length: 42\r\n
+  s>     \r\n
+  s>     unknown wire protocol command: badcommand\n
+
+Request to read-write command fails because server is read-only by default
+
+GET to read-write request not allowed
+
+  $ send << EOF
+  > httprequest GET api/$HTTPV2/rw/customreadonly
+  >     user-agent: test
+  > EOF
+  using raw connection to peer
+  s>     GET /api/exp-http-v2-0001/rw/customreadonly HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     user-agent: test\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 405 push requires POST request\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Length: 17\r\n
+  s>     \r\n
+  s>     permission denied
+
+Even for unknown commands
+
+  $ send << EOF
+  > httprequest GET api/$HTTPV2/rw/badcommand
+  >     user-agent: test
+  > EOF
+  using raw connection to peer
+  s>     GET /api/exp-http-v2-0001/rw/badcommand HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     user-agent: test\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 405 push requires POST request\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Length: 17\r\n
+  s>     \r\n
+  s>     permission denied
+
+SSL required by default
+
+  $ send << EOF
+  > httprequest POST api/$HTTPV2/rw/customreadonly
+  >     user-agent: test
+  > EOF
+  using raw connection to peer
+  s>     POST /api/exp-http-v2-0001/rw/customreadonly HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     user-agent: test\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 403 ssl required\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Length: 17\r\n
   s>     \r\n
-  s>     path1/path2\n
+  s>     permission denied
+
+Restart server to allow non-ssl read-write operations
+
+  $ killdaemons.py
+  $ cat > server/.hg/hgrc << EOF
+  > [experimental]
+  > web.apiserver = true
+  > web.api.http-v2 = true
+  > [web]
+  > push_ssl = false
+  > EOF
+
+  $ hg -R server serve -p $HGPORT -d --pid-file hg.pid
+  $ cat hg.pid > $DAEMON_PIDS
+
+Server insists on POST for read-write commands
+
+  $ send << EOF
+  > httprequest GET api/$HTTPV2/rw/customreadonly
+  >     user-agent: test
+  > EOF
+  using raw connection to peer
+  s>     GET /api/exp-http-v2-0001/rw/customreadonly HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     user-agent: test\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 405 push requires POST request\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Length: 17\r\n
+  s>     \r\n
+  s>     permission denied
+
+  $ killdaemons.py
+  $ cat > server/.hg/hgrc << EOF
+  > [experimental]
+  > web.apiserver = true
+  > web.api.http-v2 = true
+  > [web]
+  > push_ssl = false
+  > allow-push = *
+  > EOF
+
+  $ hg -R server serve -p $HGPORT -d --pid-file hg.pid
+  $ cat hg.pid > $DAEMON_PIDS
+
+Authorized request for valid read-write command works
+
+  $ send << EOF
+  > httprequest POST api/$HTTPV2/rw/customreadonly
+  >     user-agent: test
+  > EOF
+  using raw connection to peer
+  s>     POST /api/exp-http-v2-0001/rw/customreadonly HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     user-agent: test\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 200 OK\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Type: text/plain\r\n
+  s>     Content-Length: 18\r\n
+  s>     \r\n
+  s>     rw/customreadonly\n
+
+Authorized request for unknown command is rejected
+
+  $ send << EOF
+  > httprequest POST api/$HTTPV2/rw/badcommand
+  >     user-agent: test
+  > EOF
+  using raw connection to peer
+  s>     POST /api/exp-http-v2-0001/rw/badcommand HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     user-agent: test\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 404 Not Found\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Type: text/plain\r\n
+  s>     Content-Length: 42\r\n
+  s>     \r\n
+  s>     unknown wire protocol command: badcommand\n