Thu, 08 Mar 2018 16:43:32 -0800 wireprotoserver: remove broken optimization for non-httplib client
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 16:43:32 -0800] rev 36813
wireprotoserver: remove broken optimization for non-httplib client There was an experimental non-httplib client in core for several years. It was removed a week or so ago. We kept the optimization for this client in the server code. I'm not sure if that was intended or not. But it doesn't matter: the code was wrong. Because the code was accessing a WSGI environment dict, it needed to access the HTTP_X_HGHTTP2 key to actually read the HTTP header. So the code deleted by this commit wasn't actually doing anything meaningful. Doh. Differential Revision: https://phab.mercurial-scm.org/D2741
Thu, 08 Mar 2018 15:58:52 -0800 wireprotoserver: move all wire protocol handling logic out of hgweb
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 15:58:52 -0800] rev 36812
wireprotoserver: move all wire protocol handling logic out of hgweb Previous patches from several days ago worked to isolate processing of HTTP wire protocol requests to wireprotoserver. We still had a little logic in hgweb. If feels like the right time to finish the job. This commit moves WSGI request servicing from hgweb to wireprotoserver. The ugly dict holding the parsed request is no more. I think the new code is cleaner. As part of this, we now process wire protocol requests before the block to obtain the "query" variable. This makes it clear that this wonky "query" variable is not used by the wire protocol. The wonkiest part about this code is the HTTP 404. I'm actually not sure what all is going on here. It looks like the code is trying to prevent URL with path components that specify a command from not working. That part I grok. What I don't grok is why we need to send a 404. I would think it would be OK to no-op and let another handler try to service the request. But if we do this, we get some subrepo test failures. So it looks like something is expecting the HTTP 404 and reacting to it in a specific way. It /might/ be possible to change the behavior here. But it isn't something I'm comfortable doing because I don't understand the problem space. Differential Revision: https://phab.mercurial-scm.org/D2740
Thu, 08 Mar 2018 15:37:05 -0800 hgweb: use parsed request to construct query parameters
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 15:37:05 -0800] rev 36811
hgweb: use parsed request to construct query parameters The way hgweb routes requests is kind of bonkers. If PATH_INFO is set, we take the URL path after the repository. Otherwise, we take the first part of the query string before "&" and the part before ";" in that. We then kinda/sorta treat this as a path and route based on that. This commit ports that code to use the parsed request object. This required a new attribute on the parsed request to indicate whether there is any PATH_INFO. The new code still feels a bit convoluted for my liking. But we'll need to rewrite more of the code before a better solution becomes apparant. This code feels strictly better since we're no longer doing low-level WSGI manipulation during routing. Differential Revision: https://phab.mercurial-scm.org/D2739
Thu, 08 Mar 2018 11:33:33 -0800 hgweb: only recognize wire protocol commands from query string (BC)
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 11:33:33 -0800] rev 36810
hgweb: only recognize wire protocol commands from query string (BC) Previously, we attempted to parse the wire protocol command from `req.form`. Data could have come from the query string or POST form data. The wire protocol states that the command must be declared in the query string. And AFAICT all Mercurial releases from at least 1.0 send the command in the query string. So let's actual require this behavior. This is technically BC. But I'm not sure how anyone in the wild would encounter this. POST has historically been used for sending bundle data. So there's no opportunity to encode arguments there. And the experimental HTTP POST args also takes over the body. So the only way someone would be impacted by this is if they wrote a custom client that both used POST for everything and sent arguments via the HTTP body. I don't believe such a client exists. .. bc:: The HTTP wire protocol server no longer accepts the ``cmd`` argument to control which command to run via HTTP POST bodies. The ``cmd`` argument must be specified on the URL query string. Differential Revision: https://phab.mercurial-scm.org/D2738
Thu, 08 Mar 2018 11:21:46 -0800 hgweb: teach WSGI parser about query strings
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 11:21:46 -0800] rev 36809
hgweb: teach WSGI parser about query strings Currently, req.form uses cgi.parse() to populate form data. Depending on the request, form data can come from POST multipart/form-data, application/x-www-form-urlencoded, or the URL query string. Putting all these things into one data structure makes it difficult to reason about how exactly parameters got to the request. It can lead to wonkiness such as pulling parameters from both the URL and POST data. This commit teaches our WSGI request parser about argument data in query strings. We populate fields containing the query string data and only the query string data so it can't be confused with POST data. Differential Revision: https://phab.mercurial-scm.org/D2737
Thu, 08 Mar 2018 15:08:20 -0800 hgweb: use the parsed application path directly
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 15:08:20 -0800] rev 36808
hgweb: use the parsed application path directly Previously, we assigned a custom system string with a trailing slash to wsgirequest.url. The addition of the trailing slash felt arbitrary and seems to go against how things typically work in WSGI. We also want our URLs to be bytes, not system strings. And, assigning a custom attribute to wsgirequest felt wrong. This commit fixes all those things by removing the trailing slash from the app path, changing consumers to use that variable and to use it without a trailing slash, and removing the custom attribute from wsgirequest. We preserve the trailing slash on {url}. Also, makebreadcrumb strips the trailing slash. So no change to it was needed. Differential Revision: https://phab.mercurial-scm.org/D2736
Thu, 08 Mar 2018 12:59:25 -0800 hgweb: use computed base URL from parsed request
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 12:59:25 -0800] rev 36807
hgweb: use computed base URL from parsed request Let's not reinvent URL construction in a function that runs the templater. Differential Revision: https://phab.mercurial-scm.org/D2735
Sat, 10 Mar 2018 10:20:51 -0800 hgweb: parse WSGI request into a data structure
Gregory Szorc <gregory.szorc@gmail.com> [Sat, 10 Mar 2018 10:20:51 -0800] rev 36806
hgweb: parse WSGI request into a data structure Currently, our WSGI applications (hgweb_mod and hgwebdir_mod) process the raw WSGI request instance themselves. This means they have to talk in terms of system strings. And they need to know details about what's in the WSGI request. And in the case of hgweb_mod, it is doing some very funky things with URL parsing to impact dispatching. The code is difficult to read and maintain. This commit introduces parsing of the WSGI request into a higher-level and easier-to-reason-about data structure. To prove it works, we hook it up to hgweb_mod and use it for populating the relative URL on the request instance. We hold off on using it in more places because the logic in hgweb_mod is crazy and I don't want to involve those changes with review of the parsing code. The URL construction code has variations that use the HTTP: Host header (the canonical WSGI way of reconstructing the URL) and with the use of SERVER_NAME. We need to differentiate because hgweb is currently using SERVER_NAME for URL construction. Differential Revision: https://phab.mercurial-scm.org/D2734
Thu, 08 Mar 2018 15:14:32 -0800 hgweb: always use "?" when writing session vars
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 15:14:32 -0800] rev 36805
hgweb: always use "?" when writing session vars This code resolves a string to insert in URLs as part of a query string. Essentially, it resolves the {sessionvars} template keyword, which is used by hgweb templates to build a URL as a string. The whole approach here feels wrong because there's no way of knowing when this code runs how the final URL will look. There could be additional URL fragments added before this template keyword that add a query string component. Furthermore, I don't think there's *any* for req.url to have a query string. That's because the code that populates this variable only takes SCRIPT_NAME and REPO_NAME into account. The "?" character it is searching for would only be added if some code attempted to add QUERY_STRING to the URL. Hacking the code up to raise if "?" is present in the URL yields a clean test suite run. I'm not sure if we broke this code or if it has always been broken. Anyway, this commit removes support for emitting "&" as the first character in {sessionvars} and makes it always emit "?", which is what it was always doing before AFAICT. Differential Revision: https://phab.mercurial-scm.org/D2733
Thu, 08 Mar 2018 15:15:59 -0800 hgweb: rename req to wsgireq
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 15:15:59 -0800] rev 36804
hgweb: rename req to wsgireq We will soon introduce a parsed WSGI request object so we don't have to concern ourselves with low-level WSGI matters. Prepare for multiple request objects by renaming the existing one so it is clear it deals with WSGI. We also remove a symbol import to avoid even more naming confusion. # no-check-commit because of some new foo_bar naming that's required Differential Revision: https://phab.mercurial-scm.org/D2732
(0) -30000 -10000 -3000 -1000 -300 -100 -10 +10 +100 +300 +1000 +3000 +10000 tip