Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table+query JSON and CSV links broken when using base_url setting #1590

Closed
eelkevdbos opened this issue Jan 11, 2022 · 11 comments
Closed

Table+query JSON and CSV links broken when using base_url setting #1590

eelkevdbos opened this issue Jan 11, 2022 · 11 comments
Labels

Comments

@eelkevdbos
Copy link

eelkevdbos commented Jan 11, 2022

Datasette appends the prefix found in the base_url setting twice if a base_url is set.

In the follow asgi example, I'm hosting a custom Datasette instance:

# asgi.py
import pathlib
from asgi_cors import asgi_cors
from channels.routing import URLRouter
from django.urls import re_path
from datasette.app import Datasette

datasette_ = Datasette(
    files=[],
    settings={
        "base_url": "/datasettes/",
        "plugins": {}
    },
    config_dir=pathlib.Path('.'),
)
application = URLRouter([
  re_path(r"^datasettes/.*", asgi_cors(datasette_.app(), allow_all=True)),
])

Running it with:

$ daphne -p 8002 asgi:application

Using a simple query on the _memory table:

select sqlite_version()

http://localhost:8002/datasettes/_memory?sql=select+sqlite_version%28%29

It renders the following upon inspection:
image

I am using datasette version 0.59.4

@simonw simonw added the bug label Jan 12, 2022
@simonw
Copy link
Owner

simonw commented Jan 12, 2022

Thanks for the steps to reproduce - I have your bug running on my laptop now.

I've been mostly testing this stuff using the hosted copy of Datasette here, which doesn't exhibit the bug: https://datasette-apache-proxy-demo.fly.dev/prefix/fixtures?sql=select+sqlite_version%28%29

Something interesting definitely going on here!

@simonw
Copy link
Owner

simonw commented Jan 12, 2022

I'm using the https://datasette.io/plugins/datasette-debug-asgi plugin to investigate.

On my laptop using Daphne I get this: http://127.0.0.1:8032/datasettes/-/asgi-scope

{'actor': None,
 'asgi': {'version': '3.0'},
 'client': ['127.0.0.1', 53767],
 'csrftoken': <function asgi_csrf_decorator.<locals>._asgi_csrf_decorator.<locals>.app_wrapped_with_csrf.<locals>.get_csrftoken at 0x1122aeef0>,
 'headers': [(b'host', b'127.0.0.1:8032'),
             (b'user-agent',
              b'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:95.0) Gecko'
              b'/20100101 Firefox/95.0'),
             (b'accept',
              b'text/html,application/xhtml+xml,application/xml;q=0.9,image/'
              b'avif,image/webp,*/*;q=0.8'),
             (b'accept-language', b'en-US,en;q=0.5'),
             (b'accept-encoding', b'gzip, deflate'),
             (b'dnt', b'1'),
             (b'connection', b'keep-alive'),
             (b'cookie', b'_ga=GA1.1.742283954.1628542653'),
             (b'upgrade-insecure-requests', b'1'),
             (b'sec-fetch-dest', b'document'),
             (b'sec-fetch-mode', b'navigate'),
             (b'sec-fetch-site', b'none'),
             (b'sec-fetch-user', b'?1')],
 'http_version': '1.1',
 'method': 'GET',
 'path': '/datasettes/-/asgi-scope',
 'path_remaining': '',
 'query_string': b'',
 'raw_path': b'/datasettes/-/asgi-scope',
 'root_path': '',
 'route_path': '/-/asgi-scope',
 'scheme': 'http',
 'server': ['127.0.0.1', 8032],
 'type': 'http',
 'url_route': {'kwargs': {}}}

On the demo running on Fly (which I just redeployed with that plugin) I get this: https://datasette-apache-proxy-demo.fly.dev/prefix/-/asgi-scope

{'actor': None,
 'asgi': {'spec_version': '2.1', 'version': '3.0'},
 'client': ('86.109.12.167', 0),
 'csrftoken': <function asgi_csrf_decorator.<locals>._asgi_csrf_decorator.<locals>.app_wrapped_with_csrf.<locals>.get_csrftoken at 0x7f4c0413bca0>,
 'headers': [(b'host', b'datasette-apache-proxy-demo.fly.dev'),
             (b'user-agent',
              b'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:95.0) Gecko'
              b'/20100101 Firefox/95.0'),
             (b'accept',
              b'text/html,application/xhtml+xml,application/xml;q=0.9,image/'
              b'avif,image/webp,*/*;q=0.8'),
             (b'accept-language', b'en-US,en;q=0.5'),
             (b'accept-encoding', b'gzip, deflate, br'),
             (b'dnt', b'1'),
             (b'x-request-start', b't=1641950740651658'),
             (b'sec-fetch-dest', b'document'),
             (b'sec-fetch-mode', b'navigate'),
             (b'sec-fetch-site', b'none'),
             (b'sec-fetch-user', b'?1'),
             (b'fly-client-ip', b'24.5.172.176'),
             (b'x-forwarded-for',
              b'24.5.172.176, 213.188.193.173, 86.109.12.167'),
             (b'fly-forwarded-proto', b'https'),
             (b'x-forwarded-proto', b'https'),
             (b'fly-forwarded-ssl', b'on'),
             (b'x-forwarded-ssl', b'on'),
             (b'fly-forwarded-port', b'443'),
             (b'x-forwarded-port', b'443'),
             (b'fly-region', b'sjc'),
             (b'fly-request-id', b'01FS5Y805BX43HM94T8XW610KG'),
             (b'via', b'2 fly.io'),
             (b'fly-dispatch-start', b't=1641950740683198;instance=87f188a2'),
             (b'x-forwarded-host', b'datasette-apache-proxy-demo.fly.dev'),
             (b'x-forwarded-server', b'localhost'),
             (b'connection', b'Keep-Alive')],
 'http_version': '1.1',
 'method': 'GET',
 'path': '/-/asgi-scope',
 'query_string': b'',
 'raw_path': b'/-/asgi-scope',
 'root_path': '',
 'scheme': 'https',
 'server': ('127.0.0.1', 8001),
 'type': 'http',
 'url_route': {'kwargs': {}}}

The version that works as 'raw_path': b'/-/asgi-scope' - the version that fails has 'raw_path': b'/datasettes/-/asgi-scope'.

@simonw
Copy link
Owner

simonw commented Jan 12, 2022

The Daphne one has this key: 'route_path': '/-/asgi-scope',

Maybe Datasette's routing code needs to look out for that, if it's available, and use it to reconstruct the requested path?

The code in question is here:

datasette/datasette/app.py

Lines 1143 to 1149 in 8c401ee

async def route_path(self, scope, receive, send, path):
# Strip off base_url if present before routing
base_url = self.ds.setting("base_url")
if base_url != "/" and path.startswith(base_url):
path = "/" + path[len(base_url) :]
scope = dict(scope, route_path=path)
request = Request(scope, receive)

@simonw
Copy link
Owner

simonw commented Jan 12, 2022

Oh wait! It looks like route_path is something I invented there.

Yup, I added it in a634121 - commit message says:

  • new route_path key in request.scope storing the path that was used for routing with the base_url prefix stripped

So actually part of the mystery here is: why does the Fly hosted one NOT have that key?

@simonw
Copy link
Owner

simonw commented Jan 12, 2022

Looking closer at the code quoted above, it doesn't modify path or raw_path at all - ALL it does is add the route_path to the scope.

@eelkevdbos
Copy link
Author

Thank you for the quick reply! Just a quick observation, I am running this locally without a proxy, whereas your fly example seems to be running behind an apache proxy (if the name is accurate). Can it be that the apache proxy strips the prefix before it passes on the request to the daphne backend?

@eelkevdbos
Copy link
Author

In my example, path matching happens at the application layer (being the Django channels URLRouter). That might be a somewhat exotic solution that would normally be solved by a proxy like Apache or Nginx. However, in my specific use case, this is a "feature" enabling me to do simple management of databases and metadata from within a Django admin app instance mapped in that same router.

@simonw simonw added this to the Datasette 0.60 milestone Jan 13, 2022
@simonw
Copy link
Owner

simonw commented Jan 13, 2022

Seeing as this area of the code has produced so many bugs in the past, I think part of the fix may be to write comprehensive documentation about how routing works for the internals documentation. Doing so might help me figure this bug out!

@simonw
Copy link
Owner

simonw commented Jan 14, 2022

Since this is a special case bug for when using Datasette as a library I wonder if a good fix here would be to support something like this:

application = URLRouter([
  re_path(r"^datasettes/.*", asgi_cors(datasette_.app(remove_path_prefix="datasettes/"), allow_all=True)),
])

@simonw
Copy link
Owner

simonw commented Jan 14, 2022

I think this prefixed string mechanism is supposed to prevent the base_url prefix from being applied twice:

def path(self, path, format=None):
if not isinstance(path, PrefixedUrlString):
if path.startswith("/"):
path = path[1:]
path = self.ds.setting("base_url") + path
if format is not None:
path = path_with_format(path=path, format=format)
return PrefixedUrlString(path)

But with a bit of extra logging all of the inputs to that are NOT prefixed strings:

Urls.path called with: /datasettes/fixtures/compound_three_primary_keys?_sort=content (PrefixedUrlString = False)
  returning /datasettes/datasettes/fixtures/compound_three_primary_keys?_sort=content

So it looks like urls.path(...) is indeed the code responsible for doubling up that /datasettes/ prefix.

@simonw
Copy link
Owner

simonw commented Jan 14, 2022

OK, I'm going to recommend a workaround for this instead. Here's asgi.py updated to strip the prefix before passing the request on to Datasette:

import pathlib
from asgi_cors import asgi_cors
from channels.routing import URLRouter
from django.urls import re_path
from datasette.app import Datasette


def rewrite_path(app, prefix_to_strip):
    async def rewrite_path_app(scope, receive, send):
        if (
            scope["type"] == "http"
            and "path" in scope
            and scope["path"].startswith(prefix_to_strip)
        ):
            scope["path"] = scope["path"][len(prefix_to_strip) :]
            if "raw_path" in scope:
                scope["raw_path"] = scope["raw_path"][len(prefix_to_strip) :]
        await app(scope, receive, send)

    return rewrite_path_app


datasette_ = Datasette(
    files=["fixtures.db"],
    settings={"base_url": "/datasettes/", "plugins": {}},
)
application = URLRouter(
    [
        re_path(
            r"^datasettes/.*",
            asgi_cors(rewrite_path(datasette_.app(), "/datasettes"), allow_all=True),
        ),
    ]
)

This works on my laptop - please re-open the ticket if it doesn't work for you!

@simonw simonw closed this as completed Jan 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants