
base_url configuration setting #394

Closed
simonw opened this issue Jan 5, 2019 · 27 comments
@simonw
Owner

simonw commented Jan 5, 2019

I've identified a couple of use-cases for running Datasette in a way that overrides the default way that internal URLs are generated.

  1. Running behind a reverse proxy. I tried running Datasette behind a proxy and found that some of the generated internal links incorrectly referenced http://127.0.0.1:8001/fixtures/... when they should have referenced http://my-host.my-domain.com/fixtures/... This is a problem both for links within the HTML interface and for the toggle_url keys returned in the JSON as part of the facets data structure.
  2. I would like it to be possible to host a Datasette instance at e.g. https://www.mynewspaper.com/interactives/2018/election-results/ - either through careful HTTP proxying or, once Datasette has been ported to ASGI, by mounting a Datasette ASGI instance deep within an existing set of URL routes.

I'm going to add a url_prefix configuration option. This will default to "", which means Datasette will behave as it does at the moment - it will use / for most URL prefixes in the HTML version, and an absolute URL derived from the incoming Host header for URLs that are returned as part of the JSON output.

If url_prefix is set to another value (either a full URL or a path) then that prefix will be prepended to all generated URLs.
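The intended behaviour can be sketched with a tiny helper (my illustration of the idea, not the eventual implementation; `prefixed_url` is a hypothetical name):

```python
def prefixed_url(url_prefix, path):
    """Join a configured url_prefix (a path or a full URL) with an
    application path to produce the link that should be emitted."""
    if not url_prefix:
        # Empty prefix: current behaviour, paths are emitted as-is
        return path
    return url_prefix.rstrip("/") + "/" + path.lstrip("/")

print(prefixed_url("", "/fixtures"))                      # /fixtures
print(prefixed_url("/prefix/", "/fixtures"))              # /prefix/fixtures
print(prefixed_url("https://example.com/", "/fixtures"))  # https://example.com/fixtures
```

The same joining logic covers both use-cases above: a bare path for mounting under a sub-path, and a full URL for the reverse-proxy case.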

@simonw
Owner Author

simonw commented Jan 6, 2019

I found a really nice pattern for writing the unit tests for this (though it would look even nicer with a solution to #395)

@pytest.mark.parametrize("prefix", ["/prefix/", "https://example.com/"])
@pytest.mark.parametrize("path", [
    "/",
    "/fixtures",
    "/fixtures/compound_three_primary_keys",
    "/fixtures/compound_three_primary_keys/a,a,a",
    "/fixtures/paginated_view",
])
def test_url_prefix_config(prefix, path):
    for client in make_app_client(config={
        "url_prefix": prefix,
    }):
        response = client.get(path)
        soup = Soup(response.body, "html.parser")
        for a in soup.findAll("a"):
            href = a["href"]
            if href not in {
                "https://github.com/simonw/datasette",
                "https://github.com/simonw/datasette/blob/master/LICENSE",
                "https://github.com/simonw/datasette/blob/master/tests/fixtures.py",
            }:
                assert href.startswith(prefix), (href, a.parent)

@kevindkeogh
Contributor

Hey, was this ever merged? I'm trying to run this behind nginx, and encountering this issue.

@kevindkeogh
Contributor

kevindkeogh commented Jun 7, 2019

Putting this here in case anyone else encounters the same issue with nginx, I was able to resolve it by passing the header in the nginx proxy config (i.e., proxy_set_header Host $host).

@jsfenfen
Contributor

jsfenfen commented Nov 21, 2019

Hey @simonw, is the url_prefix config option available in another branch? It looks like you've written some tests for it above. In 0.32 I get "url_prefix is not a valid option". I think this would be really helpful!

This would be really handy for proxying Datasette in another domain's subdirectory. I believe this would allow folks to run upstream authentication, but the links break if the url_prefix doesn't match.

I'd prefer not to host a proxied version of Datasette on a subdomain (e.g. datasette.myurl.com) because then I'd have to worry about sharing authorization cookies with the subdomain, which I'd just as soon not do, but...

Edit: I see the wip-url-prefix branch, I may try with that 8da2db4

@terrycojones

Agreed, this would be nice to have. I'm currently working around it in nginx with additional location blocks:


    location /datasette/ {
        proxy_pass         http://127.0.0.1:8001/;
        proxy_redirect     off;
        include proxy_params;
    }

    location /dna-protein-genome/ {
        proxy_pass         http://127.0.0.1:8001/dna-protein-genome/;
        proxy_redirect     off;
        include proxy_params;
    }

    location /rna-protein-genome/ {
        proxy_pass         http://127.0.0.1:8001/rna-protein-genome/;
        proxy_redirect     off;
        include proxy_params;
    }

The 2nd and 3rd locations above are my databases. This works, but I have a small problem with URLs like /rna-protein-genome?params... that I could fix with some more nginx munging. I seem to do this sort of thing once every 5 years and then have to look it all up again.

Thanks!

@terrycojones

Hmmm, wait, maybe my mindless (copy/paste) use of proxy_redirect is causing me grief...

@jsfenfen
Contributor

FWIW I did a dumb merge of the branch here: https://github.com/jsfenfen/datasette and it seemed to work, in that I could run stuff at a subdirectory, but I ended up abandoning it in favor of just using a subdomain because getting the nginx configs right was making me crazy. I'd still prefer hosting at a subdirectory, but the subdomain seems simpler at the moment.

@terrycojones

@simonw What about allowing a base URL? The <base> tag has been around forever. Then just use relative URLs everywhere, which I guess is likely what you already do. See https://www.w3schools.com/TAGs/tag_base.asp

@betatim

betatim commented Mar 23, 2020

On mybinder.org we allow access to arbitrary processes listening on a port inside the container via a reverse proxy.

This means we need support for a proxy prefix as the proxy ends up running at a URL like /something/random/proxy/datasette/...

An example that shows the problem is https://github.com/psychemedia/jupyterserverproxy-datasette-demo. Launch directly into a datasette instance on mybinder.org with https://mybinder.org/v2/gh/psychemedia/jupyterserverproxy-datasette-demo/master?urlpath=datasette then try to follow links inside the UI.

@wragge
Contributor

wragge commented Mar 23, 2020

This would also be useful for running Datasette in Jupyter notebooks on Binder. While you can use Jupyter-server-proxy to access Datasette on Binder, the links are broken.

Why run Datasette on Binder? I'm developing a range of Jupyter notebooks that are aimed at getting humanities researchers to explore data from libraries, archives, and museums. Many of them are aimed at researchers with limited digital skills, so being able to run examples in Binder without them installing anything is fantastic.

For example, there are a series of notebooks that help researchers harvest digitised historical newspaper articles from Trove. The metadata from this harvest is saved as a CSV file that users can download. I've also provided some extra notebooks that use Pandas etc to demonstrate ways of analysing and visualising the harvested data.

But it would be really nice if, after completing a harvest, the user could spin up Datasette for some initial exploration of their harvested data without ever leaving their browser.

@terrycojones

I just updated #652 to remove a merge conflict. I think it's an easy way to add this functionality. I don't have time to do more though, sorry!

@simonw
Owner Author

simonw commented Mar 23, 2020

Thanks very much @terrycojones - I'll see if I can finish it up from here.

@terrycojones

@simonw You're welcome - I was just trying it out back in December as I thought it should work. Now there's a pandemic to work on though.... so no time at all for more at the moment. BTW, I have datasette running on several protein and full (virus) genome databases I build, and it's great - thank you! Hi and best regards to you & Nat :-)

@simonw simonw pinned this issue Mar 24, 2020
@simonw
Owner Author

simonw commented Mar 24, 2020

I don't think I'll go with the <base> solution purely because it doesn't work with JSON APIs - and there are quite a few places where Datasette APIs return URLs (for things like toggling facets - e.g. suggested_facets on https://latest.datasette.io/fixtures/facetable.json?_labels=on&_size=0 )

The good news is that if you look at the templates almost all of the URLs have been generated in Python code: https://github.com/simonw/datasette/blob/a498d0fe6590f9bdbc4faf9e0dd5faeb3b06002c/datasette/templates/table.html - so it shouldn't be too hard to fix in Python. Ideally I'd like to fix this with as few template changes as possible.

@simonw simonw changed the title url_prefix config setting base_url configuritaion setting Mar 24, 2020
@simonw simonw changed the title base_url configuritaion setting base_url configuration setting Mar 24, 2020
@simonw
Owner Author

simonw commented Mar 24, 2020

Here's the line I'm stuck on now:

url_csv = path_with_format(request, "csv", url_csv_args)

Tricky question: do I continue to rebuild URLs based on the incoming request (on the assumption that it has already been modified to include the prefix) or do I expect that I may still see un-prefixed incoming requests and need to change them?

If the incoming URL paths contain the prefix, at what point do I drop that so I can run the regular URL matching code?

@simonw
Owner Author

simonw commented Mar 24, 2020

I'm going to assume that whatever is proxying to Datasette leaves the full incoming URL path intact, so I'm going to need to teach the URL routing code to strip off the prefix before processing the incoming request.
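That stripping step can be sketched like this (a hypothetical helper for illustration, not Datasette's actual code; the scope shape follows the ASGI spec):

```python
def strip_base_url(scope, base_url):
    """Return a copy of an ASGI scope with the configured base_url prefix
    removed from the path, so regular route matching can run unchanged."""
    path = scope["path"]
    if base_url != "/" and path.startswith(base_url):
        # Drop the prefix, keeping a single leading slash
        path = "/" + path[len(base_url):].lstrip("/")
    return dict(scope, path=path)

scope = {"type": "http", "path": "/new-base/path/here/fixtures"}
print(strip_base_url(scope, "/new-base/path/here/")["path"])  # /fixtures
```

A request whose path does not start with the prefix passes through untouched, which matters if the proxy rewrites some paths but not others.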

@simonw
Owner Author

simonw commented Mar 24, 2020

That means I should teach AsgiRouter how to handle an optional prefix:

class AsgiRouter:
    def __init__(self, routes=None):
        routes = routes or []
        self.routes = [
            # Compile any strings to regular expressions
            ((re.compile(pattern) if isinstance(pattern, str) else pattern), view)
            for pattern, view in routes
        ]

    async def __call__(self, scope, receive, send):
        # Because we care about "foo/bar" vs. "foo%2Fbar" we decode raw_path ourselves
        path = scope["path"]
        raw_path = scope.get("raw_path")

@simonw
Owner Author

simonw commented Mar 24, 2020

Actually I'll teach DatasetteRouter since that subclasses AsgiRouter but has access to a datasette instance (which it can read configuration values from):

datasette/datasette/app.py

Lines 750 to 753 in 298a899

class DatasetteRouter(AsgiRouter):
    def __init__(self, datasette, routes):
        self.ds = datasette
        super().__init__(routes)
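That shape can be roughed out like this (a simplified stand-in, with `AsgiRouterSketch`/`DatasetteRouterSketch` as hypothetical names; the real classes live in datasette/app.py and dispatch to ASGI views rather than returning them):

```python
import re

class AsgiRouterSketch:
    """Simplified stand-in for AsgiRouter: compiles string patterns and
    matches an incoming path against them."""
    def __init__(self, routes=None):
        self.routes = [
            (re.compile(p) if isinstance(p, str) else p, view)
            for p, view in (routes or [])
        ]

    def match(self, path):
        for pattern, view in self.routes:
            m = pattern.match(path)
            if m:
                return view, m
        return None, None

class DatasetteRouterSketch(AsgiRouterSketch):
    """Subclass that strips a configured base_url before matching, as the
    real class can read that setting from its datasette instance."""
    def __init__(self, base_url, routes):
        self.base_url = base_url
        super().__init__(routes)

    def match(self, path):
        if self.base_url != "/" and path.startswith(self.base_url):
            path = "/" + path[len(self.base_url):].lstrip("/")
        return super().match(path)

router = DatasetteRouterSketch("/new-base/", [(r"/(?P<db>[^/]+)$", "database_view")])
view, m = router.match("/new-base/fixtures")
print(view, m.group("db"))  # database_view fixtures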

@simonw
Owner Author

simonw commented Mar 24, 2020

OK, I have an implementation of this over in the base-url branch (see pull request #708) which is passing all of the unit tests.

Anyone willing to give it a quick test and see if it works for your particular use-case? You can install it with:

pip install https://github.com/simonw/datasette/archive/base-url.zip

Then you can run Datasette like this:

datasette fixtures.db --config base_url:/new-base/path/here/

@simonw simonw added this to the Datasette 0.39 milestone Mar 24, 2020
@terrycojones

Hi Simon - I'm just (trying, at least) to follow along in the above. I can't try it out now, but I will if no one else gets to it. Sorry I didn't write any tests in the original bit of code I pushed - I was just trying to see if it could work & whether you'd want to maybe head in that direction. Anyway, thank you, I will certainly use this. Comment back here if no one tried it out & I'll make time.

@simonw
Owner Author

simonw commented Mar 25, 2020

I got this working as a proxied instance inside Binder, building on @psychemedia's work: simonw/jupyterserverproxy-datasette-demo#1

Now that I've seen it working there I'm going to land the pull request.

@simonw simonw closed this as completed in 7656fd6 Mar 25, 2020
@simonw simonw unpinned this issue Mar 25, 2020
@simonw
Owner Author

simonw commented Mar 25, 2020

Shipped in 0.39: https://datasette.readthedocs.io/en/latest/changelog.html#v0-39

@terrycojones

Great - thanks again.

@wragge
Contributor

wragge commented Mar 26, 2020

Thanks! I'm trying to launch Datasette from within a notebook using the jupyter-server-proxy and the new base_url parameter. While the assets load ok, and the breadcrumb navigation works, the facet links don't seem to use the base_url. Or have I missed something?

My test repository is here: https://github.com/wragge/datasette-test

simonw added a commit that referenced this issue Mar 26, 2020
* base_url configuration setting
* base_url works for static assets as well
simonw added a commit that referenced this issue Apr 2, 2020
* base_url configuration setting
* base_url works for static assets as well
@LVerneyPEReN

Hi,

I came across this issue while looking for a way to spawn Datasette as a SQLite file viewer in JupyterLab. I found https://github.com/simonw/jupyterserverproxy-datasette-demo, which seems to be the most up-to-date proof of concept, but it seems to be failing to list the available databases (at least in the Binder demo, https://hub.gke.mybinder.org/user/simonw-jupyters--datasette-demo-uw4dmlnn/datasette/, I only have :memory).

Has anyone tried to improve on this proof of concept to get a Datasette visualization for SQLite files?

Thanks!

@wragge
Contributor

wragge commented Jun 10, 2020

There's a working demo here: https://github.com/wragge/datasette-test

And if you want something that's more than just proof-of-concept, here's a notebook which does some harvesting from web archives and then displays the results using Datasette: https://nbviewer.jupyter.org/github/GLAM-Workbench/web-archives/blob/master/explore_presentations.ipynb

@LVerneyPEReN

Hi @wragge,

This looks great, thanks for sharing! I refactored it into a self-contained function, binding to a random available TCP port (multi-user context). I am using the subprocess API directly since the %run magic was leaving defunct processes behind :/


import socket

from signal import SIGINT
from subprocess import Popen, PIPE

from IPython.display import display, HTML
from notebook.notebookapp import list_running_servers


def get_free_tcp_port():
    """
    Get a free TCP port.
    """
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp.bind(('', 0))
    _, port = tcp.getsockname()
    tcp.close()
    return port


def datasette(database):
    """
    Run datasette on an SQLite database.
    """
    # Get current running servers
    servers = list_running_servers()

    # Get the current base url
    base_url = next(servers)['base_url']

    # Get a free port
    port = get_free_tcp_port()

    # Create a base url for Datasette using the proxy path
    proxy_url = f'{base_url}proxy/absolute/{port}/'

    # Display a link to Datasette
    display(HTML(f'<p><a href="{proxy_url}">View Datasette</a> (Click on the stop button to close the Datasette server)</p>'))

    # Launch Datasette
    with Popen(
        [
            'python', '-m', 'datasette', '--',
            database,
            '--port', str(port),
            '--config', f'base_url:{proxy_url}'
        ],
        stdout=PIPE,
        stderr=PIPE,
        bufsize=1,
        universal_newlines=True
    ) as p:
        print(p.stdout.readline(), end='')
        while True:
            try:
                line = p.stderr.readline()
                if not line:
                    break
                print(line, end='')
                exit_code = p.poll()
            except KeyboardInterrupt:
                p.send_signal(SIGINT)

Ideally, I'd like some extra magic to notify users when they close the notebook tab and prompt them to terminate the running Datasette processes. I'll be looking for it.
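In the meantime, one stdlib-only approach (an assumption on my part, untested in JupyterLab, and it won't catch a crashed kernel) is to register an atexit hook so the subprocess is terminated when the interpreter shuts down; `launch_server` is a hypothetical helper name:

```python
import atexit
import subprocess

def launch_server(cmd):
    """Spawn a long-running server process and register a cleanup hook
    so it is terminated when the interpreter exits."""
    proc = subprocess.Popen(cmd)
    atexit.register(proc.terminate)
    return proc

# For the Datasette case above this might look like:
# launch_server(['python', '-m', 'datasette', '--', database,
#                '--port', str(port), '--config', f'base_url:{proxy_url}'])
```

Calling `proc.terminate()` sends SIGTERM, which Datasette's server should handle as a clean shutdown; registering the bound method means each launched server gets its own hook.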
