base_url configuration setting #394
Comments
I found a really nice pattern for writing the unit tests for this (though it would look even nicer with a solution to #395):

```python
@pytest.mark.parametrize("prefix", ["/prefix/", "https://example.com/"])
@pytest.mark.parametrize("path", [
    "/",
    "/fixtures",
    "/fixtures/compound_three_primary_keys",
    "/fixtures/compound_three_primary_keys/a,a,a",
    "/fixtures/paginated_view",
])
def test_url_prefix_config(prefix, path):
    for client in make_app_client(config={
        "url_prefix": prefix,
    }):
        response = client.get(path)
        soup = Soup(response.body, "html.parser")
        for a in soup.findAll("a"):
            href = a["href"]
            if href not in {
                "https://github.com/simonw/datasette",
                "https://github.com/simonw/datasette/blob/master/LICENSE",
                "https://github.com/simonw/datasette/blob/master/tests/fixtures.py",
            }:
                assert href.startswith(prefix), (href, a.parent)
```
Hey, was this ever merged? I'm trying to run this behind nginx and encountering this issue.
Putting this here in case anyone else encounters the same issue with nginx: I was able to resolve it by passing the header in the nginx proxy config (i.e., …).
Hey @simonw, is the url_prefix config option available in another branch? It looks like you've written some tests for it above. In 0.32 I get "url_prefix is not a valid option". I think this would be really helpful! It would be really handy for proxying Datasette in another domain's subdirectory. I believe this would allow folks to run upstream authentication, but the links break if the url_prefix doesn't match. I'd prefer not to host a proxied version of Datasette on a subdomain (e.g. datasette.myurl.com), because then I have to worry about sharing authorization cookies with the subdomain, which I'd rather not do, but...

Edit: I see the wip-url-prefix branch, I may try with that: 8da2db4
Agreed, this would be nice to have. I'm currently working around it in …

The 2nd and 3rd above are my databases. This works, but I have a small problem with URLs like … Thanks!
Hmmm, wait, maybe my mindless (copy/paste) use of …
FWIW I did a dumb merge of the branch here: https://github.com/jsfenfen/datasette and it seemed to work, in that I could run stuff at a subdirectory, but I ended up abandoning it in favor of just using a subdomain because getting the nginx configs right was making me crazy. I'd still prefer hosting at a subdirectory, but the subdomain seems simpler at the moment.
@simonw What about allowing a base URL? The …
On mybinder.org we allow access to arbitrary processes listening on a port inside the container via a reverse proxy. This means we need support for a proxy prefix, as the proxy ends up running at a URL like …

An example that shows the problem is https://github.com/psychemedia/jupyterserverproxy-datasette-demo. Launch directly into a Datasette instance on mybinder.org with https://mybinder.org/v2/gh/psychemedia/jupyterserverproxy-datasette-demo/master?urlpath=datasette, then try to follow links inside the UI.
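For anyone unfamiliar with that setup, here is a minimal sketch of how a named Datasette server can be registered with jupyter-server-proxy so that it ends up served under a prefix like `<jupyter-base-url>/datasette/`. This is illustrative only, not the demo repo's actual configuration, and the database filename is hypothetical:

```python
# jupyter_notebook_config.py -- illustrative sketch, not the demo repo's actual config.
# jupyter-server-proxy replaces {port} with the port it allocates for the process and
# then serves it under <jupyter-base-url>/datasette/, which is exactly the kind of
# prefixed URL this issue is about.
c = get_config()  # provided by Jupyter when this config file is loaded

c.ServerProxy.servers = {
    "datasette": {
        "command": [
            "datasette", "data.db",   # hypothetical database file
            "--port", "{port}",
        ],
    }
}
```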
This would also be useful for running Datasette in Jupyter notebooks on Binder. While you can use Jupyter-server-proxy to access Datasette on Binder, the links are broken.

Why run Datasette on Binder? I'm developing a range of Jupyter notebooks that are aimed at getting humanities researchers to explore data from libraries, archives, and museums. Many of them are aimed at researchers with limited digital skills, so being able to run examples in Binder without them installing anything is fantastic.

For example, there's a series of notebooks that help researchers harvest digitised historical newspaper articles from Trove. The metadata from this harvest is saved as a CSV file that users can download. I've also provided some extra notebooks that use Pandas etc. to demonstrate ways of analysing and visualising the harvested data.

But it would be really nice if, after completing a harvest, the user could spin up Datasette for some initial exploration of their harvested data without ever leaving their browser.
I just updated #652 to remove a merge conflict. I think it's an easy way to add this functionality. I don't have time to do more though, sorry!
Thanks very much @terrycojones - I'll see if I can finish it up from here.
@simonw You're welcome - I was just trying it out back in December as I thought it should work. Now there's a pandemic to work on though... so no time at all for more at the moment. BTW, I have datasette running on several protein and full (virus) genome databases I build, and it's great - thank you! Hi and best regards to you & Nat :-)
I don't think I'll go with the …

The good news is that if you look at the templates, almost all of the URLs have been generated in Python code: https://github.com/simonw/datasette/blob/a498d0fe6590f9bdbc4faf9e0dd5faeb3b06002c/datasette/templates/table.html - so it shouldn't be too hard to fix in Python. Ideally I'd like to fix this with as few template changes as possible.
Here's the line I'm stuck on now: `datasette/datasette/views/base.py`, line 417 at commit 298a899.
Tricky question: do I continue to rebuild URLs based on the incoming …? If the incoming URL paths contain the prefix, at what point do I drop that so I can run the regular URL matching code?
I'm going to assume that whatever is proxying to Datasette leaves the full incoming URL path intact, so I'm going to need to teach the URL routing code to strip off the prefix before processing the incoming request.
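As an illustration of that idea (a sketch only, not Datasette's actual routing code), an ASGI wrapper can strip a configured prefix from the incoming path before the normal URL matching runs, assuming the proxy really does pass the full prefixed path through:

```python
# Illustrative ASGI middleware, not Datasette's implementation: remove a configured
# prefix from scope["path"] so the downstream router only ever sees un-prefixed paths.
class PrefixStripper:
    def __init__(self, app, prefix):
        self.app = app
        self.prefix = prefix.rstrip("/")

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http" and self.prefix:
            path = scope["path"]
            if path.startswith(self.prefix):
                # "/my-prefix/fixtures" -> "/fixtures", "/my-prefix" -> "/"
                scope = dict(scope, path=path[len(self.prefix):] or "/")
        await self.app(scope, receive, send)
```

Wrapping the application would then look something like `app = PrefixStripper(app, "/my-prefix")`.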
That means I should teach this routing code: `datasette/datasette/utils/asgi.py`, lines 81 to 93 at commit 298a899.
Actually I'll teach this instead: lines 750 to 753 at commit 298a899.
OK, I have an implementation of this over in the … branch. Anyone willing to give it a quick test and see if it works for your particular use-case? You can install it with:

…

Then you can run Datasette like this:

…
Hi Simon - I'm just (trying, at least) to follow along in the above. I can't try it out now, but I will if no one else gets to it. Sorry I didn't write any tests in the original bit of code I pushed - I was just trying to see if it could work & whether you'd want to maybe head in that direction. Anyway, thank you, I will certainly use this. Comment back here if no one tries it out & I'll make time.
I got this working as a proxied instance inside Binder, building on @psychemedia's work: simonw/jupyterserverproxy-datasette-demo#1. Now that I've seen it working there I'm going to land the pull request.
Shipped in 0.39: https://datasette.readthedocs.io/en/latest/changelog.html#v0-39
Great - thanks again.
Thanks! I'm trying to launch Datasette from within a notebook using the jupyter-server-proxy and the new `base_url` setting. My test repository is here: https://github.com/wragge/datasette-test
* base_url configuration setting
* base_url works for static assets as well
Hi, I came across this issue while looking for a way to spawn Datasette as a SQLite file viewer in JupyterLab. I found https://github.com/simonw/jupyterserverproxy-datasette-demo which seems to be the most up-to-date proof of concept, but it seems to be failing to list the available databases (at least in the Binder demo, https://hub.gke.mybinder.org/user/simonw-jupyters--datasette-demo-uw4dmlnn/datasette/, I only have …). Has anyone tried to improve on this proof of concept to have a Datasette visualization for SQLite files? Thanks!
There's a working demo here: https://github.com/wragge/datasette-test And if you want something that's more than just a proof of concept, here's a notebook which does some harvesting from web archives and then displays the results using Datasette: https://nbviewer.jupyter.org/github/GLAM-Workbench/web-archives/blob/master/explore_presentations.ipynb
Hi @wragge,

This looks great, thanks for the share! I refactored it into a self-contained function, binding on a random available TCP port (multi-user context). I am using the subprocess API directly since the …

```python
import socket
from signal import SIGINT
from subprocess import Popen, PIPE

from IPython.display import display, HTML
from notebook.notebookapp import list_running_servers


def get_free_tcp_port():
    """
    Get a free TCP port.
    """
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp.bind(('', 0))
    _, port = tcp.getsockname()
    tcp.close()
    return port


def datasette(database):
    """
    Run datasette on an SQLite database.
    """
    # Get current running servers
    servers = list_running_servers()
    # Get the current base url
    base_url = next(servers)['base_url']
    # Get a free port
    port = get_free_tcp_port()
    # Create a base url for Datasette using the proxy path
    proxy_url = f'{base_url}proxy/absolute/{port}/'
    # Display a link to Datasette
    display(HTML(f'<p><a href="{proxy_url}">View Datasette</a> (Click on the stop button to close the Datasette server)</p>'))
    # Launch Datasette (the "--" keeps the database filename from being parsed as an option)
    with Popen(
        [
            'python', '-m', 'datasette',
            '--port', str(port),
            '--config', f'base_url:{proxy_url}',
            '--', database,
        ],
        stdout=PIPE,
        stderr=PIPE,
        bufsize=1,
        universal_newlines=True
    ) as p:
        print(p.stdout.readline(), end='')
        while True:
            try:
                line = p.stderr.readline()
                if not line:
                    break
                print(line, end='')
                exit_code = p.poll()
            except KeyboardInterrupt:
                p.send_signal(SIGINT)
```

Ideally, I'd like some extra magic to notify users when they are closing the notebook tab and have them terminate the running datasette processes. I'll be looking into it.
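One possible approach for that cleanup (a sketch under the assumption that the `datasette()` function above is changed to register each spawned `Popen` handle in a module-level list; this covers kernel shutdown or restart, not the harder case of a user simply closing the browser tab while the kernel keeps running):

```python
import atexit
import signal

# Hypothetical registry: datasette() above would append each Popen handle it creates.
_running_datasettes = []


def _cleanup_datasette_processes():
    """Terminate any Datasette subprocesses still running when the kernel exits."""
    for proc in _running_datasettes:
        if proc.poll() is None:              # still running
            proc.send_signal(signal.SIGINT)  # give it a chance to shut down cleanly
            try:
                proc.wait(timeout=5)
            except Exception:
                proc.kill()                  # fall back to a hard kill


atexit.register(_cleanup_datasette_processes)
```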
I've identified a couple of use-cases for running Datasette in a way that over-rides the default way that internal URLs are generated.

1. Running behind a proxy, where generated links reference http://127.0.0.1:8001/fixtures/... when they should have been referencing http://my-host.my-domain.com/fixtures/... - this is a problem both for links within the HTML interface but also for the `toggle_url` keys returned in the JSON as part of the facets datastructure.
2. Running deep within an existing URL hierarchy, e.g. at https://www.mynewspaper.com/interactives/2018/election-results/ - either through careful HTTP proxying or, once Datasette has been ported to ASGI, by mounting a Datasette ASGI instance deep within an existing set of URL routes.

I'm going to add a `url_prefix` configuration option. This will default to `""`, which means Datasette will behave as it does at the moment - it will use `/` for most URL prefixes in the HTML version, and an absolute URL derived from the incoming `Host` header for URLs that are returned as part of the JSON output.

If `url_prefix` is set to another value (either a full URL or a path) then this path will be appended to all generated URLs.
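As a rough sketch of the behaviour described above (illustrative only, not Datasette's implementation; note that the setting which eventually shipped in 0.39 is called `base_url` rather than `url_prefix`):

```python
# Sketch of how a configured prefix would change generated links.
def make_url(prefix: str, path: str) -> str:
    """Join a configured prefix (a path or a full URL, default "") with an app path."""
    if not prefix:
        return path                       # default behaviour: plain paths like "/fixtures"
    return prefix.rstrip("/") + path      # the prefix is prepended to every generated URL


assert make_url("", "/fixtures") == "/fixtures"
assert make_url("/prefix/", "/fixtures") == "/prefix/fixtures"
assert make_url("https://example.com/", "/fixtures") == "https://example.com/fixtures"
```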