Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Path-based interactive tools stopped working in 22.05 #14690

Closed
sveinugu opened this issue Sep 23, 2022 · 17 comments · Fixed by #16795
Closed

Path-based interactive tools stopped working in 22.05 #14690

sveinugu opened this issue Sep 23, 2022 · 17 comments · Fixed by #16795

Comments

@sveinugu
Copy link
Contributor

Describe the bug

Although undocumented, path-based interactive tools have been (more or less) working since 21.01 based on PR #8596. With 22.05 and the discontinuation of uWSGI, this has stopped working "out-of-the-box". It seems the only thing lacking is to reroute path-based interactive tool URLs similarly to what was added to galaxy.yml in this commit.

I do not know if it is possible to configure gunicorn in a similar way, or if there are other solutions for bringing path-based interactive tools back to life. One option is, I believe, to set up rerouting on a proxy web server (e.g. nginx), but then one loses the ability for Galaxy to serve path-based interactive tools as stand-alone.

Path-based interactive tools means that for interactive tools that support it (with requires_domain=False) there is no need to set up wildcard DNS entries with SSL support, which is both expensive, and cumbersome for development. I believe the adoption and use of interactive tools has been hampered by the need for wildcard DNS. Unfortunately, it also seems the path-based interactive tools feature has been under-communicated to the developer and admin community. And now it stopped working.

I believe the main reason why this feature might have lost steam is that it needs support also within the docker containers. However, I think that with good documentation and visibility, it is not too big of a hurdle to update at least the most used interactive tools to support path-based URLs.

@blankenberg @mvdbeek Are there any plans to fix this? We are at least eager to get this working again, and can help out if needed!

Galaxy Version and/or server at which you observed the bug
Galaxy Version: 22.05

To Reproduce
Steps to reproduce the behavior:
Set up a plain, stand-alone Galaxy with the configuration described in galaxy.yml.interactivetools and job_conf.xml.interactivetools. Run an interactive tool with requires_domain=False (which is a bit cumbersome, as I don't think any of the bundled ones support that).

Expected behavior
The interactive tool should work without wild-card DNS

@hexylena
Copy link
Member

Path based interactive tools have been sorely missed since we switched to the newer GxITs. It is mandatory for so so many institutions, especially smaller ones that don't have wildcard DNS entries which is a huge barrier to entry. I cannot run them at one of my organisations, because that simply isn't available. Thanks for pushing for this to be brought back @sveinugu it'll be appreciated at smaller galaxies :)

@sveinugu
Copy link
Contributor Author

sveinugu commented Oct 9, 2022

Taking the communication related to a real fix of path-based interactive tools back from #14694 to this issue, where it belongs:

To put it simply: Can one conclude that if one is able to run an IT with requires-domainset to True on a laptop with the configuration you referred to, then one could switch the requires-domain setting for that IT to False?

From personal communication at the European Galaxy Days, I have now understood that localhost here acts just like any other domain and one cannot conclude from the current dev setup whether an interactive tool requires domain or not. Which points towards adding support for path-based URLs in gx-it-proxy to be the best way forward. This would allow for a dev setup to test whether ITs supports path-based URLs, as well as a prod setup without the nginx proxy hack in #14694, which really only translates a path-based URL into a subdomain-based one for the gx-it-proxy to recognize.

@sveinugu
Copy link
Contributor Author

sveinugu commented Oct 10, 2022

Here is the main issue (edited from #14694 (comment) based on better understanding of the issue):

  1. Path-based Interactive Tools might require changes to the underlying docker image to work. The web server running in the container must be happy with running under a specific path prefix with the same domain as the Galaxy installation. Even though many of the current IT docker images supports this (including I believe Jupyter Notebook and RStudio), this is not formally tested in any way. Consequently, requires_domain is set defensively to True for most ITs.
  2. For local development and tests (?) (with interactivetools_upstream_proxy: false), ITs are mounted under subdomains of localhost (domain-based) instead of under a specific path prefix (path-based), which means it cannot be used for testing whether setting requires_domain=False will work.
  3. The lack of ITs tested to work under path-based URLs (with requires_domain=False) is a major showstopper in deployment, at least for smaller Galaxies and in restricted situations (such as cloud deployments etc). See Path-based interactive tools stopped working in 22.05 #14690 (comment).
  4. Since 22.05, there is no way to get path-based Interactive Tools to work locally without setting up an nginx server (as described in Allow path-based interactive tools using nginx proxy #14694), which makes it very hard to fix 1, and consequently 3.

@hexylena
Copy link
Member

gx-it-proxy supports path based, I don't think anyone ever ripped out that support. https://github.com/galaxyproject/gx-it-proxy/blob/62fd619bd2bad542077e6a87498a47b00e05f91f/lib/proxy.js#L51

@hexylena
Copy link
Member

Jupyter Notebook if I recall required you set a specific variable for path based support, to get it running at a specific location. Rstudio does as well. I don't believe the current runner sets these required path variables

@sveinugu
Copy link
Contributor Author

gx-it-proxy supports path based, I don't think anyone ever ripped out that support. https://github.com/galaxyproject/gx-it-proxy/blob/62fd619bd2bad542077e6a87498a47b00e05f91f/lib/proxy.js#L51

Hmm.. I didn't know. I guess this can be tested out then by updating the nginx rewrite accordingly. If so, I guess the only thing lacking is to update Galaxy to support path-based URLs in the dev setup (interactivetools_upstream_proxy: false), which shouldn't be much work, or is that also already there?

@hexylena
Copy link
Member

I think nginx (or another proxy) isn't a completely unreasonable requirement, especially if it only requires forwarding a handful of paths.

If so, I guess the only thing lacking is to update Galaxy to support path-based URLs in the dev setup

It should have been implemented here, but from my reading I'm not sure how that would be happening.
#8596 So I'm not sure where to go from there. But yes, if we could expose it as an env var, I think that would be enough, jsut like we do for API_KEY, if we had something similar for routing_key, it would be ideal (i.e. the path it should be accessible at.)

@sveinugu
Copy link
Contributor Author

I think nginx (or another proxy) isn't a completely unreasonable requirement, especially if it only requires forwarding a handful of paths.

It is definitely not unreasonable for production, but I don't think anyone will set up nginx on their local dev machine in order to test path-based ITs. Without a simpler dev setup, I don't believe path-based ITs will ever take over from domain-based as the default setup, which I believe they should.

@sveinugu
Copy link
Contributor Author

sveinugu commented Oct 10, 2022

But yes, if we could expose it as an env var, I think that would be enough, jsut like we do for API_KEY, if we had something similar for routing_key, it would be ideal (i.e. the path it should be accessible at.)

You mean to expose this to the Docker containers? Yes, I think that is definitely needed. Here is the current hack employed by our ProTo2 interactive tool to get the url:

def main():
    galaxy_url = os.getenv('GALAXY_URL')
    galaxy_api_key = os.getenv('API_KEY')
    galaxy_work = os.getenv('GALAXY_WORKING_DIR')
    galaxy_history_id = os.getenv('HISTORY_ID')
    galaxy_output = os.getenv('GALAXY_OUTPUT')

    gi = GalaxyInstance(url=galaxy_url, key=galaxy_api_key)
    print(get_proto2_url(gi, galaxy_history_id, galaxy_output))

and:

def get_proto2_url(gi, galaxy_history_id, galaxy_output):
    jobs = gi.jobs.get_jobs(
        state='running', tool_id='interactive_tool_proto2', history_id=galaxy_history_id)
    # print(jobs)
    for job in jobs:
        job_id = job['id']
        job_info = gi.jobs.show_job(job_id, full_details=True)
        job_cmd = str(job_info['command_line'])
        if job_cmd.find(galaxy_output) != -1:
            break

    eps = gi.make_get_request(gi.base_url + '/api/entry_points?job_id=' + job_id).json()

    target = str(eps[0]['target']).rstrip('/').replace('//', '/')
    return target

It works fine, but is definitely a workaround. Not sure if it works in all cases, and is definitely not pretty.

EDIT: This was actually a previous attempt, which is now commented out. We are running Flask which works out-of-the-box with app = Flask(__name__, instance_relative_config=True). This might not be true for all web-server setups though.

@hexylena
Copy link
Member

It is definitely not unreasonable for production, but I don't think anyone will set up nginx on their local dev machine in order to test path-based ITs. Without a simpler dev setup, I don't believe path-based ITs will ever take over from domain-based as the default setup, which I believe they should.

I've run nginx locally for a while, but I guess my primary use case is still "run path-based ITs in productions even on university machines" rather than running ITs locally which I've never had the desire to do, I don't mean to discount that part, I just cannot help there as much as it requires mucking about in gunicorn/routing to support weird things like websockets.

If I had path based ITs I'd switch to them in production over the domains. It's significantly less of a problem to have path based ones. In one of my jobs, the path based is the only option. In the other, the IT department would LOVE to get rid of that wildcard cert (which they inexplicably don't like.)

You mean to expose this to the Docker containers? Yes, I think that is definitely needed. Here is the current hack employed by our ProTo2 interactive tool to get the url:

I mean expose the URL at which the container should be accessed, e.g. /interactivetools/rstudio/<uuid>/, to the container. I think your solution to "which URL galaxy should be accessed" at is a separate thing (and we do the same in jupyter.)

@sveinugu
Copy link
Contributor Author

sveinugu commented Oct 10, 2022

I mean expose the URL at which the container should be accessed, e.g. /interactivetools/rstudio/<uuid>/, to the container. I think your solution to "which URL galaxy should be accessed" at is a separate thing (and we do the same in jupyter.)

No, I'm quite sure this is what the above code does (however it was found to be not needed in our case). Just tested now, and the target field of an entry_point of a job running ProTo2 interactive tool contained /interactivetool/access/interactivetoolentrypoint/<uuid1>/<uuid2>/ which I believe is the URL you are referring to.

@hexylena
Copy link
Member

Oh, fascinating. I did not get that from the code you shared, that's very useful to know. But yes, that needs to be available to containers and potentially set as an environment variable, and then we'd be able to make jupyter work directly with that, without any other setup.

@sveinugu
Copy link
Contributor Author

I've run nginx locally for a while, but I guess my primary use case is still "run path-based ITs in productions even on university machines" rather than running ITs locally which I've never had the desire to do, I don't mean to discount that part, I just cannot help there as much as it requires mucking about in gunicorn/routing to support weird things like websockets.

Ok, I should adjust my above statement to "I don't think anyone other than Galaxy admins will set up nginx on their local dev machine"! 😁

So we have developed our path-based IT mainly by running it on a cloud server and developing directly against that (using syncthing or ssh), but that is a cumbersome dev setup. Being able to run a path-based IT locally would be preferable for most developers. Also, for test setup, I believe having a solution for path-based ITs that does not depend on an nginx (or other) proxy is a necessity.

But isn't setting interactivetools_upstream_proxy: false already routing directly to the gi-it-proxy? Do you know how this works?

@sveinugu
Copy link
Contributor Author

sveinugu commented Dec 8, 2022

I have added the following PRs that together should fix path-based proxying through gx-it-proxy:

#15147
galaxyproject/gravity#96
galaxyproject/gx-it-proxy#14

Implemented together with @morj-uio

Still lacks a way to notify the container about the path. Also not properly tested yet, so the PRs should currently be considered Work In Progress.

@hexylena
Copy link
Member

@sveinugu this is fantastic :) Would you (or someone in your group) have bandwidth to write up:

  • what changes are needed container side, if any, for folks implementing ITs
  • what changes are needed admin side?

I'd love to update the training materials accordingly.

@sveinugu
Copy link
Contributor Author

@sveinugu this is fantastic :) Would you (or someone in your group) have bandwidth to write up:

  • what changes are needed container side, if any, for folks implementing ITs
  • what changes are needed admin side?

I'd love to update the training materials accordingly.

Have at least partly addressed this now in a new PR: #15227.

In order for this functionality to work, all the PRs must be merged and the versions of gx-it-proxy and gravity must be bumped (also in any requirements, which I failed to locate for gx-it-proxy).

Have tested from both local development and with nginx upstream proxy.

A way to notify the container about the path must wait to next release, as I am taking holidays (or someone else could give it a go).

@sveinugu
Copy link
Contributor Author

sveinugu commented Dec 17, 2022

Have tested from both local development and with nginx upstream proxy.

We have our own interactive tool that we have tested with. Have not tried with included tools, but according to either @hexylena or @mvdbeek (can't remember right now) Jupyter notebook and RStudio should work with path-based. Just set requires_domain=False in the tool and give it a go! Due to time limitations and peculiarities with our setup (based on podman), I have not been able to test with other tools.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants