FatalError during init, clarify error + how to start cluster? #346

Open
rhugonnet opened this issue Oct 13, 2023 · 2 comments

@rhugonnet
Contributor

Hi @jpswinski, @tsutterley,

After setting up an account with "uw", and following the guidelines in https://slideruleearth.io/web/rtd/user_guide/Private-Clusters.html#getting-started-with-private-clusters, I got the following FatalError:

Executing:

sliderule.init("slideruleearth.io", organization="uw")

I get:

Connection error to endpoint https://uw.slideruleearth.io/source/version ...retrying request
Connection error to endpoint https://uw.slideruleearth.io/source/version ...retrying request
Connection error to endpoint https://uw.slideruleearth.io/source/version ...retrying request
Traceback (most recent call last):
  File "/home/atom/miniconda3/envs/srtm_pene/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-5305c5113031>", line 11, in <module>
    sliderule.init("slideruleearth.io", organization="uw")
  File "/home/atom/miniconda3/envs/srtm_pene/lib/python3.10/site-packages/sliderule/sliderule.py", line 678, in init
    return check_version(plugins=plugins) # verify compatibility between client and server versions
  File "/home/atom/miniconda3/envs/srtm_pene/lib/python3.10/site-packages/sliderule/sliderule.py", line 1172, in check_version
    info = get_version()
  File "/home/atom/miniconda3/envs/srtm_pene/lib/python3.10/site-packages/sliderule/sliderule.py", line 1149, in get_version
    rsps = source("version", {})
  File "/home/atom/miniconda3/envs/srtm_pene/lib/python3.10/site-packages/sliderule/sliderule.py", line 785, in source
    raise FatalError("Unable to complete request due to errors")
sliderule.sliderule.FatalError: Unable to complete request due to errors

It took me a bit of time to figure out that this might be because the cluster was not deployed:

Cluster State
    uw is NOT deployed

Following https://slideruleearth.io/web/rtd/user_guide/Private-Clusters.html#starting-and-scaling-a-private-cluster, I also didn't know how long the cluster takes to start after calling sliderule.update_available_servers (or that I would have to make that call at the very beginning of the script).
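For anyone hitting the same question, here is a minimal sketch of waiting for a cluster to come up by polling. It is deliberately generic: `wait_for_nodes` and its parameters are hypothetical names, not part of the SlideRule client, and the 600 s default timeout is a guess rather than a documented startup time. The `get_available` callable is where a real status check would go.

```python
import time


def wait_for_nodes(get_available, desired_nodes, timeout_s=600.0, poll_s=10.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_available() until it reports at least desired_nodes.

    Returns the elapsed time in seconds on success and raises TimeoutError
    if the nodes are still not available after timeout_s seconds.
    """
    start = clock()
    while True:
        available = get_available()
        if available >= desired_nodes:
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(
                f"only {available} of {desired_nodes} nodes available "
                f"after {timeout_s} seconds"
            )
        sleep(poll_s)


# Possible use with the real client (assumption: update_available_servers
# returns a tuple whose first element is the number of available nodes):
#
#   import sliderule
#   elapsed = wait_for_nodes(
#       lambda: sliderule.update_available_servers()[0], desired_nodes=1)
#   print(f"cluster ready after {elapsed:.0f} s")
```

Timing the call this way on a real cluster would also give the startup estimate asked for above.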

Now, I still get version errors:

RuntimeError: Client (version (4, 0, 2)) is incompatible with the server (version (3, 7, 0))
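My reading of that message is that the client and server disagree on the major version. A minimal sketch of such a check, assuming compatibility just means "same major version" (this is a guess at the rule, not the client's actual code, and `assert_same_major` is a hypothetical name):

```python
def same_major(client_version, server_version):
    """Return True when both (major, minor, patch) tuples share a major number."""
    return client_version[0] == server_version[0]


def assert_same_major(client_version, server_version):
    """Raise RuntimeError when the major versions differ."""
    if not same_major(client_version, server_version):
        raise RuntimeError(
            f"Client (version {client_version}) is incompatible with "
            f"the server (version {server_version})"
        )


# The versions from the error above: major 4 on the client, major 3 on the server.
try:
    assert_same_major((4, 0, 2), (3, 7, 0))
except RuntimeError as err:
    print(err)
```

Under that assumption the fix is either a version 3 client or a version 4 cluster.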

Maybe we could clarify these three aspects in SlideRule:

  • A reproducible example in the docs with the lines needed to start a cluster, in order, plus an estimate of how long startup takes (10s, 1min, 10min?),
  • A more helpful message than the current FatalError, to point the user in the right direction,
  • For versioning, I still don't know how to address it! I'd appreciate your help!
@jpswinski
Member

jpswinski commented Oct 16, 2023

@rhugonnet - Thank you for going through this process and giving us this feedback!

In the short term

  • I've updated the private cluster documentation to hopefully be clearer and provide better examples and troubleshooting information. You can find the updated page at: https://slideruleearth.io/web/rtd/user_guide/Private-Clusters.html

  • I've also gone ahead and updated the uw cluster to version 4. One feature of private clusters is that they can be pinned to a specific version, or to a major version, which is really helpful if you are in the middle of developing code for a specific processing run. But in this case, it was pinned to major version 3, which means it was automatically getting any updates that were a part of the version 3 releases, but didn't automatically go to version 4. @tsutterley - please let me know if it is okay that I bumped you guys to version 4. I am assuming it is what you want, but just want to check.
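To illustrate the pinning behaviour described above, here is a rough sketch of how a pin could be resolved against a list of releases. It is only an illustration under my own assumptions: `resolve_pin`, the dotted pin format, and the release list are hypothetical (only 3.7.0 and 4.0.2 appear in this thread), and the provisioning system may work differently.

```python
def resolve_pin(pin, releases):
    """Return the newest release matching a pin such as "3" or "3.7.0".

    pin: dotted version prefix; "3" pins the major version, "3.7.0" an exact one.
    releases: iterable of (major, minor, patch) tuples.
    """
    wanted = tuple(int(part) for part in pin.split("."))
    matching = [r for r in releases if r[:len(wanted)] == wanted]
    if not matching:
        raise ValueError(f"no release matches pin {pin!r}")
    return max(matching)


# Illustrative release list, not the project's real release history.
releases = [(3, 6, 0), (3, 7, 0), (4, 0, 0), (4, 0, 2)]

print(resolve_pin("3", releases))      # (3, 7, 0): newest version 3 release, never 4.x
print(resolve_pin("4.0.0", releases))  # (4, 0, 0): exact pin
```

A cluster pinned to "3" keeps picking up new 3.x releases but stays below 4.0.0, which matches the client/server mismatch reported here.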

In the long term

  • I think we should add a "latest_release" version or something like that, which can be specified for a private cluster and tells it to always grab the next release even if it crosses a major version boundary. See Provide "latest release" version option in provisioning system #347 for the issue created for this.

  • We should add a call in the client to the provisioning system to check whether a cluster is deployed or not. This call would only be made when a request to the cluster fails, so it shouldn't affect performance, but could give vital information to the user on what they should do. See Provide cluster state on failed request #348 for the issue created for this.
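A rough sketch of what the second item could look like, with the state check made only after a failure. Everything here is hypothetical: `request_with_diagnosis`, `make_request`, and `is_deployed` are placeholder names, and no such provisioning call exists in the client yet.

```python
def request_with_diagnosis(make_request, is_deployed, organization):
    """Run make_request(); on failure, check whether the cluster is deployed.

    make_request: zero-argument callable that performs the cluster request.
    is_deployed: callable taking the organization name and returning a bool
        (a stand-in for a future provisioning-system lookup).
    """
    try:
        return make_request()
    except Exception as exc:
        # The deployment check runs only on the failure path, so successful
        # requests pay no extra cost.
        if not is_deployed(organization):
            raise RuntimeError(
                f"Cluster '{organization}' is not deployed; "
                "start it before sending requests"
            ) from exc
        raise


def failing_request():
    raise ConnectionError("Unable to complete request due to errors")


try:
    request_with_diagnosis(failing_request, lambda org: False, "uw")
except RuntimeError as err:
    print(err)
```

If the cluster is deployed, the original exception propagates unchanged, so other failures are not hidden behind the deployment message.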

@tsutterley
Contributor

@jpswinski bumping the uw cluster to v4 works on our end.
