[No queue: nlp requested] error happening intermittently when running nlp queue with Stanford workers #4405

Open
AndrewJGaut opened this issue Feb 22, 2023 · 4 comments
Labels
p1 Do it in the next two weeks.

Comments

@AndrewJGaut
Contributor

No description provided.

@AndrewJGaut AndrewJGaut added user-question p1 Do it in the next two weeks. and removed user-question labels Feb 22, 2023
@AndrewJGaut
Contributor Author

I can't reproduce this anymore. I believe it's working correctly.

@AndrewJGaut
Contributor Author

AndrewJGaut commented Mar 15, 2023

Reopening this: we've identified an issue that occurs with bundles run on worksheets that are not publicly readable.

For instance, consider the following steps:
image
Here's me executing those commands:
image

Now, the weird thing is, I can't reproduce this locally! When I do something similar (with private worksheet test-ws) I get the following:
image
We see that the staged bundle on the private worksheet does show up in the search output!
The only difference I've been able to identify so far is that the staged_status fields differ. On my local instance, I don't get the `No queue (nlp requested)` issue; see here:
image

Now, the reason this causes the issue we see from the worker-manager is that the worker-manager makes a search call (equivalent to the one the client makes when running cl search) that runs exactly `cl search .mine state=staged`. Therefore, any bundles that don't show up in that output aren't picked up by the slurm worker-manager, so they remain in staged forever.
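
To make the failure mode concrete, here is a toy Python model of that polling loop. It is only a sketch and not the actual CodaLab worker-manager or search code: the `Bundle` class, `search_mine_staged`, `poll_once`, and the `hide_private` flag are all hypothetical names, and `hide_private` merely stands in for whatever (still unknown) difference makes prod omit bundles on private worksheets.

```python
from dataclasses import dataclass


@dataclass
class Bundle:
    # Hypothetical, simplified bundle record (not the real CodaLab schema).
    uuid: str
    owner: str
    state: str
    worksheet_public: bool


def search_mine_staged(bundles, user, hide_private):
    """Toy stand-in for `cl search .mine state=staged`.

    hide_private is an assumption: it models a search that drops bundles
    on worksheets that are not publicly readable, as observed on prod.
    """
    return [
        b
        for b in bundles
        if b.owner == user
        and b.state == "staged"
        and (b.worksheet_public or not hide_private)
    ]


def poll_once(bundles, user, hide_private):
    """One worker-manager pass: only bundles returned by search leave staged."""
    picked = search_mine_staged(bundles, user, hide_private)
    for b in picked:
        b.state = "running"
    return [b.uuid for b in picked]


bundles = [
    Bundle("0xaaa", "me", "staged", worksheet_public=True),
    Bundle("0xbbb", "me", "staged", worksheet_public=False),
]

picked = poll_once(bundles, "me", hide_private=True)
stuck = [b.uuid for b in bundles if b.state == "staged"]
print("picked up:", picked)
print("still staged:", stuck)
```

With `hide_private=False` (the behavior I see locally), both bundles are picked up, so the search result is the only thing that decides whether a bundle ever leaves staged.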

Note: we have verified that the slurm worker-manager and cl search work fine with bundles on public worksheets.

The question: why are staged bundles on private worksheets not being picked up by cl search on prod, even though they are when I run the same commands locally? I'm investigating further...

@AndrewJGaut
Contributor Author

AndrewJGaut commented Mar 15, 2023

Moreover, I don't see anything amiss in the query or the database entry.

The query looks as follows:
image

Here's what the bundle that's staged in the private worksheet looks like in the prod database:
image

And here's what my user ID is:
cl info -f id
image

@AndrewJGaut
Contributor Author

Try doing this as root on the main instance and see if it works. If so, try creating a non-root account locally and then reproducing it.
