RTD PR builds get stuck #8772

Closed
pllim opened this issue Dec 17, 2021 · 5 comments · Fixed by #8850

@pllim
Contributor

pllim commented Dec 17, 2021

Details

Expected Result

Build to run after "concurrency limit" is gone.

Actual Result

The job got stuck at "triggered (Concurrency limit reached (2), retrying in 5 minutes.)" state for many hours. cc @astrojuanlu

IMG_20211217_193722

I tried closing and re-opening astropy/astropy#12587 in the hope of getting past that roadblock, but the new job won't start because it is marked as "duplicate".

Hope you can help. Thanks!

@astrojuanlu
Contributor

I'm away from keyboard now but indeed this looks like some sort of weird race condition to me. I recall that it has happened to other projects in the past.

@astrojuanlu
Contributor

astrojuanlu commented Dec 18, 2021

After some time, the builds ended up passing:

IMG_20211218_080244

So, one could say that "everything worked as expected", but it is still not clear to me which second build was supposedly running.

In any case, I believe the solution here would have been to cancel the running and queued builds and start again.

@astrojuanlu
Contributor

Found the issue: #7660 (also opened by @pllim)

@humitos
Member

humitos commented Dec 20, 2021

I think this problem is related to #4386.

We have a task that checks for blocked/invalid/ghost builds and cancels them. Those blocked/invalid/ghost builds are builds that failed for some reason without us being able to detect it, so we think they are still "running". The task looks at "running" builds and kills those that have been running for a very long time. In the meantime, new builds may be blocked (because of the concurrency limit), and that's why you get the ⚠️ icon in the build list. Once these ghost builds are killed, your quota is freed up and you are able to run a build again.
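To make the mechanism concrete, here is a minimal sketch of that kind of cleanup. It is not the actual Read the Docs code: the `Build` class, the `STALE_AFTER` threshold, and the function names are all hypothetical and only illustrate the idea. The concurrency limit of 2 is the value shown in the message above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Limit taken from the "Concurrency limit reached (2)" message in this issue.
CONCURRENCY_LIMIT = 2
# Hypothetical threshold: how long a build may "run" before it counts as a ghost.
STALE_AFTER = timedelta(hours=2)


@dataclass
class Build:
    """Hypothetical stand-in for a build record."""
    id: int
    state: str  # e.g. "triggered", "running", "cancelled"
    started_at: datetime


def cancel_ghost_builds(builds, now):
    """Cancel builds that have been 'running' longer than STALE_AFTER."""
    cancelled = []
    for build in builds:
        if build.state == "running" and now - build.started_at > STALE_AFTER:
            build.state = "cancelled"
            cancelled.append(build.id)
    return cancelled


def can_start_new_build(builds):
    """A new build may start only while fewer than the limit are running."""
    running = sum(1 for b in builds if b.state == "running")
    return running < CONCURRENCY_LIMIT


now = datetime(2021, 12, 17, 20, 0)
builds = [
    Build(1, "running", now - timedelta(hours=5)),     # ghost: failed undetected
    Build(2, "running", now - timedelta(minutes=10)),  # genuinely running
    Build(3, "triggered", now),                        # waiting on the limit
]

blocked_before = not can_start_new_build(builds)
cancelled_ids = cancel_ghost_builds(builds, now)
blocked_after = not can_start_new_build(builds)
```

In this sketch the waiting build stays blocked until the periodic cleanup runs, which matches the delay seen in this issue: nothing frees the quota until the ghost build is finally marked as cancelled.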

In #8269 I tried to solve this problem by detecting those ghost builds earlier and killing them, but it needs more testing before it can be trusted.

@pllim
Contributor Author

pllim commented Dec 20, 2021

If you think this problem is being tracked in other issues already, feel free to close. Thanks!
