Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple task to finish inactive builds #3312

Merged
merged 4 commits into from
Dec 14, 2017
Merged

Conversation

humitos
Copy link
Member

@humitos humitos commented Nov 23, 2017

This is an initial idea to solve these kind of issues and avoid user confusions:

  • I found that we are not running celery beat for periodic tasks so we need to define how this will be executed. Maybe it could be done as a django command and ran it by cron (depending how is the current setup at RTD servers)
  • We should consider the custom time limit for some project, since if it's more than 45 minutes this task will mark them as finished
  • Define how we want to show this custom error

Here is an example on how this will look like

captura de pantalla_2017-11-23_17-45-22

@humitos humitos added the PR: work in progress Pull request is not ready for full review label Nov 23, 2017
@humitos humitos requested a review from agjohnson November 23, 2017 22:46
@humitos humitos added the Needed: design decision A core team decision is required label Nov 23, 2017
@agjohnson
Copy link
Contributor

Ah yes, this could be updated in our docs probably.

We run celery beat just as we run celery:

celery -A readthedocs worker -B

So this would be another command to run for local development. You want to run on a separate worker, as it should only have 1 concurrent process. That is, this will result in multiple scheduled tasks, as there is 4 concurrency:

celery -A readthedocs worker -Q default,celery,web,builder -B

I'm fine not supporting this when running celery as ALWAYS_EAGER -- maybe throwing a warning through a Django system check?

Custom time limit should be 150-200% of the project build time. The project build time is the container_time_limit or the default docker time limit. The celery task time limit is 120% of the project build time, I believe. So 150-200% should extend past the timeout on celery and docker.

Also, thinking more on this, it might be helpful to include more information in the error message. Adding something similar to our general error reporting would help -- "If you continue to encounter this error, file a support request with and reference this build id (#)"

Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is in the right direction. I added some feedback on the questions you've had so far, so we'll probably need another review when those changes are added.

for build in Build.objects.filter(query):
build.success = False
build.state = BUILD_STATE_FINISHED
build.error = 'This build was terminated due to inactivity.'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my PR, I'm accumulating these error messages in one spot, which might make sense. I'm storing these error messages all in a central projects.exceptions file. This could be implemented as an exception, but I'm not sure if the exception will be raised otherwise -- we could combine the message for this and the timeout kill though.

query = (~Q(state=BUILD_STATE_FINISHED) &
Q(date__lte=datetime.datetime.now() - datetime.timedelta(minutes=45)))
# TODO: consider ``poject.container_time_limit`` since it could be bigger than 45 minutes
for build in Build.objects.filter(query):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be build.update() instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to use update we can't put the build id # in the message.

So, for now, I'm adding this message:

        error=('This build was terminated due to inactivity. If you continue '
               'to encounter this error, file a support request.')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think I will limit the query to the first 25 or 50 elements because otherwise the first time this task will be ran it will take tons of existing builds from all the times...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, this is not valid...

AssertionError: Cannot update a query once a slice has been taken.

So, maybe go back to my approach is better.

@agjohnson agjohnson removed the Needed: design decision A core team decision is required label Nov 27, 2017
@safwanrahman
Copy link
Member

I think we should use celery beat for this. otherwise this task does not have any benifit

@agjohnson
Copy link
Contributor

Celery beat is going to be used. The task above is will be a scheduled task

@humitos
Copy link
Member Author

humitos commented Dec 5, 2017

I just update this PR with:

  • do not use update since I want an slice of the filter and also we want to the build.pk inside of the message
  • 1:30hs as a grace time to the build to finish (I think it's enough) --I tried to make use of the project__container_time_limit but it was complicated and I wasn't able to end up with the correct query
  • add a log.info with the information about how many builds were marked as "Terminated..."

Let me know if this is OK for you and also how to configure this as an scheduled task since I don't find the CELERY BEAT setting.

@humitos
Copy link
Member Author

humitos commented Dec 5, 2017

We can always improve this task but it should be enough (for some definition) at this moment. It could be a good starting point.

Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this method is close. Perhaps my suggestion is the best compromise between complexity and the most correct UX.


@app.task()
def finish_inactive_builds():
delta = datetime.timedelta(hours=1, minutes=30)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets use a timedelta of our base task timeout. For an example of how we add overhead to this, see:
https://github.com/rtfd/readthedocs.org/blob/master/readthedocs/core/utils/__init__.py#L118-L128

There's a second part to this though, see below.


builds = Build.objects.filter(query)[:50]
for build in builds:
build.success = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's check for the timeout date here, ie - if build.date + build.project.container_time_limit > now: continue.

So the query will surface extra builds, but the custom timeout builds will be an exception to this case. They keep running through this loop until the condition above is true.

Does this seem reasonable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think this is the easiest and more reasonable thing to do.

"I was overengeering it" as Eric said :)

'This build was terminated due to inactivity. If you '
'continue to encounter this error, file a support '
'request with and reference this build id ({0}).'.format(build.pk)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be translated, though I'm not sure if we're equipped to actually translate this yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Equipped? In what sense?

I don't understand how/where the translation are working yet.

@humitos humitos added PR: ready for review and removed PR: work in progress Pull request is not ready for full review labels Dec 7, 2017
@humitos
Copy link
Member Author

humitos commented Dec 7, 2017

This PR should be ready.

Do I need to do something in particular to enable this task on celery beat?

@humitos humitos force-pushed the humitos/inactive-build-cleanup branch from c15d774 to a014493 Compare December 7, 2017 17:36
Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

I think we'll want at least 3 test cases out of this:

  • Legitimate build isn't marked finished early
  • Build is custom time limit isn't marked finished early
  • Build without custom time limit is marked finished


if build.project.container_time_limit:
custom_delta = datetime.timedelta(
minutes=int(build.project.container_time_limit))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

container_time_limit is in seconds, so delta will need to be changed here

@humitos
Copy link
Member Author

humitos commented Dec 12, 2017

I added a test case for this escenario. I thought it was better to have 3 builds at the same time instead of one build per test. Let me know if you agree with me or want me to split them.

I think it's easier to understand and also it's more close to the reality.

@agjohnson
Copy link
Contributor

Giddyup. The tests look great, I don't think it's important to split them up at all

@agjohnson agjohnson merged commit a700bbd into master Dec 14, 2017
@agjohnson agjohnson deleted the humitos/inactive-build-cleanup branch December 14, 2017 20:59
@humitos
Copy link
Member Author

humitos commented Dec 14, 2017 via email

@humitos
Copy link
Member Author

humitos commented Dec 23, 2017

@agjohnson @ericholscher what do we need to do to enable this task to be ran by celery beat?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants