Increase memory limits for build projects (autoscale workers) #4403
Labels: Improvement (Minor improvement to code), Needed: design decision (A core team decision is required), Needed: documentation (Documentation is required)
We have several projects lately that need more memory than the default value (1g). However, since memory is a resource that can cause different problems than CPU time, we have been very careful about increasing this limit for some projects.

At this time we only have 6 projects with a limit different from the default (1500m), and only 2 of them with the maximum limit we have used (2g).
When a project needs more memory resources, I usually suggest that the owner:

- reduce the `formats` being built

These two points can be found in our docs: https://docs.readthedocs.io/en/latest/guides/build-using-too-many-resources.html
This issue, in particular, is to collect projects that are running out of memory when building, increase their limits, and track the results. It is also meant to discuss a long-term solution where increasing memory limits doesn't affect the builder servers.
Projects that are currently hitting the memory limit, whose limits I will start by increasing:

- 2g
- 2g
We also need to discuss what steps the core team should follow to increase these limits in a safe way (without creating other issues in the builders), and to propose a solution around it.
Ideas for a solution
Use Celery autoscale
We have talked about using Celery's autoscaling option (http://docs.celeryproject.org/en/latest/userguide/workers.html#autoscaling), but instead of letting Celery decide when and how to increase/decrease the number of workers, we may want to define our own `Autoscaler` and set it via the `worker_autoscaler` setting (http://docs.celeryproject.org/en/latest/userguide/configuration.html#std:setting-worker_autoscaler).

Example of autoscaling based on CPU and memory: https://gist.github.com/speedplane/224eb551c51a74068011f4d776237513
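As a starting point, here is a minimal sketch of what such a memory-aware autoscaler could look like, in the spirit of the gist above. It assumes Celery 4.x internals (`celery.worker.autoscale.Autoscaler`) and `psutil` on the builders; the class name and the 2g headroom value are illustrative, not existing Read the Docs code.

```python
# Minimal sketch of a memory-aware autoscaler (illustrative, not RTD code).
# Assumes Celery 4.x internals and psutil being available on the builders.
import psutil
from celery.worker.autoscale import Autoscaler


class MemoryAwareAutoscaler(Autoscaler):
    """Refuse to scale up when the host doesn't have enough free memory."""

    # Assumed headroom: the memory we expect a single build container to need (2g).
    reserved_per_worker = 2 * 1024 ** 3

    def _maybe_scale(self, req=None):
        if self.qty > self.processes:  # Celery wants to add worker processes
            available = psutil.virtual_memory().available
            if available < self.reserved_per_worker:
                # Not enough memory for another heavy build: don't scale up.
                return False
        return super()._maybe_scale(req=req)
```

It would then be enabled with something like `worker_autoscaler = 'readthedocs.worker.autoscale:MemoryAwareAutoscaler'` in the Celery settings (the module path is hypothetical).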
Scale workers manually
Another idea we had in mind was to scale the workers depending on values that we already know: `container_time_limit` and `container_mem_limit`. So, before the `trigger_build` function is called, we can decrease the workers if the task is going to consume too much memory.

Increasing the workers at that point is not possible because we don't have information about the kind of tasks the current builder is running. If we save the `task_id` into the `Build` object, we could ask for all the tasks the builder is running, map them to their build objects, and know which projects are being built and how many resources they need.

Another possibility, instead of saving the `task_id` in the `Build` object, could be to create a Celery `chain` that first decreases the workers to 1, then executes the build, and finally increases the workers back to the default value (a rough sketch follows).
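The chain idea could look roughly like this; the task names, pool sizes, and the `trigger_heavy_build` helper are assumptions for illustration, not existing Read the Docs code:

```python
# Illustrative sketch of the "shrink -> build -> grow" chain; task names,
# pool sizes and the Celery app are assumptions, not existing RTD code.
from celery import Celery, chain

app = Celery('builds_example')  # stand-in for the real Celery app


@app.task
def shrink_build_pool(n=2):
    # Ask the builder(s) to drop `n` pool processes before a heavy build starts.
    app.control.pool_shrink(n=n)


@app.task
def build_project(version_pk):
    # Placeholder for the real build task that trigger_build would dispatch.
    ...


@app.task
def grow_build_pool(n=2):
    # Restore the pool to its default size once the heavy build has finished.
    app.control.pool_grow(n=n)


def trigger_heavy_build(version_pk):
    # Each step runs only after the previous one completed successfully.
    workflow = chain(
        shrink_build_pool.si(),
        build_project.si(version_pk),
        grow_build_pool.si(),
    )
    workflow.apply_async()
```

One caveat with this approach: if the build task fails, the final step never runs, so the grow task would also need to be attached as an error callback (e.g. via `link_error`) to avoid leaving the pool permanently shrunk.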
Use a specific queue for heavy mem usage projects

To avoid all this logic, we could have a builder with only one worker. Before `trigger_build` is called, the web server checks for custom resource limits and forces the task to be added to this particular queue.
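A minimal sketch of that routing check, assuming a dedicated `build-large` queue; the queue names and the `parse_mem_limit` / `pick_build_queue` helpers are hypothetical, while `container_mem_limit` is the per-project override discussed above:

```python
# Sketch of routing memory-heavy builds to a dedicated single-worker queue.
# Queue names and helpers are assumptions; container_mem_limit is the
# per-project override discussed in this issue.
DEFAULT_MEM_LIMIT = '1g'


def parse_mem_limit(value):
    # Convert values like '1g' or '1500m' into megabytes so they can be compared.
    value = (value or DEFAULT_MEM_LIMIT).strip().lower()
    if value.endswith('g'):
        return int(float(value[:-1]) * 1024)
    if value.endswith('m'):
        return int(value[:-1])
    return int(value)


def pick_build_queue(project):
    # Projects with a raised memory limit go to the single-worker builder.
    if parse_mem_limit(project.container_mem_limit) > parse_mem_limit(DEFAULT_MEM_LIMIT):
        return 'build-large'
    return 'build-default'


# Usage, roughly where trigger_build dispatches the build task (hypothetical):
# update_docs.apply_async(args=[version.pk], queue=pick_build_queue(version.project))
```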