
🐛 raising agent limit to 1G #3972

Merged
merged 2 commits into ITISFoundation:master from pr-osparc-bump-agent-limit on Mar 16, 2023

Conversation

@GitHK GitHK (Contributor) commented Mar 15, 2023

What do these changes do?

The agent requires slightly more RAM to work properly while rclone is syncing. Setting the limit to a safe 1GB to avoid future issues.
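
For context, a memory limit like this typically lives in the agent's compose/stack definition. The snippet below is only a sketch of what the change looks like under that assumption; the service name, file layout, and previous value are illustrative, not taken from the actual diff.

```yaml
# Sketch only (assumed layout, not the PR's actual diff):
# the agent service keeps a hard memory cap, now raised to 1G.
services:
  agent:
    deploy:
      resources:
        limits:
          memory: 1G   # previously a lower value that rclone's spikes could exceed
```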

Related issue/s

How to test

Checklist

@GitHK GitHK self-assigned this Mar 15, 2023
@GitHK GitHK added this to the Mithril milestone Mar 15, 2023
@GitHK GitHK marked this pull request as ready for review March 15, 2023 07:07
@codecov codecov bot commented Mar 15, 2023

Codecov Report

Merging #3972 (6779da1) into master (903b05d) will increase coverage by 4.2%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #3972      +/-   ##
=========================================
+ Coverage    80.7%   85.0%    +4.2%     
=========================================
  Files         464     943     +479     
  Lines       22155   40673   +18518     
  Branches      137     848     +711     
=========================================
+ Hits        17895   34596   +16701     
- Misses       4211    5854    +1643     
- Partials       49     223     +174     
Flag              Coverage Δ
integrationtests  66.7% <ø> (+4.8%) ⬆️
unittests         81.9% <ø> (+4.4%) ⬆️

Flags with carried forward coverage won't be shown.

see 569 files with indirect coverage changes

@GitHK GitHK requested review from pcrespov and mguidon March 16, 2023 05:58
@codeclimate codeclimate bot commented Mar 16, 2023

Code Climate has analyzed commit 6779da1 and detected 0 issues on this pull request.


@sonarcloud sonarcloud bot commented Mar 16, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
0.1% Duplication

@GitHK GitHK requested a review from sanderegg March 16, 2023 07:14
@pcrespov pcrespov (Member) left a comment


Q: What did you notice in the agent that led to the decision to increase its resources? How did you decide on 1G? (e.g. official rsync specs, you tested it ...) This service is just cleaning up volumes of dynamic services, right?

@GitHK GitHK (Contributor, Author) commented Mar 16, 2023

Q: What did you notice in the agent to take the decision to increase its resources? How did you decide you needed 1G? This service is just cleaning up volumes right?

@sanderegg @pcrespov
I have set rclone to run with very low memory consumption. When backing up very big files, its chunked transfers still require some memory. So far I have seen it spike to 300MB of RAM, and the previous limit was not enough.
I am setting the limit to 1GB, which gives rclone headroom for those very brief spikes, so that we avoid having to bump it by 100MB each time.

This service has no reservations on purpose, so it does not take resources away from other services which need to run on the node, but it has a maximum allowed memory usage, which is now set to 1GB. Normally this service uses ~50MB of RAM.
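
As an illustration of what "very low memory consumption" can mean for rclone (the concrete options the agent uses are not shown in this PR, so the ones below are assumptions): rclone maps every CLI flag to an RCLONE_* environment variable, and these are the usual levers for its memory footprint.

```yaml
# Illustrative only: common rclone settings for a small memory footprint,
# passed as environment variables (rclone maps --flag-name to RCLONE_FLAG_NAME).
services:
  agent:
    environment:
      RCLONE_TRANSFERS: "2"      # fewer parallel transfers -> fewer in-flight buffers
      RCLONE_CHECKERS: "2"       # fewer parallel checkers
      RCLONE_BUFFER_SIZE: "16M"  # small per-file read-ahead buffer
      RCLONE_USE_MMAP: "true"    # mmap'd buffers are released back to the OS sooner
```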

@sanderegg sanderegg (Member) commented

I understand that this service is supposed to run after the user service (e.g. s4l) has run. But what if a new service is started there that is supposed to get all the remaining RAM, and the agent runs at the same time? This whole resources problem must be reviewed, because we cannot just use resources and hope for the best...

@pcrespov pcrespov (Member) left a comment


Thanks for the explanation. Now the decision is also documented.

@GitHK GitHK enabled auto-merge (squash) March 16, 2023 09:08
@GitHK GitHK merged commit 0687134 into ITISFoundation:master Mar 16, 2023
@GitHK GitHK deleted the pr-osparc-bump-agent-limit branch March 16, 2023 09:08
@GitHK GitHK (Contributor, Author) commented Mar 16, 2023


Ideally we would leave 1GB of margin on all machines so that services like the agent or the docker logger can use it. Otherwise we can enforce that all the services we deploy from OPS and simcore have limited resources and do not go over a shared amount of RAM. This way, when the stack starts, the resources are set in stone and there will be no surprises when services are scheduled.
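
A sketch of that second option, assuming the stack is defined in compose/swarm files (service names and numbers are purely illustrative): giving every platform service both a reservation and a limit lets the scheduler book the RAM up front, so nothing relies on hoping for the best.

```yaml
# Illustrative only: pin both reservation and limit for every platform service
# so the scheduler accounts for their RAM at deploy time and user services get the rest.
services:
  agent:
    deploy:
      resources:
        reservations:
          memory: 128M   # booked on the node at scheduling time
        limits:
          memory: 1G     # hard cap covering rclone's brief spikes
  docker-logger:
    deploy:
      resources:
        reservations:
          memory: 64M
        limits:
          memory: 256M
```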

Labels
a:agent agent service