
metrics: data storage metrics in AWS is missing S3 storage #2840

Closed · sanderegg opened this issue Feb 17, 2022 · 7 comments

@sanderegg
Member

In AWS, we collect metrics on the usage of the host disks.

BUT:

  • user data is in S3, which is not located on the hosts
  • image data is in S3, same as above

Therefore, the metrics for NIH should be adapted to include S3 storage as well.

@elisabettai
Collaborator

elisabettai commented Oct 25, 2022

Would it be possible to update the metrics to include the user and image data in S3? @sanderegg, does this require changes in simcore code?

I've quickly checked with the AWS CLI: it is rather easy to get a bucket's size at a given point in time, but I don't think we can get a time series. And to answer the required metric "Overall data consumption (in GB/TB) by oSPARC over the quarter reported on", I understand that we need a single value (sum) for the last 3 months.
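
(For reference, a minimal sketch of that point-in-time check with the AWS CLI; the bucket name is a placeholder:)

```bash
# One-off, point-in-time bucket size; the last two lines of output
# are "Total Objects" and "Total Size".
aws s3 ls s3://<bucket-name> --recursive --summarize --human-readable | tail -n 2
```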

Or we report the current value, let's say at the end of November, and then next quarter we take the difference... 😄

@elisabettai
Collaborator

I am testing on my local machine how to get the S3 storage usage (AWS prod), using s3cmd du -H as explained here.

If that works, I was thinking of having a bash script that runs every day and saves the date and the bucket size to a file, so that we can build a time series.
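
A minimal sketch of what that daily script could look like, assuming s3cmd is already configured with credentials (bucket name and output path are placeholders):

```bash
#!/usr/bin/env bash
# Appends today's date and the bucket size to a CSV file, one row per day.
# Without -H, s3cmd du prints the total size in bytes as the first field.
BUCKET="s3://<bucket-name>"       # placeholder
OUTFILE="${HOME}/s3_usage.csv"    # placeholder

SIZE_BYTES=$(s3cmd du "${BUCKET}" | awk '{print $1}')
echo "$(date +%F),${SIZE_BYTES}" >> "${OUTFILE}"
```

Scheduled with a daily cron entry (e.g. 0 2 * * * /path/to/s3_usage.sh), this would accumulate the time series over time.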

@Surfict, @mrnicegyu11, any thoughts on this approach? Should this bash script live somewhere other than my local machine?
For user data I am using the bucket prodXXX-simXXX, and for the image data I guess I should use registry.XXX.XX.

@mrnicegyu11
Member

mrnicegyu11 commented Oct 26, 2022

Thanks a lot for taking the initiative here, superb! :-)

Here are my personal 2 cents on the topic:
I guess one could also use third-party tools: Prometheus is usually set up to talk to so-called "exporters" to fetch data from non-standard sources. This would be the "canonical" approach, to the best of my knowledge.
A Google search for "s3 prometheus exporter" revealed, for example, https://github.com/ribbybibby/s3_exporter
What you are suggesting is of course also possible, and I guess we could call it "coding our own exporter". Depending on how well maintained the third-party S3 Prometheus exporters are, how straightforward they are to integrate, etc., that could be a viable approach as well.
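
For illustration, a sketch of trying that exporter out; the image name, port, and probe endpoint are taken from its README as I recall it, so treat them as assumptions:

```bash
# Run the exporter; it picks up standard AWS credentials from the environment.
docker run -d -p 9340:9340 \
  -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_REGION \
  ribbybibby/s3-exporter

# Ask it to probe a bucket; Prometheus would scrape this same /probe endpoint.
curl 'http://localhost:9340/probe?bucket=<bucket-name>'
```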

Would you want me (or us together, potentially) to have a look at pre-existing S3 exporters we might integrate into the stack, or to look at what is necessary to run your script inside our stack? :) Let us know ;)

Maybe @Surfict also has some creative ideas :)

@sanderegg
Member Author

sanderegg commented Oct 26, 2022

@elisabettai, @mrnicegyu11, @Surfict: what about this service here: https://grafana.com/docs/grafana/v9.0/datasources/aws-cloudwatch/

It also seems to be integrated into Grafana, but I'm not sure if we can use it.

Here is the related dashboard: https://grafana.com/grafana/dashboards/575-aws-s3/
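
(For context: the data behind that dashboard is the daily BucketSizeBytes metric that S3 publishes to CloudWatch for free, which does give a time series. A sketch of pulling it directly with the AWS CLI; bucket name and dates are placeholders:)

```bash
# Daily S3 bucket size over roughly one quarter, from the free storage
# metrics that S3 reports to CloudWatch once per day.
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 \
  --metric-name BucketSizeBytes \
  --dimensions Name=BucketName,Value=<bucket-name> \
               Name=StorageType,Value=StandardStorage \
  --start-time 2022-08-01T00:00:00Z \
  --end-time 2022-10-26T00:00:00Z \
  --period 86400 \
  --statistics Average
```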

@elisabettai
Collaborator

Thanks @mrnicegyu11 and @sanderegg for the input!

I'm happy to abandon the AWS CLI script approach and use either the S3 Prometheus exporter or AWS CloudWatch instead. I would go for whichever is easiest to set up and maintain. I assume (without proper research to back this up) that AWS CloudWatch is the easier one. And good news: the free metrics should be sufficient for us.

@mrnicegyu11
Member

CloudWatch is likely easier if there is a straightforward way to get its data into either Prometheus or Grafana.

The caveat is that this will not work for non-AWS deployments (dalco, master, tip), but I guess that is not even necessary at this point.

elisabettai self-assigned this Oct 27, 2022
@elisabettai
Collaborator

This seemed (too) easy. 😄
I made a "NIH_Metrics" dashboard with CloudWatch on AWS prod

[screenshot: CloudWatch "NIH_Metrics" dashboard plotting the S3 bucket sizes]

I'd say this answers the metrics question. We're just looking at two buckets (see the legend) to cover user and image data.

Let me know @sanderegg, @mrnicegyu11 if you see something weird or if we should combine this info with what we already have in grafana (the usage of host disks).

Also, these CloudWatch metrics should be free, as I understand from here; worth double-checking in our billing info whether that's true (maybe @Surfict).

Maybe, for fun, I can have a look at adding this into our Grafana, but for the moment I'd say this does the job.
