
Move to storing api usage records in elasticsearch #1337

Closed
mlandauer opened this issue Jan 24, 2019 · 8 comments

@mlandauer
Member

The table is currently rather big, and the older data doesn't provide much value, but we don't necessarily want to throw it away. So, let's archive old data to S3 like we do with Cuttlefish.

@mlandauer
Member Author

Or maybe an alternative is to store it all in a more appropriate database that can handle this type of data better. One possibility is using Logstash / Elasticsearch / Kibana. The disadvantage is extra infrastructure, extra complexity and more things to learn. The advantage might be getting visualisation of the data for free.

@mlandauer
Member Author

It seems there's an "out of the box" solution for application monitoring with Elasticsearch, with visualisation powered by Kibana: https://www.elastic.co/guide/en/apm/get-started/current/overview.html

@mlandauer
Member Author

mlandauer commented Feb 16, 2019

I've put together a prototype local setup of Elasticsearch / Logstash / Kibana and did an experiment using Logstash to migrate a few thousand records from the api_statistics table in MySQL to Elasticsearch. Then, using Kibana, it was surprisingly straightforward to visualise that data and get some really helpful insights. So I feel quite confident that keeping the API access data in Elasticsearch is a good way to go.

[Image: Kibana bar chart of API requests over time]

This is an example of the top 5 users of the API in a small time interval. Each colour is a different API user and the vertical axis shows the number of API requests. I've intentionally left out the legend showing the names of the API users.
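For context, the experiment used Logstash's jdbc input plugin. The actual pipeline isn't included in this thread, but it would have looked roughly like this (the driver path, credentials, database name and index name are all placeholders):

```
input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/morph"
    jdbc_user => "logstash"
    jdbc_password => "..."
    statement => "SELECT * FROM api_statistics"
    # Paging wraps the statement in LIMIT/OFFSET queries
    jdbc_paging_enabled => true
    jdbc_page_size => 10000
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "api-statistics"
  }
}
```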

However, I discovered that the jdbc input plugin for Logstash (which is what you use to get at MySQL data) has some serious limitations when it comes to large tables. Our table has something like 90 million records in it, and the plugin just doesn't work well with tables of this size. Pagination is currently implemented pretty poorly: it pages through the result set with LIMIT/OFFSET queries, and MySQL executes each one by scanning and discarding all the skipped rows, so every successive page gets slower. See for example logstash-plugins/logstash-input-jdbc#321, logstash-plugins/logstash-input-jdbc#307 and logstash-plugins/logstash-input-jdbc#305.

This kind of blows me away, as you would think that getting a large table of data from MySQL into Elasticsearch would be up there as one of the big use cases for first-time users. Go figure!

So, I'm thinking now it might be more straightforward to just put new data from the app directly into Elasticsearch (rather than writing to the MySQL database) and think about migrating the older data later. Probably just write a rake task to do the job rather than farting around with Logstash more.
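In this codebase the rake task would be Ruby, but purely as a sketch of the approach, here's the same idea in Python: keyset pagination (seeking past the last primary key) instead of the OFFSET-based paging that bit us above, combined with bulk indexing. The table name comes from this thread; everything else (connection details, column names, index name) is a guess.

```python
# Sketch only: backfill the api_statistics table from MySQL into Elasticsearch,
# using keyset pagination rather than LIMIT/OFFSET so later pages stay fast.
# Connection details, column names and the index name are all guesses.
import pymysql
from elasticsearch import Elasticsearch, helpers

db = pymysql.connect(host="localhost", user="morph", password="...",
                     database="morph", cursorclass=pymysql.cursors.DictCursor)
es = Elasticsearch(["http://localhost:9200"])

BATCH = 10_000
last_id = 0

while True:
    with db.cursor() as cursor:
        # Seek past the last seen primary key instead of asking MySQL to
        # scan and discard OFFSET rows on every page.
        cursor.execute(
            "SELECT * FROM api_statistics WHERE id > %s ORDER BY id LIMIT %s",
            (last_id, BATCH),
        )
        rows = cursor.fetchall()
    if not rows:
        break
    helpers.bulk(
        es,
        ({"_index": "api-statistics", "_id": row["id"], "_source": row}
         for row in rows),
    )
    last_id = rows[-1]["id"]
```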

mlandauer changed the title from "auto archive old api statistics records" to "Move to storing api usage records in elasticsearch" on Feb 16, 2019
@mlandauer
Member Author

mlandauer commented Feb 20, 2019

This is now running in production and recording new data. However, it's currently using an Elasticsearch cluster from cloud.elastic.co on a 14-day trial, so it's not yet to be relied on. Still to do (a sketch of the per-request write follows the list):

  • Move over the historical data from the MySQL table
  • Figure out minimum Elasticsearch cluster specs
  • Get a permanent production Elasticsearch setup (see https://github.com/openaustralia/oaf-internal/issues/16)
  • Ensure we have good and proper backups of Elasticsearch
  • Remove the MySQL API stats table once migrated
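The real write happens in the (Ruby) app; the sketch below just shows the shape of recording one API request directly into Elasticsearch, with guessed index and field names:

```python
# Sketch: record one API request directly in Elasticsearch instead of the
# MySQL api_statistics table. Index and field names are guesses.
from datetime import datetime, timezone

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

def record_api_request(user_id, path, status):
    es.index(
        index="api-statistics",
        body={
            "user_id": user_id,
            "path": path,
            "status": status,
            "timestamp": datetime.now(timezone.utc),
        },
    )

record_api_request(user_id=42, path="/api/v1/data", status=200)
```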

@mlandauer
Member Author

The only things left to do on this ticket are to test thoroughly that the external backups to S3 are working as expected and that snapshots are being made automatically each day. Also, we should wait until the production setup is completely finalised.
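One way to check the daily snapshots, sketched with the Python Elasticsearch client; the repository name here is a placeholder:

```python
# Sketch: list the snapshots in the S3-backed repository and check that the
# automatic daily ones are there and succeeded. "s3_backup" is a guess.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

result = es.snapshot.get(repository="s3_backup", snapshot="_all")
for snap in result["snapshots"]:
    print(snap["snapshot"], snap["start_time"], snap["state"])  # expect "SUCCESS"
```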

@mlandauer
Member Author

I just did a test restore of a snapshot from S3. I picked a single index, restored it under a different name and had a look at the data in Kibana, and it all seems to check out fine. So, looking good!
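Restoring an index under a different name uses the restore API's rename options. A sketch of what that test looks like; the repository, snapshot and index names are placeholders:

```python
# Sketch: restore a single index from a snapshot under a new name so it
# doesn't clash with the live index. All names here are guesses.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

es.snapshot.restore(
    repository="s3_backup",
    snapshot="daily-snapshot-2019-03-01",  # hypothetical snapshot name
    body={
        "indices": "api-statistics",
        "rename_pattern": "api-statistics",
        "rename_replacement": "restored-api-statistics",
    },
    wait_for_completion=True,
)
```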

I also checked that there are a number of new snapshots since I last looked. These are the daily ones that are made automatically. So, that's all working too.

@mlandauer
Member Author

Still waiting on finalising https://github.com/openaustralia/oaf-internal/issues/16

@mlandauer
Member Author

This is all done now.
