Move to storing api usage records in elasticsearch #1337
Or maybe an alternative is to store it all in a more appropriate database that can handle that type of data better. One possibility is using Logstash / Elasticsearch / Kibana. The disadvantage is extra infrastructure, extra complexity and more things to learn. The advantage might be getting visualisation of the data for free.
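To make the idea concrete, here's a rough sketch of what recording a single API request directly in Elasticsearch could look like from the Rails app with the elasticsearch-ruby gem. The index name, field names and client setup below are illustrative assumptions, not the project's actual schema.

```ruby
require "time"
require "elasticsearch"

# Hypothetical client setup - in practice the cluster URL would come from configuration.
client = Elasticsearch::Client.new(url: ENV["ELASTICSEARCH_URL"])

# Index one document per API request. All field names here are made up for illustration.
client.index(
  index: "api-usage-#{Time.now.utc.strftime('%Y.%m')}", # monthly indices keep each index a manageable size
  body: {
    api_key: "hypothetical-key",
    query: "address=1 Example Street",
    ip_address: "203.0.113.10",
    queried_at: Time.now.utc.iso8601
  }
)
```

With data in that shape, Kibana can aggregate on any of the fields (requests per API key per day, for example) without extra work on our side.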
It seems there's an "out of the box" solution for application monitoring in Elasticsearch, with visualisation powered by Kibana - https://www.elastic.co/guide/en/apm/get-started/current/overview.html
I've put together a prototype local setup of Elasticsearch / Logstash / Kibana and did an experiment using Logstash to migrate a few thousand records from the api_statistics table in MySQL to Elasticsearch. Then, using Kibana, it was surprisingly straightforward to visualise that data and get some really helpful insights. So, I feel quite confident that keeping the API access data in Elasticsearch is a good way to go.

This is an example of the top 5 users of the API in a small time interval. Each colour is a different API user and the vertical axis shows the number of API requests. I've intentionally left out the legend showing the names of the API users.

However, I discovered that the jdbc input plugin for Logstash (which you use for getting access to MySQL data) has some serious limitations when it comes to large tables. Our table has something like 90 million records in it and the jdbc input plugin just doesn't work well with tables of this size - pagination is currently implemented pretty poorly. See for example logstash-plugins/logstash-input-jdbc#321, logstash-plugins/logstash-input-jdbc#307 and logstash-plugins/logstash-input-jdbc#305. This kind of blows me away, as you would think that getting a large table of data from MySQL into Elasticsearch would be up there as one of the big use cases for first-time users. Go figure!

So, I'm now thinking it might be more straightforward to put new data from the app directly into Elasticsearch (rather than writing to the MySQL database) and think about migrating the older data later - probably with a rake task (see the sketch below) rather than farting around with Logstash any more.
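As a sketch of that fallback plan (not the actual task), a rake task could page through the existing table with ActiveRecord's find_in_batches and bulk-index into Elasticsearch, sidestepping the jdbc plugin's pagination entirely. The model name ApiStatistic, the index name and the field mapping are assumptions.

```ruby
# lib/tasks/api_statistics.rake - a rough sketch only; model, index and field names are assumptions.
namespace :api_statistics do
  desc "Copy historical api_statistics rows from MySQL into Elasticsearch"
  task migrate_to_elasticsearch: :environment do
    client = Elasticsearch::Client.new(url: ENV["ELASTICSEARCH_URL"])

    # find_in_batches streams the ~90 million rows in fixed-size chunks,
    # so we never ask MySQL to paginate the whole table in one go.
    ApiStatistic.find_in_batches(batch_size: 5_000) do |batch|
      body = batch.flat_map do |record|
        [
          { index: { _index: "api-usage", _id: record.id } },
          { query: record.query, ip_address: record.ip_address, queried_at: record.created_at }
        ]
      end
      client.bulk(body: body)
    end
  end
end
```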
This is now running in production and recording new data. However, it's currently using an Elasticsearch cluster from cloud.elastic.co on a 14-day trial, so it's not yet something to be relied on. Still to do:
The only things left to do on this ticket are to test thoroughly that the external backups to S3 are working as expected and that snapshots are being made automatically each day. Also, we should wait until the production setup is completely finalised.
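For reference, checking this sort of setup boils down to an S3 snapshot repository plus a look at the snapshot list. A minimal sketch with the Ruby client, where the repository name and bucket are made-up examples:

```ruby
require "elasticsearch"

client = Elasticsearch::Client.new(url: ENV["ELASTICSEARCH_URL"])

# Register an S3 repository for snapshots. The repository-s3 plugin has to be
# available on the cluster; repository and bucket names here are made up.
client.snapshot.create_repository(
  repository: "s3_backups",
  body: { type: "s3", settings: { bucket: "example-api-usage-backups" } }
)

# List everything in the repository to confirm the automatic daily snapshots are appearing.
client.snapshot.get(repository: "s3_backups", snapshot: "_all")["snapshots"].each do |snap|
  puts "#{snap['snapshot']}  #{snap['state']}  #{snap['start_time']}"
end
```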
I just did a test restore of a snapshot from S3. I picked a single instance, restored it under a different name and had a look at the data in Kibana, and it all seems to check out fine. So, looking good! I also checked that there are a number of new snapshots since I last looked - these are the daily ones that are made automatically - so that's all working too.
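The restore-under-a-different-name step corresponds roughly to a restore request with a rename pattern, so the restored copy can't collide with the live indices. A hedged sketch, with repository, snapshot and index names assumed:

```ruby
require "elasticsearch"

client = Elasticsearch::Client.new(url: ENV["ELASTICSEARCH_URL"])

# Restore one snapshot into indices prefixed with "restored-" so the live
# indices stay untouched while the data is inspected in Kibana.
client.snapshot.restore(
  repository: "s3_backups",
  snapshot: "daily-2019.03.01",  # hypothetical snapshot name
  body: {
    indices: "api-usage",        # hypothetical index name
    rename_pattern: "(.+)",
    rename_replacement: "restored-$1"
  }
)
```

Once checked, the restored-* indices can simply be deleted.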
Still waiting on finalising https://github.com/openaustralia/oaf-internal/issues/16
This is all done now.
The table is currently rather big and doesn't provide much value for older data, but we don't necessarily want to throw it away. So, let's archive old data to S3 like we do with Cuttlefish.
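For comparison, the Cuttlefish-style archiving mentioned here amounts to dumping old rows to a compressed file and uploading it to S3. A minimal sketch with the aws-sdk-s3 gem, run inside the Rails app so ActiveRecord and ActiveSupport are loaded; the model name, one-year cutoff and bucket name are all assumptions:

```ruby
require "aws-sdk-s3"
require "zlib"
require "json"

# Archive rows older than a year as gzipped JSON lines. ApiStatistic and the
# one-year cutoff are illustrative assumptions.
cutoff = 1.year.ago
archive_path = "api_statistics-before-#{cutoff.to_date}.jsonl.gz"

Zlib::GzipWriter.open(archive_path) do |gz|
  ApiStatistic.where("created_at < ?", cutoff).find_each do |record|
    gz.puts(record.attributes.to_json)
  end
end

# Bucket name is made up; credentials and region come from the usual AWS environment/config.
Aws::S3::Resource.new.bucket("example-api-usage-archive")
                 .object(File.basename(archive_path))
                 .upload_file(archive_path)
```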