
Move to storing api usage records in elasticsearch #1337

Closed
mlandauer opened this issue Jan 24, 2019 · 8 comments

@mlandauer
Member

The table is currently rather big, and the older data doesn't provide much value, but we don't necessarily want to throw it away. So, let's archive old data to S3 like we do with Cuttlefish.

@mlandauer
Member Author

Or maybe an alternative is to store it all in a more appropriate database that can handle this type of data better. One possibility is using Logstash / Elasticsearch / Kibana. The disadvantage is extra infrastructure, extra complexity and more things to learn. The advantage might be getting visualisation of the data for free.

@mlandauer
Member Author

It seems there's an "out of the box" solution for application monitoring with Elasticsearch, with visualisation powered by Kibana: https://www.elastic.co/guide/en/apm/get-started/current/overview.html

@mlandauer
Member Author

mlandauer commented Feb 16, 2019

I've put together a prototype local setup of Elasticsearch / Logstash / Kibana and did an experiment using Logstash to migrate a few thousand records from the api_statistics table in MySQL to Elasticsearch. Then, using Kibana, it was surprisingly straightforward to visualise that data and get some really helpful insights. So I feel quite confident that keeping the API access data in Elasticsearch is a good way to go.

[Image: Kibana bar chart of API requests over time]

This is an example of the top 5 users of the API in a small time interval. Each colour is a different API user and the vertical axis shows the number of API requests. I've intentionally left out the legend showing the names of the API users.
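For context, the experiment used Logstash's jdbc input plugin. The actual pipeline isn't included in this thread, but it would have looked roughly like this (the driver path, credentials, database name and index name are all placeholders):

```
input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/morph"
    jdbc_user => "logstash"
    jdbc_password => "..."
    statement => "SELECT * FROM api_statistics"
    # Paging wraps the statement in LIMIT/OFFSET queries
    jdbc_paging_enabled => true
    jdbc_page_size => 10000
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "api-statistics"
  }
}
```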

However, I discovered that the jdbc input plugin for Logstash (which is what you use to get at MySQL data) has some serious limitations when it comes to large tables. Our table has something like 90 million records in it, and the plugin just doesn't work well with tables of this size. Pagination is currently implemented pretty poorly: it pages through the result set with LIMIT/OFFSET queries, and MySQL executes each one by scanning and discarding all the skipped rows, so every successive page gets slower. See for example logstash-plugins/logstash-input-jdbc#321, logstash-plugins/logstash-input-jdbc#307 and logstash-plugins/logstash-input-jdbc#305.

This kind of blows me away, as you would think that getting a large table of data from MySQL into Elasticsearch would be up there as one of the big use cases for first-time users. Go figure!

So, I'm thinking now it might be more straightforward to just put new data from the app directly into Elasticsearch (rather than writing to the MySQL database) and think about migrating the older data later. Probably just write a rake task to do the job rather than farting around with Logstash more.
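In this codebase the rake task would be Ruby, but purely as a sketch of the approach, here's the same idea in Python: keyset pagination (seeking past the last primary key) instead of the OFFSET-based paging that bit us above, combined with bulk indexing. The table name comes from this thread; everything else (connection details, column names, index name) is a guess.

```python
# Sketch only: backfill the api_statistics table from MySQL into Elasticsearch,
# using keyset pagination rather than LIMIT/OFFSET so later pages stay fast.
# Connection details, column names and the index name are all guesses.
import pymysql
from elasticsearch import Elasticsearch, helpers

db = pymysql.connect(host="localhost", user="morph", password="...",
                     database="morph", cursorclass=pymysql.cursors.DictCursor)
es = Elasticsearch(["http://localhost:9200"])

BATCH = 10_000
last_id = 0

while True:
    with db.cursor() as cursor:
        # Seek past the last seen primary key instead of asking MySQL to
        # scan and discard OFFSET rows on every page.
        cursor.execute(
            "SELECT * FROM api_statistics WHERE id > %s ORDER BY id LIMIT %s",
            (last_id, BATCH),
        )
        rows = cursor.fetchall()
    if not rows:
        break
    helpers.bulk(
        es,
        ({"_index": "api-statistics", "_id": row["id"], "_source": row}
         for row in rows),
    )
    last_id = rows[-1]["id"]
```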

mlandauer changed the title from "auto archive old api statistics records" to "Move to storing api usage records in elasticsearch" on Feb 16, 2019
@mlandauer
Member Author

mlandauer commented Feb 20, 2019

This is now running in production and recording new data. However, it's currently using an Elasticsearch cluster from cloud.elastic.co on a 14-day trial, so it's not yet to be relied on. Still to do (a sketch of the per-request write follows the list):

  • Move over the historical data from the MySQL table
  • Figure out minimum Elasticsearch cluster specs
  • Get a permanent production Elasticsearch setup (see https://github.com/openaustralia/oaf-internal/issues/16)
  • Ensure we have good and proper backups of Elasticsearch
  • Remove the MySQL API stats table once migrated
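The real write happens in the (Ruby) app; the sketch below just shows the shape of recording one API request directly into Elasticsearch, with guessed index and field names:

```python
# Sketch: record one API request directly in Elasticsearch instead of the
# MySQL api_statistics table. Index and field names are guesses.
from datetime import datetime, timezone

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

def record_api_request(user_id, path, status):
    es.index(
        index="api-statistics",
        body={
            "user_id": user_id,
            "path": path,
            "status": status,
            "timestamp": datetime.now(timezone.utc),
        },
    )

record_api_request(user_id=42, path="/api/v1/data", status=200)
```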

@mlandauer
Member Author

The only things left to do on this ticket are to test thoroughly that the external backups to S3 are working as expected and that snapshots are being made automatically each day. Also, we should wait until the production setup is completely finalised.
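One way to check the daily snapshots, sketched with the Python Elasticsearch client; the repository name here is a placeholder:

```python
# Sketch: list the snapshots in the S3-backed repository and check that the
# automatic daily ones are there and succeeded. "s3_backup" is a guess.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

result = es.snapshot.get(repository="s3_backup", snapshot="_all")
for snap in result["snapshots"]:
    print(snap["snapshot"], snap["start_time"], snap["state"])  # expect "SUCCESS"
```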

@mlandauer
Member Author

I just did a test restore of a snapshot from S3. I picked a single index, restored it under a different name and had a look at the data in Kibana, and it all seems to check out fine. So, looking good!
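Restoring an index under a different name uses the restore API's rename options. A sketch of what that test looks like; the repository, snapshot and index names are placeholders:

```python
# Sketch: restore a single index from a snapshot under a new name so it
# doesn't clash with the live index. All names here are guesses.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

es.snapshot.restore(
    repository="s3_backup",
    snapshot="daily-snapshot-2019-03-01",  # hypothetical snapshot name
    body={
        "indices": "api-statistics",
        "rename_pattern": "api-statistics",
        "rename_replacement": "restored-api-statistics",
    },
    wait_for_completion=True,
)
```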

I also checked that there are a number of new snapshots since I last looked. These are the daily ones that are made automatically. So, that's all working too.

@mlandauer
Member Author

Still waiting on finalising https://github.com/openaustralia/oaf-internal/issues/16

@mlandauer
Member Author

This is all done now.
