Telemetry: prune the amount of data stored #9328
Comments
Yeah, I think the big question I have is "what valuable queries are we running against this data?" I think the answer currently is "we aren't really using this data yet", but knowing how we think it will be useful is important for making sure we keep only the data we care about. Alternatively, we could archive it somewhere that isn't stored in postgres (e.g. a monthly ...). We do this with our ads data, but we almost never go back and query it, so I'm not sure how useful it is to have archived old data; we can probably just delete it. We do this for the ads data because it's billing data, but `BuildData` is not as important.
These are two real cases where I used it and it was useful:
I don't think it is worth the effort because I don't think we will come back to pretty old data. The data we want to query to make decisions shouldn't be too old.
As a first step, to avoid it growing too much, I'd save X months of data. Then, we probably want to save more data only for "active projects" --which are the ones that we care about the most. As a reference, ~90 days of data is ~8Gb:
Quick math:
Sounds like we should probably just keep the last 90 days for now at a minimum?
ha ha! I don't know how my math works... "90 days is 8Gb and 6 months is 30Gb" 🤔
Yes. I'd start with 180 days --which should be ~16Gb and I think it's acceptable. We can keep tuning it later as we start using this data more.
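For reference, the corrected estimate is a plain linear extrapolation from the "~90 days is ~8Gb" figure quoted in the thread. A minimal sketch, assuming growth stays roughly constant (the function name and constant are illustrative, not from the codebase):

```python
# Growth figure taken from the thread: roughly 8 Gb per 90 days of data.
GB_PER_90_DAYS = 8


def estimated_size_gb(retention_days: int) -> float:
    """Linear estimate of stored telemetry size for a retention window."""
    return GB_PER_90_DAYS * retention_days / 90


if __name__ == "__main__":
    for days in (90, 180):
        print(f"{days} days -> ~{estimated_size_gb(days):.0f} Gb")
```

With these assumptions, 180 days comes out at about 16 Gb, not the 30 Gb mentioned earlier.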
Define a task to delete `BuildData` objects older than `RTD_TELEMETRY_DATA_RETENTION_DAYS`, which is set to 180 days for now. This task is configured to run every day at 2AM. Related #9328
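The retention rule in that commit message can be sketched in plain Python as follows. This is not the actual task: `BuildDataRecord`, its `created` field, and both helper functions are hypothetical stand-ins, and the real implementation would presumably delete rows through the ORM and be triggered by a scheduler rather than filter an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Setting name and value come from the commit message above.
RTD_TELEMETRY_DATA_RETENTION_DAYS = 180


@dataclass
class BuildDataRecord:
    """Hypothetical stand-in for a stored BuildData row."""
    id: int
    created: datetime  # assumed timestamp field


def retention_cutoff(now: datetime, retention_days: int = RTD_TELEMETRY_DATA_RETENTION_DAYS) -> datetime:
    """Return the oldest timestamp that is still kept."""
    return now - timedelta(days=retention_days)


def prune_old_build_data(records: list[BuildDataRecord], now: datetime) -> list[BuildDataRecord]:
    """Keep only the records created on or after the retention cutoff."""
    cutoff = retention_cutoff(now)
    return [record for record in records if record.created >= cutoff]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    records = [
        BuildDataRecord(id=1, created=now - timedelta(days=200)),  # past retention
        BuildDataRecord(id=2, created=now - timedelta(days=10)),   # still kept
    ]
    kept = prune_old_build_data(records, now)
    print([record.id for record in kept])
```

Keeping the cutoff computation in its own function makes the retention window easy to tune later, which matches the plan to adjust it as the data gets more use.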
When we implemented the `telemetry` database to save `BuildData` objects we didn't implement the prune of it. After some days/weeks, we experienced growth of 3Gb in data.

We should prune the `BuildData` objects with some useful logic. We talked about pruning based on: `BuildData` if spam score is less than 150)

Each of them has its own downsides and we can talk/discuss a good implementation. It would be good if we can keep it simple and store only the data we need for the answers we are looking for.
It's worth noting that we created this issue because a PagerDuty alarm was triggered due to the lack of free space in the database.