Provide a fast count of total measurements for all time #529
Comments
Labeling this as both API and ingest since ideally we'd have a count of all data
|
@matschaffer I'm not sure I understand. What kind of cache are you talking about? Fragment cache? |
Is there a good way to pre-compute fragment caches on a schedule? If so that might work. If not we could run a job to store it in a simple db record that just contains the most recently computed value. |
I checked the controller and view and didn't find any |
The value we’re looking for is a total of all radiation measurements (might be nice to have one for air too). Similar to what @robouden and @auspicacious dug up for @nokton a few weeks back. Nothing in the system computes this right now but we’d like it to. |
This might be a helpful reference. I think that this query would get the current measurement count for one month, but the performance is not sufficient to use live (which is why we need to cache):
I think that we should look for the places in the code where the |
I don't see the reason for caching, plus it might be problematic with all the filters we have in place. There can be endless variations. @auspicacious here is the link to the example you provided, and it loads pretty fast for me: https://api.safecast.org/en-US/measurements?captured_after=2019%2F11%2F01+19%3A56%3A15&captured_before=2019%2F12%2F01+19%3A56%3A15&commit=Filter&distance=&latitude=&locale=en-US&longitude=&since=&unit=cpm&until=&utf8=%E2%9C%93 Is there an example URL that timing that doesn't include |
Maybe worth noting that the description in this issue is from 2018, when that page contained a total count of all measurements. That’s the value we’d like to have without tanking the page. |
Sounds like there's nothing to optimize here if these counts are no longer on a page, no? Shall we close the ticket? |
Or reword it. We want the counts back |
I've updated the title and wording above to hopefully clarify the ask. |
yeah. Doesn't have to be on that page necessarily, but it's where people are used to looking for it |
I think I found these pages: So yea, we can cache it |
We're pulling it on the front page of https://safecast.org/ now too without any speed issues, so maybe a cache already exists? @norcross might know. |
Not for the page above. I assume that |
@seanbonner yeah, would be nice to know how that number is getting created. |
According to https://api.safecast.org/en-US/measurements/count (143,634,213), it's off ~300k |
I just spoke to @norcross and he's about to crash but will chime in tomorrow - he's pulling from an API endpoint that someone here gave him that was supposed to be cached every 12 hours, so maybe that was recently broken as well. He'll clarify when he wakes up. |
Is there a repo for |
Deferring to @norcross again, but if I remember correctly he was planning to put the code up there once the site was launched (earlier this year) though that obviously hasn't happened yet. |
Since we've found this endpoint, we should probably look at the code for that before deciding what to do. I see two Also, you'll notice that in the SQL query I described above, I'm filtering by https://github.com/Safecast/safecastapi/blob/master/app/models/measurement.rb#L12 |
These
Yep, I included in my query captured_before=2019%2F12%2F01+19%3A56%3A15&commit=Filter&distance=&latitude=&locale=en-US&longitude=&since=& |
safecast.org is pulling the number displayed on the home page from I'm caching that return inside of WP, not anywhere on the actual API. the site will make an API call and store it using the WP transient caching. That value will be stored for an hour, then on the next new page request it'll make a new API call to update itself. there is also a button to manually refresh the number on the admin side. there is also a WP-cron task (which is not a real cron task, it's a long story) that will update once a day if by chance none of the other updates are happening. as for the site code, yes, i need to merge the repo i used during development into this one. |
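For readers not familiar with WP transients, the behaviour described above (store the value for an hour, refetch on the next request after expiry, allow a manual refresh) can be sketched in Python as follows. This is not the WordPress code; the class name `TransientCache`, the `fetch` callable, and the injectable `clock` are all hypothetical and only illustrate the pattern.

```python
import time


class TransientCache:
    """Hold one fetched value for ttl_seconds, refetching lazily after expiry."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch          # callable that performs the real API request
        self._ttl = ttl_seconds      # one hour, matching the comment above
        self._clock = clock          # injectable so expiry can be tested
        self._value = None
        self._expires_at = None

    def get(self):
        """Return the cached value, calling fetch only if it is missing or expired."""
        if self._expires_at is None or self._clock() >= self._expires_at:
            self.refresh()
        return self._value

    def refresh(self):
        """Force a new fetch, like the manual refresh button on the admin side."""
        self._value = self._fetch()
        self._expires_at = self._clock() + self._ttl
        return self._value
```

Because the refetch only happens on the next request after expiry, a quiet site can show an old number for longer than the TTL, which is presumably why a scheduled refresh is layered on top.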
@norcross Would you check my reply: #529 (comment) |
ok. i’ve confirmed that when i manually refresh, i get the accurate number (it was off by 7 between when i refreshed in WP and manually loaded the .json page). so the outdated number displayed on the home page is due to some caching inside of WP that i'm going to fine tune. |
ok, i’ve changed how the WP side is refreshing the data. i’ve included some additional cache cleanup when the count is refreshed (manually or automatically) and i’ve also added a remote refresh request function that i’m hitting with a real cron job every 30 mins (time can be adjusted if need be) so the site should have a relatively accurate count updating itself. |
Sweet, if we’re happy with that we can close this issue I think.
|
@matschaffer The only thing I would improve is |
Yep. There’s also a plan-based estimate method on https://wiki.postgresql.org/wiki/Count_estimate that’s very interesting if we wanted to return just a cpm count. I wouldn’t rate either as high priority since WP is caching for us though. But happy to have the contribution if it strikes your fancy.
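As a rough illustration of the plan-based approach: the idea is to run `EXPLAIN` on the filtered, non-aggregated query and read the planner's row estimate instead of counting. The Python sketch below covers only the parsing step, so it runs without a database. The function name `estimate_from_plan` and the sample plan lines are made up for illustration, and the result is only as accurate as the table statistics.

```python
import re

# The planner prints its estimate as "rows=<n>" on each plan node.
_ROWS = re.compile(r"rows=(\d+)")


def estimate_from_plan(plan_lines):
    """Return the row estimate from the first plan line that has one, else None.

    plan_lines is the text output of EXPLAIN for a query such as
    "SELECT 1 FROM measurements WHERE unit = 'cpm'" (not SELECT COUNT(*),
    whose top node is an aggregate that always estimates a single row).
    """
    for line in plan_lines:
        match = _ROWS.search(line)
        if match:
            return int(match.group(1))
    return None


# Hypothetical EXPLAIN output; the numbers are invented, not real data.
sample_plan = [
    "Seq Scan on measurements  (cost=0.00..2891234.00 rows=143000000 width=0)",
    "  Filter: (unit = 'cpm'::text)",
]
print(estimate_from_plan(sample_plan))
```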
|
as far as my side is concerned, everything is now working as expected. if there are any changes I need to make to the request, we can open a new issue. |
@matschaffer I just added caching: #678 |
We used to have this on https://api.safecast.org/en-US/measurements, but it would cause timeouts since counting the entire measurements table takes many minutes.
We'd like to get that number back and available without killing performance. Probably via pre-computation to a cached record (e.g., in the DB).
Ideally this should be a count of all radiation measurements we've collected during the safecast project:
We should also consider how to make this data available for other measurement types (e.g., air quality data).
One thing to watch out for here is that ttserve has been dual publishing to both ingest & api for a few years now. So we should be careful we're not double counting in the final total.