-
Notifications
You must be signed in to change notification settings - Fork 325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix write optimization in update_many #81
Conversation
Wow! That actually improves batching rather well. Thanks! |
Would it make more sense to generate the len(alignedPoints) before the loop, so you aren't calling len() for every point? |
Good to hear it improves things. It is better just to call len() once, ive updated the pull request. |
I've written a test (gist here: https://gist.github.com/timob/4981bed6f2cc7f08d8a8), which runs whisper_update on current whisper and this fixed whisper with the same 100 metrics (one every minute) as arguments. Using whisper files with same creation options (1 minute intervals) with debug enabled. Here is the output:
A whisper-fetch on both the resulting files returns the same data in each interval bucket. Here as you can see current whisper does 100 writes, the fixed whisper does 1 write. |
It may be better to use the approach in #69 Once it is finalised and accepted it could be ported to master. |
Lovely, the output above refs all the other Whisper issues. Please remember to wrap code (or output) blocks in backticks to avoid this in the future. |
@steve-dave @SEJeff Would you mind reviewing #89, #81 and #69 and making an executive decision on which (if any) we should move forward with? |
@steve-dave @SEJeff Disregard previous request. I'm merging this one now for |
fix write optimization in update_many
Thanks @timob! ✨ 🍰 |
Backport pr #81 into 0.9.x
🆒 |
Hawt |
Current code at line 653 uses dict() to remove duplicates. This works, but it leaves the list of points not in interval order. Later runs of contiguous points are batched together into a packed string so they can be write() at once. But because the list is out of order, this batching work is defeated, leaving many batches with just on point in them.
This fix removes the dict() call and uses last value in run of points with duplicate intervals during the loop that creates the list of packed strings for batching.