Different results between different implementations #12
Hi! I'm glad to hear that you like the speed of tsdownsample 🚀 (I even think there might still be some more room for improving the runtime speed of tsdownsample.)

Thank you for pointing out the discrepancy in the results between the LTTB implementation in tsdownsample and plotly_resampler. I apologize for the lack of documentation on this point. To clarify, the MinMaxLTTB algorithm is a new heuristic for the LTTB algorithm, which first performs MinMax downsampling on the data before running the more expensive LTTB algorithm on the reduced data. The MinMaxLTTB downsampler takes an optional keyword argument, `minmax_ratio`, which determines how many data points (`minmax_ratio * n_out`) the MinMax step preselects. Given that you compared against plain LTTB, increasing `minmax_ratio` makes the two outputs converge. As an illustration, here is some modified code where a large `minmax_ratio` is used:

```python
import numpy as np

from plotly_resampler.aggregation.algorithms.lttb_py import LTTB_core_py
from tsdownsample import MinMaxLTTBDownsampler

np.random.seed(1)

ns = [i for i in np.arange(1, 100, 2) * 1000]
n_out = 1000

for n in ns:
    x, y = np.arange(n), np.random.randn(n).cumsum()
    py = LTTB_core_py.downsample(x, y, n_out)
    ru = MinMaxLTTBDownsampler().downsample(x, y, n_out=n_out, minmax_ratio=60)
    print(f"Matches with {n:>5}: {(py == ru).sum() / 10}%")
```

The percentages above show the agreement of MinMaxLTTB with LTTB (which is itself also a local heuristic). I hope this helps to clarify the behavior you observed. I'm happy to hear what you think of the MinMaxLTTBDownsampler 👀

P.S.: To use the parallel version of the tsdownsample algorithms, you should pass the `parallel=True` keyword argument to `downsample`.
@hoxbro Would it be possible to share your benchmark code? Oddly, I have also observed sub-linear scaling of your implementation for >10^6 data points: a 10x increase in the number of data points led to only a 3x increase in time, whereas my implementation scales linearly.
Hi @muendlein, I really appreciate other people taking the time to benchmark the code! 😊 When reporting benchmark results that compare with other implementations, it is extremely important to make sure that we are comparing apples to apples.
Perhaps you (@muendlein) could also share your benchmarks (with your C & numba implementations) as well? That would make it easier to pin down where our observations differ.
P.S.: You are always welcome to contribute further to this library; pinpointing where the code is slower than other implementations is already a significant contribution 🚀 (P.P.S.: I observed the sublinear scaling as well, but need more time to confirm this with proper benchmarks.)
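To make the scaling discussion concrete, here is a sketch of how one could measure scaling ratios with only the standard library's `timeit`; the helper name and the `np.cumsum` stand-in workload are my own assumptions, meant to be swapped for the downsampler under test:

```python
import timeit

import numpy as np

def measure_scaling(fn, sizes, repeat=3):
    """Time `fn` on arrays of increasing size and return the ratio
    between consecutive timings. With 10x size steps, a ratio near 10
    indicates linear scaling; clearly below 10 indicates sub-linear."""
    times = []
    for n in sizes:
        y = np.random.randn(n).cumsum()
        t = min(timeit.repeat(lambda: fn(y), number=5, repeat=repeat))
        times.append(t)
    return [t2 / t1 for t1, t2 in zip(times, times[1:])]

# Stand-in O(n) workload; replace with the downsampling call under test.
ratios = measure_scaling(np.cumsum, sizes=[10_000, 100_000, 1_000_000])
print(ratios)
```

Taking the minimum over several `timeit.repeat` runs reduces noise from other processes, which matters when the claim is a 3x-vs-10x distinction.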
I have been playing around with your implementation of LTTB, trying to compare it to other implementations (numpy and numba), and I have been impressed by the speed.

But I have noticed that the algorithm does not give the same results as your other implementation in plotly_resampler. It seems to fail around 30,000 points. Do you know if this is expected?

Looking at the above log-log plot, some extra overhead is added around 30,000 points. So I'm thinking it could be because of some race condition when parallelizing the algorithm.
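One quick way to probe the race-condition hypothesis is to run the downsampler several times on identical input and check whether the returned indices are bit-identical; a data race in a parallel implementation would typically show up as run-to-run differences. This is a generic sketch (`downsample_fn` is a placeholder for whichever implementation is under suspicion, and the stride function below is just a deterministic stand-in):

```python
import numpy as np

def is_deterministic(downsample_fn, x, y, runs=5):
    """Run `downsample_fn` repeatedly on the same input and report
    whether every run returns exactly the same index array."""
    first = downsample_fn(x, y)
    return all(
        np.array_equal(first, downsample_fn(x, y)) for _ in range(runs - 1)
    )

x = np.arange(30_000)
y = np.random.randn(30_000).cumsum()

# Deterministic stand-in that keeps every 30th point.
stride = lambda x, y: np.arange(0, len(y), 30)
print(is_deterministic(stride, x, y))
```

Note that a determinism check like this can only falsify the race-condition hypothesis, not confirm it: the divergence from plotly_resampler could just as well come from the MinMax preselection heuristic.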