Very slow performance; what am I missing? #62
These results are anomalous. In my own tests, I add a million or so points to a t-digest and see […]

One question I have is whether you are initializing the latencies to […]

Have you tried this experiment with the MergingDigest? To do this, use the one line: […]
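The "one line" referenced above did not survive the page extraction. As a rough sketch, constructing a MergingDigest via the library's factory method looks something like this (class and method names are from `com.tdunning.math.stats` as I understand them; verify against your t-digest version):

```java
import com.tdunning.math.stats.TDigest;

// Sketch only (requires the t-digest jar on the classpath): build a
// MergingDigest instead of the default tree-based digest.
// compression = 100 is the commonly cited default.
TDigest digest = TDigest.createMergingDigest(100);
digest.add(42.0);
double median = digest.quantile(0.5);
```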
On Tue, Dec 15, 2015 at 12:19 PM, akalani [email protected] wrote:
I initialized the latencies by randomizing and duplicating a sample set of Google ping latencies. The values are in milliseconds; the max (the timeout value) in the set is 100,000, though the max values are outliers and occur infrequently. Could the outlier timeout values be causing performance degradation? I have built the TDigest using the MergingDigest API, but see no improvement in performance.
The outliers don't cause any problems at all. Nor does the actual distribution cause any issues, except possibly when there are massive numbers of duplicates. The MergingDigest should not have any distributional issues. Can you share your code? Your data?
Here is the test that I wrote: the Java class takes the path to the latencies file as an input.

Here is an output that it generated for the latencies dataset with 1 million records (time is in ms, and memory size is in KB):

TDigest: time = 1218

The data file is attached.
A minor correction: the memory size reported above is in bytes, not kilobytes.
OK. I found a few issues.
Here is my adapted code:
After this, I get results like this on my laptop (note that these are with compression = 200):
This means that the t-digest is running at about 4x slower than sorting the array, and the cost per element is about 300 ns, which isn't far from the result that I had from micro-benchmarking. I think that there is about another 2-4x speedup available in the code through simple improvements, but I don't have time to do it, so these times will have to stand.

It makes some sense that a raw sort of a million items would be fast relative to the digest, since the merging digest has to touch the data more often. If there are k elements in the insertion buffer and n in the retained buffer for the merging digest, the sorting time will be N log k (sorting N/k buffers of size k each) versus N log N for the entire sort, with similar constants; here k = 32, log k = 5, and N = 10^6, log N = 20. The insertion side also requires a pass over the retention buffer for each sort of the insertion buffer, and thus requires n N/k = 6 N touches. With better optimization on the part of the Java library writers, this isn't unreasonable.
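The cost model in the paragraph above can be checked with a little arithmetic. This sketch just evaluates the two comparison-count expressions, with buffer size k and total count N taken from the text:

```java
// Cost model from the discussion above: sorting N elements in buffers of
// size k costs about N * log2(k) comparisons in total, versus N * log2(N)
// for one full sort of all N elements.
public class CostModel {
    static double bufferedSortCost(long n, long k) {
        return n * (Math.log(k) / Math.log(2));   // N * log2(k)
    }

    static double fullSortCost(long n) {
        return n * (Math.log(n) / Math.log(2));   // N * log2(N)
    }

    public static void main(String[] args) {
        long N = 1_000_000, k = 32;
        // With k = 32, log2(k) = 5 and log2(10^6) ~= 20, giving the ~4x
        // ratio between full sort work and buffered sort work cited above.
        System.out.printf("buffered: %.0f, full: %.0f%n",
                bufferedSortCost(N, k), fullSortCost(N, k));
    }
}
```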
Thanks very much for your help. I will keep the above in mind. |
I just did some experiments (see the t-digests-benchmark repo). I think that your observations lead to some good speedups. I experimented with allocating bigger insertion buffers and found that, due to the phenomenon that you noted (the built-in sort is fast), we can make some big improvements. Here are the results:
Mean is the time you want to look at. Factor determines how much more space I am allocating. Allocating 100x more space makes a whacking big difference. I will look at this some more and decide if that is a reasonable thing to do. If so, good news for speed, especially for higher compression factors. |
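A rough model of why a bigger insertion buffer helps: the retained centroids are re-merged once per buffer flush, so the number of merge passes scales as N / (factor * k). The "factor" here mirrors the space multiplier in the results above; its exact semantics in the benchmark repo are an assumption on my part.

```java
// Hypothetical model (not the benchmark code itself): count how many times
// the retained buffer must be re-merged as the insertion buffer grows by
// "factor". Fewer flushes means fewer passes over the retained centroids.
public class BufferModel {
    static long mergePasses(long n, long k, long factor) {
        return n / (factor * k);
    }

    public static void main(String[] args) {
        long N = 1_000_000, k = 32;
        for (long factor : new long[]{1, 10, 100}) {
            System.out.println("factor " + factor + " -> "
                    + mergePasses(N, k, factor) + " merge passes");
        }
    }
}
```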
I have been playing with the t-digest on a sample dataset of latencies and comparing performance against raw sorting to derive quantile statistics. I am using the standard APIs (and recommended configuration values) to build the t-digest; however, I am observing significantly worse times than raw sorting. Is this to be expected? I would appreciate any help in this regard.
Here is my sample code:
I used a dataset with 1 million latency values (in ms). The max is 100,000. It took about 2.7 secs to build the digest and extract the 50% quantile value. With a dataset with 2 million latency values, the computation time doubled. Raw sorting is under 100ms.
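The sample code referenced above did not survive the thread extraction. For reference, the raw-sort baseline being compared against can be sketched with only the JDK (the stand-in data and nearest-index quantile convention are assumptions, not the original code):

```java
import java.util.Arrays;

// Sketch of the raw-sort baseline: sort the array once, then read any
// quantile by index. Nearest-index interpolation is one common convention.
public class RawSortQuantile {
    static double quantile(double[] sorted, double q) {
        int idx = (int) Math.min(sorted.length - 1,
                Math.round(q * (sorted.length - 1)));
        return sorted[idx];
    }

    public static void main(String[] args) {
        double[] latencies = {12.0, 100000.0, 45.0, 7.0, 88.0};  // stand-in data
        Arrays.sort(latencies);  // dual-pivot quicksort for primitives
        System.out.println("p50 = " + quantile(latencies, 0.5));
    }
}
```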
Reducing the compression factor to 5 helps but computation time is still excessively high.