-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sorts s_center to improve search #15
base: master
Are you sure you want to change the base?
Conversation
This change implements point_search iteratively to ensure a stack overflow can not occur. It also makes the method use the passed unique parameter, as the recursive call was previously defaulting unique to true: itv = [(5...20), (15.6...20), (15.7...20), (15.7...20)] t = IntervalTree::Tree.new(itv) p t.search(15.7) #=> [5...20, 15.6...20, 15.7...20] p t.search(15.7, unique: false) #=> Should be [5...20, 15.6...20, 15.7...20, 15.7...20], not [5...20, 15.6...20, 15.7...20] again Additionally, result.uniq is invoked at the end of the method for faster query speeds.
Fixed point_search Indentation
Co-authored-by: amarzot-yesware <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one.
Again, could you add a spec please?
And let's rebase this too when ready.
btw, I'd separate linting changes from functionality changes in future to make reviewing easier =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I should have selected this, given the missing spec)
I decided to benchmark this PR's approach for searching the node. I did this by constructing a node with some number of overlapping intervals (as they would be constructed in the tree) and trying different searching algorithms. ContextAlgorithms Benchmarked
TimingHere is an explanation of the types of timings in the output ( Types of Queries
I ran the benchmarks with different numbers of intervals in ResultsThe results are as follows (with analysis and code afterwards): Benchmark Query Types
Benchmark IPS
AnalysisThere are a few main points I'd like to highlight.
Benchmark Script & Code: https://gist.github.com/amarzot-yesware/dc8aea0e9970e92c86cb35389254c8a9 |
I was just re-reading the interval tree wikipedia page and it describes exactly the approach I implemented in my branch. |
Hi @amarzot-yesware, thanks for your in-depth benchmarking and explanation. The approach from this PR should be faster for point searches. |
The trick is having 4 or more intervals per node, but maybe that it what you meant, and maybe that too is the norm (i'd imagine it is).
As you recommended on my gist, I'll do some benchmarking to confirm and post results. |
Thank you, much appreciated. |
As requested, linting changes are now separate from functionality changes to make reviewing easier. The sorted parameter was removed since I believe it was redundantly sorting the output array, which is probably why it was slower than the current implementation.
f57acc8
to
8e2927b
Compare
The requested changes and feedback were taken into account in the last push. The Linting changes with Rubocop were kept separate in another commit, as well as the removal of duplicate method calls that Reek was complaining about. Personally I think it affects readability negatively, but it may be worth it in case it makes the code more efficient. I also checked the code with Fasterer and it did not report the use of The previous benchmarks did not take point searches in account. It would be interesting to see a comparison between the approaches in this regard, the impact of uniq/disabing sorting, and the last few changes. (I tried running the benchmarking code but haven't gotten it to work yet) |
Hi, BenchmarksGenerating 10000 intervals starting in range [0...100000] Generating 100000 queries Recursive implementation with one list per node, using average (s_center not sorted) Recursive implementation with one list per node, using median (s_center not sorted) Iterative implementation with one list per node using average (s_center sorted), not taking maximum interval end into account [Previous version] Iterative implementation with one list per node using average (s_center sorted), taking maximum interval end into account [Pushed version] Iterative implementation with one list per node using median (s_center sorted), taking maximum interval end into account Iterative implementation with two lists per node, using average (s_center sorted) Iterative implementation with two lists per node, using median (s_center sorted) Iterative implementation with two lists per node, using average (s_center sorted), taking maximum interval end into account Using average (s_center sorted), taking maximum interval end into account, no rationals Using median (s_center sorted), taking maximum interval end into account, no rationals Code is available here. I've played around with centering the intervals using the median of the values instead of the mean but this wasn't faster, even though it should result in more balanced trees. Storing two lists per node was also not quicker. By not casting to rationals the performance improves a bit, but then the time ranges stop working. Any updates on #13 being merged? |
@danielpetri1 wrote:
Unfortunately we (the GreenSync team) are finding that we do not have the resources to actively maintain this gem. We are also no longer using it ourselves which makes it difficult to make decisions about what direction to take development in. |
I'd be interested, yes, although @amarzot-yesware seems to have quite a few good ideas here besides being quite experienced. If they're not interested I could give it a shot. |
Good call. @amarzot-yesware what do you think? Are you interested in becoming a maintainer (or co-maintainer?) |
@maddymarkovitz @danielpetri1 I appreciate the offer! I would certainly be interested, but I'm not sure I have the bandwidth to be a full time maintainer. If @danielpetri1 would be up for it, I would join them as a co-maintainer. Another thing to note is that this is my work account. If I were to be a maintainer, I'd prefer to use my personal account @amarzot |
@danielpetri1 @amarzot-yesware we're still looking for maintainers if you guys are interested :) |
@jeremiahrose Sorry, I don't think it's a good idea for me, as I no longer work with this tech (or at the company that used it). Best of luck! |
Note: this PR builds upon the work by @amarzot-yesware to allow the use of custom objects, as well as my other PR in which point_search was fixed.
s_center
is sorted in divide_intervals as to increase the speed of thepoint_search
method. Specs and search methods modified accordingly.