Skip to content

Large Sort now is faster!

Latest
Compare
Choose a tag to compare
@nelson-perez nelson-perez released this 24 Nov 09:39

Summary

Large sort got several well deserved performance improvements and bug fixes. This time I've updated a bit out of my comfort zone of code readability to make it more performant. Nevertheless now it performs almost as fast as the native sort command that's build in C.

Performance Improvements

Most of the improvements has been done on the Merge operation. The strategy has always been to reduce the number of comparisons and operations. This time I also considered the what is the most likely scenario after merging a record.
There were two major areas of improvements:

  • During the binary search I've reduced to do less comparisons and removed comparison that was not the most common case.
  • While getting merging the stream reader records now it's opportunistic assumes that the next record is within the current stream thus it only re-index the reader only when necessary.
  • Removed array.length - x for comparison using dedicate variables that get updated when it's necessary.
  • The last stream is now processed linearly without any re-index logic.

Bug Fixes

  • There was a bug while deleting the temporary files that did not clean the temporary folder.
  • Thanks to @cupix-eddy-lee who found and fixed a bug that affected the last stream synchronously as it didn't finish writing.
  • Thanks to @ssideleau who found a bug while handling listening before exiting the process listening to SIGKILL instead of SIGINT and SIGTERM