v0.8.0 - Performance Improvements #124
benbrandt announced in Announcements
What's New
Significantly fewer allocations necessary when generating chunks. This should result in a performance improvement for most use cases. This was achieved both by reusing pre-allocated collections and by memoizing chunk size calculations, since that is often the bottleneck, and tokenizer libraries tend to be very allocation heavy!
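To give a sense of the memoization idea, here is a minimal, illustrative sketch (not the crate's actual internals): repeated size measurements of the same candidate range are cached so the expensive size function, such as a tokenizer call, runs at most once per range. The `MemoizedSizer` type and its keying scheme are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical memoizing wrapper around an expensive chunk-size function
/// (for example a tokenizer call), keyed by the byte range of the candidate
/// chunk within the original text.
struct MemoizedSizer<F: Fn(&str) -> usize> {
    size_fn: F,
    cache: HashMap<(usize, usize), usize>,
}

impl<F: Fn(&str) -> usize> MemoizedSizer<F> {
    fn new(size_fn: F) -> Self {
        Self {
            size_fn,
            cache: HashMap::new(),
        }
    }

    /// Returns the size of `text[start..end]`, computing it at most once per range.
    fn size(&mut self, text: &str, start: usize, end: usize) -> usize {
        let size_fn = &self.size_fn;
        *self
            .cache
            .entry((start, end))
            .or_insert_with(|| size_fn(&text[start..end]))
    }
}

fn main() {
    // Stand-in "tokenizer": chunk size is just the character count here.
    let mut sizer = MemoizedSizer::new(|s: &str| s.chars().count());
    let text = "Some document text that gets measured repeatedly while chunking.";

    // While searching for the best split point, the same candidate range is
    // often measured many times; the second call below is a cache hit.
    let first = sizer.size(text, 0, 25);
    let second = sizer.size(text, 0, 25);
    assert_eq!(first, second);
}
```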
Benchmarks show fewer allocations both in the core chunking algorithm and when using tokenizers to calculate chunk sizes.
Breaking Changes
- Fixed a bug in the `MarkdownSplitter` logic that caused some strange split points.
- The `Text` semantic level in `MarkdownSplitter` has been merged with inline elements to also find better split points inside content.
- These changes mostly affect the `MarkdownSplitter`, but there were some cases of different behavior in the `TextSplitter` as well if chunks are not trimmed.

All of the above can cause different chunks to be output than before, depending on the text. So, even though these are bug fixes to bring intended behavior, they are being treated as a major version bump.
Full Changelog: v0.7.0...v0.8.0
This discussion was created from the release v0.8.0 - Performance Improvements.