Update changelog with details about the fix
benbrandt committed Jun 21, 2024
1 parent b8b2184 commit 7c3cbbd
Showing 2 changed files with 351 additions and 349 deletions.
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -4,7 +4,9 @@

### What's New

- Performance optimizations.
**Performance fixes for large documents.** Worst-case performance on certain documents was abysmal, to the point that some documents [ran forever](https://github.com/benbrandt/text-splitter/issues/184). Previously, the splitter could end up binary searching over the entire document in the worst case; this release ensures it no longer does. Searching the whole document is prohibitively expensive, especially for the tokenizer implementations, and the search space should now always have a safe upper bound.

On the "happy path", the new approach also brought big speed gains to the `CodeSplitter` (50%+ faster in some cases), marginal regressions in the `MarkdownSplitter`, and little change in the `TextSplitter`. Overall, performance should be more consistent across documents, since previously it wasn't uncommon for documents with certain formatting to hit the worst-case scenario.

### Breaking Changes

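The changelog does not say how the new upper bound on the search space is enforced. One standard way to keep a "largest chunk that still fits" search from ever sizing the whole document is an exponential (galloping) search: probe candidate boundaries 1, 3, 7, 15, ... until a chunk overflows, then binary search only inside that window. The Rust sketch below illustrates that technique under stated assumptions. It is not text-splitter's actual implementation, and `furthest_fitting_boundary`, its parameters, and the character-count sizer are hypothetical stand-ins (a real setup would size chunks with a tokenizer).

```rust
use std::cell::Cell;

/// Hypothetical sketch (not text-splitter's API): return the index of the
/// furthest candidate boundary whose prefix chunk fits within `capacity`,
/// without ever sizing the whole document.
///
/// Assumptions:
/// - `boundaries` holds ascending byte offsets into `text`, each on a char boundary.
/// - `size` is monotonic: a longer prefix never has a smaller size.
fn furthest_fitting_boundary<F>(text: &str, boundaries: &[usize], capacity: usize, size: F) -> Option<usize>
where
    F: Fn(&str) -> usize,
{
    let fits = |i: usize| size(&text[..boundaries[i]]) <= capacity;

    if boundaries.is_empty() || !fits(0) {
        return None;
    }

    // Galloping phase: probe indices 1, 3, 7, 15, ... until a chunk overflows
    // or the candidates run out. Invariant: `lo` always fits.
    let mut lo = 0;
    let mut step = 1;
    let mut hi = loop {
        let probe = lo + step;
        if probe >= boundaries.len() {
            break boundaries.len(); // exclusive bound: one past the last candidate
        }
        if fits(probe) {
            lo = probe;
            step *= 2;
        } else {
            break probe; // first probed index known to overflow
        }
    };

    // Binary search phase, confined to the window (lo, hi).
    // Invariant: `lo` fits; `hi` overflows or is one past the last candidate.
    while hi - lo > 1 {
        let mid = lo + (hi - lo) / 2;
        if fits(mid) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    Some(lo)
}

fn main() {
    // Toy document: 1,000 five-byte words, with a candidate boundary after each word.
    let text = "word ".repeat(1_000);
    let boundaries: Vec<usize> = (1..=1_000).map(|i| i * 5).collect();

    // Record the longest slice ever handed to the sizer.
    let largest = Cell::new(0);
    let idx = furthest_fitting_boundary(&text, &boundaries, 42, |s| {
        largest.set(largest.get().max(s.len()));
        s.chars().count() // character count stands in for a tokenizer
    });

    if let Some(i) = idx {
        println!("chunk ends at byte {}", boundaries[i]); // 40
    }
    println!("largest slice sized: {} of {} bytes", largest.get(), text.len()); // 80 of 5000
}
```

With a capacity of 42 characters, the chunk ends at byte 40 and the longest slice ever passed to the sizer is 80 bytes, whereas a plain binary search over all 1,000 boundaries would start by sizing roughly half of the 5,000-byte document. Because the overflowing probe covers at most twice as many candidate boundaries as the chunk that finally fits, the sizing cost in this sketch scales with the chunk rather than with the document.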
