build(deps): bump the minor group across 1 directory with 3 updates #927

Closed
dependabot[bot] wants to merge 1 commit

dependabot bot commented on behalf of github on Jun 25, 2024

Bumps the minor group with 3 updates in the / directory: semantic-text-splitter, mypy and types-setuptools.

Updates semantic-text-splitter from 0.13.3 to 0.14.0

Release notes

Sourced from semantic-text-splitter's releases.

v0.14.0

What's New

Performance fixes for large documents. The worst-case performance for certain documents was abysmal, causing splitting to appear to run forever. In the worst case, the splitter previously binary searched over the entire document; this release ensures it no longer does. Searching the whole document is prohibitively expensive, especially for the tokenizer implementations, and the search space should now always have a safe upper bound.

For the "happy path", this new approach also led to big speed gains in the CodeSplitter (50%+ speed increase in some cases), marginal regressions in the MarkdownSplitter, and not much difference in the TextSplitter. But overall, the performance should be more consistent across documents, since it wasn't uncommon for a document with certain formatting to hit the worst-case scenario previously.
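To make the idea of a bounded search space concrete, here is a minimal sketch. It is not the crate's actual algorithm: the function name `best_end_in_window`, the `ends` list of candidate split offsets, and the `window` cap are all hypothetical names chosen for illustration. It assumes the measured size never shrinks as the prefix grows, which is what makes a binary search valid.

```rust
/// Illustrative only: find the furthest candidate end offset whose prefix
/// `text[..end]` fits within `capacity`, looking at no more than `window`
/// candidates. All names here are hypothetical, not part of text-splitter.
///
/// `ends` must be sorted ascending and lie on UTF-8 character boundaries.
pub fn best_end_in_window(
    text: &str,
    ends: &[usize],
    capacity: usize,
    window: usize,
    size: impl Fn(&str) -> usize,
) -> Option<usize> {
    // Cap the search space so the (possibly expensive) `size` function is
    // never evaluated against candidates beyond the window.
    let bounded = &ends[..ends.len().min(window)];

    // `partition_point` binary searches for the first candidate that does
    // NOT fit, assuming every fitting candidate precedes every non-fitting one.
    let fit = bounded.partition_point(|&end| size(&text[..end]) <= capacity);

    // The last fitting candidate, if any, sits just before that index.
    if fit == 0 {
        None
    } else {
        Some(bounded[fit - 1])
    }
}
```

With a small `window`, the result is a best-effort answer inside the window rather than a global optimum, which is the trade-off a bounded search accepts in exchange for predictable cost.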

Breaking Changes

  • Chunk output may be slightly different because of the changes to the search optimizations. The previous optimization occasionally caused the splitter to stop too soon. In most cases you will likely see no difference. The effect was most pronounced in the MarkdownSplitter at very small sizes, and in any splitter using RustTokenizers, because of its offset behavior.

Rust

  • ChunkSize has been removed. This was a holdover from a previous internal optimization, which turned out to not be very accurate anyway.
  • This makes implementing a custom ChunkSizer much easier, as you now only need to return the size of the chunk as a usize. Tokenization implementations often had to do extra work to calculate the old ChunkSize as well, which is no longer necessary.

Before

pub trait ChunkSizer {
    // Required method
    fn chunk_size(&self, chunk: &str, capacity: &ChunkCapacity) -> ChunkSize;
}

After

pub trait ChunkSizer {
    // Required method
    fn size(&self, chunk: &str) -> usize;
}
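
As a rough illustration of the new shape, the sketch below implements the `After` trait for a simple word counter. The trait is redeclared locally so the example compiles on its own; in real code it would come from the crate. `WordCount` and `fits` are hypothetical names that do not exist in text-splitter.

```rust
// Local copy of the v0.14.0 trait as shown in the release notes above,
// redeclared here only so this sketch is self-contained.
pub trait ChunkSizer {
    // Required method
    fn size(&self, chunk: &str) -> usize;
}

/// Hypothetical sizer that measures a chunk by its number of
/// whitespace-separated words.
pub struct WordCount;

impl ChunkSizer for WordCount {
    fn size(&self, chunk: &str) -> usize {
        // Only a plain usize is returned; no ChunkCapacity argument and
        // no ChunkSize wrapper are involved any more.
        chunk.split_whitespace().count()
    }
}

/// Hypothetical helper: the capacity comparison now happens on the
/// caller's side instead of inside the sizer.
pub fn fits<S: ChunkSizer>(sizer: &S, chunk: &str, capacity: usize) -> bool {
    sizer.size(chunk) <= capacity
}
```

Because the sizer no longer receives the capacity, the same implementation can be reused unchanged with different limits.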

Full Changelog: benbrandt/text-splitter@v0.13.3...v0.14.0

Commits
  • 7c3cbbd Update changelog with details about the fix
  • b8b2184 New attempt at finding best effort binary search window
  • 53a31b5 Remove need for ChunkSize in public interface
  • 14e0699 Use current stats to make a more accurate guess
  • ef3c61b Start to update the changelog
  • c003481 Remove incorrect max_encoded_offset optimization
  • 8d57618 Expanding binary search window
  • b1b39d1 Bump the minor group with 2 updates
  • f31f1e5 Bump the minor group in /docs with 5 updates
  • 52e8f8f Bump the minor group in /docs with 2 updates
  • Additional commits viewable in compare view

Updates mypy from 1.10.0 to 1.10.1

Changelog

Sourced from mypy's changelog.

Mypy 1.10.1

  • Fix error reporting on cached run after uninstallation of third party library (Shantanu, PR 17420)

Acknowledgements

Thanks to all mypy contributors who contributed to this release:

  • Alex Waygood
  • Ali Hamdan
  • Edward Paget
  • Evgeniy Slobodkin
  • Hashem
  • hesam
  • Hugo van Kemenade
  • Ihor
  • James Braza
  • Jelle Zijlstra
  • jhance
  • Jukka Lehtosalo
  • Loïc Simon
  • Marc Mueller
  • Matthieu Devlin
  • Michael R. Crusoe
  • Nikita Sobolev
  • Oskari Lehto
  • Riccardo Di Maio
  • Richard Si
  • roberfi
  • Roman Solomatin
  • Sam Xifaras
  • Shantanu
  • Spencer Brown
  • Srinivas Lade
  • Tamir Duberstein
  • youkaichao

I’d also like to thank my employer, Dropbox, for supporting mypy development.

Mypy 1.9

We’ve just uploaded mypy 1.9 to the Python Package Index (PyPI). Mypy is a static type checker for Python. This release includes new features, performance improvements and bug fixes. You can install it as follows:

python3 -m pip install -U mypy

You can read the full documentation for this release on Read the Docs.

Breaking Changes

Because the version of typeshed we use in mypy 1.9 doesn't support 3.7, neither does mypy 1.9. (Jared Hance, PR 16883)

... (truncated)

Commits
  • c28b525 [1.10 backport] Fix error reporting on cached run after uninstallation of thi...
  • See full diff in compare view

Updates types-setuptools from 70.0.0.20240524 to 70.1.0.20240625

Commits

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Bumps the minor group with 3 updates in the / directory: [semantic-text-splitter](https://github.com/benbrandt/text-splitter), [mypy](https://github.com/python/mypy) and [types-setuptools](https://github.com/python/typeshed).


Updates `semantic-text-splitter` from 0.13.3 to 0.14.0
- [Release notes](https://github.com/benbrandt/text-splitter/releases)
- [Changelog](https://github.com/benbrandt/text-splitter/blob/main/CHANGELOG.md)
- [Commits](benbrandt/text-splitter@v0.13.3...v0.14.0)

Updates `mypy` from 1.10.0 to 1.10.1
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](python/mypy@v1.10.0...v1.10.1)

Updates `types-setuptools` from 70.0.0.20240524 to 70.1.0.20240625
- [Commits](https://github.com/python/typeshed/commits)

---
updated-dependencies:
- dependency-name: semantic-text-splitter
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: minor
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor
- dependency-name: types-setuptools
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: minor
...

Signed-off-by: dependabot[bot] <[email protected]>
dependabot bot added the dependencies and python labels on Jun 25, 2024

dependabot bot commented on behalf of github Jun 27, 2024

Superseded by #934.

dependabot bot closed this on Jun 27, 2024
dependabot bot deleted the dependabot/pip/minor-d9242e81fe branch on June 27, 2024 at 07:59