
feat (v2): Update so model_max_length updates max_seq_length for Sentence Transformers #8334

Merged: 4 commits merged into main from st-embedders-update on Sep 6, 2024

Conversation

@sjrl (Contributor) commented Sep 5, 2024

Related Issues

  • fixes #issue-number

Proposed Changes:

This brings back a change we had in v1 that updates the Sentence Transformers-specific variable max_seq_length, which controls the maximum sequence length. Previously we thought that passing model_max_length through tokenizer_kwargs would handle this, but a recent investigation from @bglearning shows it does not. So we make an update here such that model_max_length also updates the value of max_seq_length.
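
For context, a minimal sketch of the behavior described above (not the actual Haystack code: load_st_model is a hypothetical helper, and it assumes a sentence-transformers release whose constructor accepts tokenizer_kwargs):

```python
from typing import Optional

from sentence_transformers import SentenceTransformer


def load_st_model(model_name: str, tokenizer_kwargs: Optional[dict] = None) -> SentenceTransformer:
    """Load a SentenceTransformer and honor model_max_length for truncation."""
    tokenizer_kwargs = tokenizer_kwargs or {}
    model = SentenceTransformer(model_name, tokenizer_kwargs=tokenizer_kwargs)
    # Setting model_max_length on the tokenizer alone does not shorten the
    # embedding input window: SentenceTransformer truncates based on its own
    # max_seq_length attribute, so copy the value over explicitly.
    if "model_max_length" in tokenizer_kwargs:
        model.max_seq_length = tokenizer_kwargs["model_max_length"]
    return model


# Example: inputs are now truncated at 256 tokens when embedding.
model = load_st_model(
    "sentence-transformers/all-MiniLM-L6-v2",
    tokenizer_kwargs={"model_max_length": 256},
)
```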

How did you test it?

Notes for the reviewer

Checklist

@sjrl sjrl linked an issue Sep 5, 2024 that may be closed by this pull request
@sjrl sjrl marked this pull request as ready for review September 5, 2024 15:40
@sjrl sjrl requested review from a team as code owners September 5, 2024 15:40
@sjrl sjrl requested review from dfokina and shadeMe and removed request for a team September 5, 2024 15:40
@coveralls (Collaborator) commented Sep 5, 2024

Pull Request Test Coverage Report for Build 10723664490

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 2 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.005%) to 90.221%

Files with Coverage Reduction | New Missed Lines | %
components/embedders/sentence_transformers_document_embedder.py | 1 | 96.67%
components/embedders/sentence_transformers_text_embedder.py | 1 | 96.15%

Totals:
  • Change from base Build 10705409434: +0.005%
  • Covered Lines: 7021
  • Relevant Lines: 7782

💛 - Coveralls

@sjrl sjrl merged commit 06dd5c2 into main Sep 6, 2024
33 checks passed
@sjrl sjrl deleted the st-embedders-update branch September 6, 2024 09:37
Development

Successfully merging this pull request may close these issues.

Ability to set max_seq_len to the SentenceTransformers components
3 participants