Update _load_sbert_model Parameters and Fix Tokenize Padding #122

Open

pascalhuerten wants to merge 7 commits into main
Conversation

pascalhuerten

Summary

This pull request makes two updates to the InstructorEmbedding functionality to restore compatibility with sentence-transformers version 3.0.1 and to improve encoding performance.

Changes Introduced:

  1. Refactor: Added Missing Parameters to _load_sbert_model for Enhanced Compatibility

    • Parameters added:
      • local_files_only=False
      • model_kwargs=None
      • tokenizer_kwargs=None
      • config_kwargs=None
    • These parameters are passed by the SentenceTransformer base class in 3.0.1; without them, loading fails with Instructor._load_sbert_model() got an unexpected keyword argument 'local_files_only'. A sketch of the updated signature follows the list below.
  2. Refactor: Updated tokenize Method's Padding Parameter Back to True

    • Changed the padding parameter from "max_length" back to True.
    • padding="max_length" pads every input to the maximum sequence length, which created a significant performance bottleneck during encoding, most visibly in the softmax and linear layers.
    • The performance profiles below quantify the benefit of this change.
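For reference, a minimal sketch of the updated override signature is shown below. The parameter names and defaults are meant to mirror SentenceTransformer._load_sbert_model in sentence-transformers 3.0.1; the type hints and the elided method body here are illustrative, not the exact diff of this PR.

    from typing import Any, Dict, Optional

    from sentence_transformers import SentenceTransformer


    class Instructor(SentenceTransformer):
        def _load_sbert_model(
            self,
            model_name_or_path: str,
            token: Optional[str] = None,
            cache_folder: Optional[str] = None,
            revision: Optional[str] = None,
            trust_remote_code: bool = False,
            local_files_only: bool = False,                      # added
            model_kwargs: Optional[Dict[str, Any]] = None,       # added
            tokenizer_kwargs: Optional[Dict[str, Any]] = None,   # added
            config_kwargs: Optional[Dict[str, Any]] = None,      # added
        ):
            # The custom INSTRUCTOR module loading is unchanged; only the
            # signature gains the keyword arguments that the 3.0.1 base class
            # passes, so the call no longer raises
            # "got an unexpected keyword argument 'local_files_only'".
            ...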

Performance Comparison:

  • With padding=True:

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
    145     0.558    0.004    0.558    0.004    {built-in method torch._C._nn.linear}
    24      0.083    0.003    0.440    0.018    .venv\lib\site-packages\transformers\models\t5\modeling_t5.py:279(forward)
    24      0.041    0.002    0.041    0.002    {method 'softmax' of 'torch._C.TensorBase' objects}
    
  • With padding="max_length":

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
    24      3.274    0.136    3.274    0.136    {method 'softmax' of 'torch._C.TensorBase' objects}
    145     3.253    0.022    3.253    0.022    {built-in method torch._C._nn.linear}
    24      0.625    0.026    5.479    0.228    .venv\lib\site-packages\transformers\models\t5\modeling_t5.py:445(forward)
    

Additional Note

  • padding="max_length" may have been chosen deliberately, for example for better performance when parallelizing batched encoding on GPU. I did not test that scenario and therefore cannot rule it out. If that is the case, there should at least be a parameter for choosing the padding strategy; a hypothetical sketch of such a parameter follows below.
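The sketch below shows one way such a parameter could look. It uses a plain Hugging Face tokenizer to stay self-contained; the repository's actual tokenize method also handles instruction/text pairs and context masking, which is omitted here, and the model name, function signature, and default max length are illustrative.

    from transformers import AutoTokenizer

    # Illustrative checkpoint; any T5-style tokenizer behaves the same way.
    tokenizer = AutoTokenizer.from_pretrained("hkunlp/instructor-large")


    def tokenize(texts, padding=True, max_seq_length=512):
        """Tokenize with a configurable padding strategy.

        padding=True pads only to the longest sequence in the batch (dynamic
        padding); padding="max_length" pads every sequence to max_seq_length,
        which is what produced the slow softmax/linear calls profiled above.
        """
        return tokenizer(
            texts,
            padding=padding,
            truncation=True,
            max_length=max_seq_length,
            return_tensors="pt",
        )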

Testing:

  • The updated methods were tested with sentence-transformers 3.0.1 on CPU, using a batch size of 1 and a single sentence, to validate the fix and the performance improvement; the profiling setup is sketched below.
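The following is a sketch of how the profiles above can be reproduced, assuming the usual INSTRUCTOR usage from the project README; the checkpoint name and example text are illustrative, and the tables above correspond to sorting the profile by tottime.

    import cProfile
    import pstats

    from InstructorEmbedding import INSTRUCTOR

    # Illustrative checkpoint and input; any instruction/sentence pair works.
    model = INSTRUCTOR("hkunlp/instructor-large", device="cpu")
    instruction = "Represent the Science title:"
    sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments"

    with cProfile.Profile() as profiler:
        model.encode([[instruction, sentence]], batch_size=1)

    # Sort by total time spent in each function, as in the tables above.
    pstats.Stats(profiler).sort_stats("tottime").print_stats(10)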

Please review the changes and let me know if any further adjustments are necessary.

Thank you for considering this request. I look forward to your feedback.

BBC-Esq and others added 7 commits April 13, 2024 16:42
…d dependency to correspond to newer feature, update target name
…to fix error: Instructor._load_sbert_model() got an unexpected keyword argument 'local_files_only'
padding="max_length" led to roughly 10x slower encoding on CPU; the linear and softmax functions were the most affected. Dynamic padding resolves this performance bottleneck.