Fix tokenizer bug #3

Merged · 6 commits · Mar 22, 2024
Conversation

@nanbo-liu (Collaborator) commented on Mar 22, 2024

It looks like the pipeline is required to have a tokenizer property.
I added a tokenizer property, even though it isn't used for anything.
Note that the sentence-transformers model has its own tokenizer under its model property.

In the load_pipeline_from_setting function in mlserver-huggingface/common.py:

    # If max_batch_size > 1 we need to ensure tokens are padded
    if settings.max_batch_size > 1:
        model = hf_pipeline.model
        if not hf_pipeline.tokenizer.pad_token_id:
            eos_token_id = model.config.eos_token_id  # type: ignore
            if eos_token_id:
                hf_pipeline.tokenizer.pad_token_id = [str(eos_token_id)]  # type: ignore
            else:
                logger.warning(
                    "Model has neither pad_token or eos_token, setting batch size to 1"
                )
                hf_pipeline._batch_size = 1

This code modifies the tokenizer when max_batch_size is greater than 1, and the inference server may deliberately set max_batch_size to a value greater than 1. Because it dereferences hf_pipeline.tokenizer, a pipeline without a tokenizer attribute would fail at that point. The sentence-transformers model has its own way of handling max_batch_size, so to avoid this bug we keep a tokenizer property on StEmbeddingPipeline (see the sketch below).
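Here is a minimal sketch of what such a property could look like, assuming the pipeline wraps a sentence_transformers.SentenceTransformer instance stored on self.model. The class body below is illustrative only, not the exact diff merged in this PR:

    from sentence_transformers import SentenceTransformer


    class StEmbeddingPipeline:
        def __init__(self, model: SentenceTransformer):
            # The sentence-transformers model keeps its own tokenizer internally.
            self.model = model

        @property
        def tokenizer(self):
            # Expose the underlying model's tokenizer so that code such as
            # load_pipeline_from_setting can read pipeline.tokenizer safely,
            # even though this pipeline never uses it directly.
            return getattr(self.model, "tokenizer", None)

With a property like this in place, the padding logic above can inspect pipeline.tokenizer without raising an AttributeError, while the embedding pipeline keeps using the tokenizer owned by the model.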

nanbo-liu merged commit 8aee783 into Striveworks:master on Mar 22, 2024
29 checks passed