add cross-encoder tracing, config-generating, and uploading #375
base: main
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #375      +/-   ##
==========================================
- Coverage   91.53%   89.68%    -1.86%
==========================================
  Files          42       43        +1
  Lines        4395     4508      +113
==========================================
+ Hits         4023     4043       +20
- Misses        372      465       +93
```
☔ View full report in Codecov by Sentry.
Thanks for raising this PR. We need to add corresponding tests too. It seems like a few functions are mostly shared with the SentenceTransformerModel class.
Customers should have a way to specify the name of the generated zip file and the model file, like we have in our SentenceTransformerModel class.
```python
# save tokenizer file
tk_path = Path(f"/tmp/{mname}-tokenizer")
tk.save_pretrained(tk_path)
_fix_tokenizer(tk.model_max_length, tk_path)
```
`tk.model_max_length` --> not sure if we will get this value all the time. Please take a look at this PR: #219
OK, going with the strategy proposed in huggingface/transformers#14561: look for `max_position_embeddings` or `n_positions` in the config object, and if both are missing, set it to 32k and hope that's fine. (The fix using sentence_transformers' `model.get_max_seq_length()` also has the potential to return None.)
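A minimal sketch of that fallback strategy, assuming a standard transformers config object (this mirrors the approach described above, not the final implementation):

```python
from transformers import AutoConfig


def guess_max_length(hf_model_id: str) -> int:
    """Best-effort guess at a usable max sequence length (sketch only)."""
    config = AutoConfig.from_pretrained(hf_model_id)
    if getattr(config, "max_position_embeddings", None):
        return config.max_position_embeddings
    if getattr(config, "n_positions", None):
        return config.n_positions
    return 2**15  # no architectural bound found; fall back to 32768


# e.g. guess_max_length("BAAI/bge-reranker-base") -> 514
```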
Tests are failing.
"model_content_hash_value": hash_value, | ||
"model_config": { | ||
"model_type": model_type, | ||
"embedding_dimension": 1, |
Is this correct?
If you look here: https://huggingface.co/BAAI/bge-reranker-base/blob/main/config.json, `max_position_embeddings` = 514.
Yes, this is what I did. It's an artifact of the implementation (which depends heavily on the embedding model code). Can probably be cleaned up a bit.
```python
elif hasattr(model_config, "n_positions"):
    tk.model_max_length = model_config.n_positions
else:
    tk.model_max_length = 2**15  # =32768. Set to something big I guess
```
Setting an arbitrary value doesn't seem like a good solution. What do you think about following this: https://github.com/opensearch-project/opensearch-py-ml/blob/main/opensearch_py_ml/ml_models/sentencetransformermodel.py#L936-L942
Would love to. Unfortunately, `model.get_max_seq_length()` is not a method on most HF transformers `ModelForSequenceClassification` classes; it's a special thing from the sentence-transformers model interface, and I'm not sure it provides the guarantee it claims to: [implementation]
I'm seeing two problems here:

- `tk.model_max_length is None`: per [Bug] "`tokenizer.model_max_length` is different when loading model from shortcut or local path" (huggingface/transformers#14561), if you hit that bug, `model_max_length` will already be a large value, and we end up keeping a value the model can't process.
- I'm not a big fan of setting a static big value like `tk.model_max_length = 2**15`. What exactly are we achieving here?

It's important to align `model_max_length` with the model's actual capability (`max_position_embeddings`). Setting `model_max_length` to a higher value than `max_position_embeddings` could lead to errors or unexpected behavior, as the model won't be able to handle inputs longer than its architectural limit. Conversely, setting a lower `model_max_length` can be useful for optimizing performance or adhering to specific task requirements.
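A minimal sketch of what that alignment could look like (not the PR's code; it assumes a transformers-style config object and treats the huge sentinel value as "unset"):

```python
def align_max_length(tk, model_config) -> None:
    """Sketch: make tokenizer.model_max_length agree with the model's positional limit."""
    limit = getattr(model_config, "max_position_embeddings", None) or getattr(
        model_config, "n_positions", None
    )
    # transformers reports a huge sentinel (~1e30) for model_max_length when the
    # tokenizer config carries no real limit, so treat implausibly large values as unset.
    unset = tk.model_max_length is None or tk.model_max_length > int(1e12)
    if limit is not None and (unset or tk.model_max_length > limit):
        tk.model_max_length = limit
    elif unset:
        tk.model_max_length = 2**15  # no architectural bound found; fall back to 32768
```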
I don't understand these issues. `tk.model_max_length is None` is exactly the condition that triggers setting it to something reasonable. I agree that it's important to set it to a value that agrees with the model's capability: that's why the first condition here checks for and uses `max_position_embeddings`. We only pick the Very Large Number when we don't have a bound. For instance, suppose someone trained a Mamba-based cross-encoder with infinite context because it's an RNN.
Sorry for the confusion. What I meant is that `tk.model_max_length is None` will not even trigger. Without that triggering, `model_max_length` could be a large value that the model can't process.

```python
tokenizer = GPT2Tokenizer.from_pretrained("path/to/local/gpt2")
print(tokenizer.model_max_length)
# 1000000000000000019884624838656
```

So here `model_max_length` isn't None, right? But do we still want that?
Interesting. Can we just let huggingface/transformers fix this bug? It seems like their problem, and from what I can tell the only time we're going to hit it is if someone is trying to use a very old tokenizer file. At that point I hope we can assume the user is proficient enough with transformers to debug if necessary.
""" | ||
# export to onnx | ||
save_loc = Path(f"/tmp/{mname}.onnx") | ||
torch.onnx.export( |
I think we should add `onnx` to requirements.txt. In addition, can we also add it to requirements-dev.txt?
looks like it's there already?
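For context, since the export call is truncated in the diff above, a full `torch.onnx.export` for a sequence-classification cross-encoder typically looks roughly like this (the model id and input/output names are illustrative, not the PR's exact arguments):

```python
from pathlib import Path

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

hf_model_id = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # illustrative model id
tk = AutoTokenizer.from_pretrained(hf_model_id)
model = AutoModelForSequenceClassification.from_pretrained(hf_model_id)
model.eval()

# a query/passage pair, tokenized the way a cross-encoder expects
features = tk([["example query", "example passage"]], return_tensors="pt")
save_loc = Path("/tmp/example-cross-encoder.onnx")

torch.onnx.export(
    model,
    (features["input_ids"], features["attention_mask"], features["token_type_ids"]),
    str(save_loc),
    input_names=["input_ids", "attention_mask", "token_type_ids"],
    output_names=["output"],
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "sequence_length"},
        "attention_mask": {0: "batch_size", 1: "sequence_length"},
        "token_type_ids": {0: "batch_size", 1: "sequence_length"},
        "output": {0: "batch_size"},
    },
    opset_version=15,
)
```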
```python
if mname.startswith("bge"):
    features["token_type_ids"] = torch.zeros_like(features["input_ids"])

if framework == "pt":
```
I think we should accept "torch_script" as the framework parameter instead of `pt`. `pt` is a file extension, not a framework: we decided to save as .pt, but somebody else might decide on .pth or .ptc, and customers might not know the file format of the TorchScript artifact. So the framework choice should be `torch_script` or `onnx`.
yep
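A short sketch of the parameter validation being suggested here (names are illustrative, not the PR's API):

```python
VALID_FRAMEWORKS = ("torch_script", "onnx")


def _validate_framework(framework: str) -> str:
    """Normalize and validate the requested trace format (sketch only)."""
    framework = framework.lower().strip()
    if framework not in VALID_FRAMEWORKS:
        raise ValueError(
            f"framework must be one of {VALID_FRAMEWORKS}, got {framework!r}"
        )
    return framework
```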
"version": f"1.0.{version_number}", | ||
"description": description, | ||
"model_format": model_format, | ||
"function_name": "TEXT_SIMILARITY", |
We need `model_task_type`. Let's follow the SentenceTransformerModel example method.
```python
)
model_config_content = {
    "name": model_name,
    "version": f"1.0.{version_number}",
```
We don't need to hard-code the `1.0.` prefix.
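Putting these comments together, the generated config could end up looking roughly like this. Field values are placeholders, and `model_task_type` and `framework_type` are assumptions rather than confirmed field names:

```python
# Illustrative sketch of the config content after the review changes
# (version without the hard-coded "1.0." prefix, plus a model_task_type field).
model_config_content = {
    "name": "BAAI/bge-reranker-base",        # placeholder model name
    "version": "1",                          # no hard-coded "1.0." prefix
    "description": "Cross Encoder Model bge-reranker-base",
    "model_format": "TORCH_SCRIPT",          # or "ONNX"
    "function_name": "TEXT_SIMILARITY",
    "model_task_type": "TEXT_SIMILARITY",    # assumption: mirrors function_name
    "model_content_hash_value": "<sha256 of the zip>",
    "model_config": {
        "model_type": "roberta",             # taken from the HF config
        "embedding_dimension": 1,            # cross-encoders emit a single relevance score
        "framework_type": "huggingface_transformers",  # assumed value
    },
}
```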
```python
if model_name is None:
    model_name = Path(self._hf_model_id).name
if description is None:
    description = f"Cross Encoder Model {model_name}"
```
Can we follow the way we generate the description for embedding models?
Looks like linting and integration tests are failing; could you please take a look?
This PR fails with this promising rerank model: https://huggingface.co/mixedbread-ai/mxbai-rerank-xsmall-v1. I tried both torch_script and onnx. The torch_script path failed while saving the trace, while the onnx path saved the onnx and zip files but failed during prediction.
Looks like huggingface/transformers#20815 might be related? Do you know if mxbai-rerank is based on DeBERTa? If so, try setting the model type to deberta.
Yes, it is based on DeBERTa.
I have set the model type to deberta and converted the model to onnx (torch_script did not work) and was able to deploy it. However, I get the same error as above when calling the _predict API.
```
:param folder_path: folder path to save the model
    default is /tmp/models/hf_model_id
:type folder_path: str
:param overwrite: whether to overwrite the existing model
```
"whether to overwrite the existing model at `folder_path`"?
```python
tk = AutoTokenizer.from_pretrained(self._hf_model_id)
model = AutoModelForSequenceClassification.from_pretrained(self._hf_model_id)
features = tk([["dummy sentence 1", "dummy sentence 2"]], return_tensors="pt")
mname = Path(self._hf_model_id).name
```
Let's follow snake casing: `model_name`?
```python
features = tk([["dummy sentence 1", "dummy sentence 2"]], return_tensors="pt")
mname = Path(self._hf_model_id).name

# bge models don't generate token type ids
```
Do we have any issue to reference here?
I arrived at this conclusion by trying to do it and failing, so there might be an issue out there somewhere, but it's more of a fundamental architectural feature than a bug.
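For reference, a model-agnostic variant of that workaround would key off the tokenizer output instead of the model name (a sketch, not what this PR does; the model id is illustrative):

```python
import torch
from transformers import AutoTokenizer

tk = AutoTokenizer.from_pretrained("BAAI/bge-reranker-base")  # illustrative model
features = tk([["dummy sentence 1", "dummy sentence 2"]], return_tensors="pt")

# Some tokenizers (e.g. the RoBERTa-family ones behind the bge rerankers) never
# emit token_type_ids, so fill them with zeros only when they are missing.
if "token_type_ids" not in features:
    features["token_type_ids"] = torch.zeros_like(features["input_ids"])
```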
Description
Adds capabilities for uploading cross encoders. Introduces a CrossEncoderModel class with methods to zip the traced model (either TorchScript or ONNX) and generate a JSON configuration (as in SentenceTransformerModel), along with a utility upload function.
Example
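A hypothetical usage sketch; the class name comes from this PR, but the import path, method names, and signatures below are assumptions and may not match the final implementation:

```python
# Hypothetical usage sketch; names and signatures are assumptions.
from opensearch_py_ml.ml_models import CrossEncoderModel  # assumed import path

model = CrossEncoderModel(
    "BAAI/bge-reranker-base",                 # illustrative HF model id
    folder_path="/tmp/models/bge-reranker-base",
)

# Trace and zip the model (torch_script or onnx), then generate the config json.
zip_path = model.zip_model(framework="torch_script")      # assumed method name
config_path = model.make_model_config_json()              # assumed method name

# Upload to an OpenSearch cluster via the utility function mentioned above.
# upload(client, zip_path, config_path)                   # signature assumed
```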
Issues Resolved
[List any issues this PR will resolve]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.