-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Add New Question Answering Model #349
base: main
Are you sure you want to change the base?
[Feature] Add New Question Answering Model #349
Conversation
Signed-off-by: faradawn <[email protected]>
|
||
__all__ = ["SentenceTransformerModel", "MCorr"] | ||
__all__ = ["SentenceTransformerModel", "SentenceTransformerModel", "MCorr"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to be adding QuestionAnsweringModel
?
import yaml | ||
from accelerate import Accelerator, notebook_launcher | ||
from mdutils.fileutils import MarkDownFile | ||
# from sentence_transformers import SentenceTransformer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove commented imports?
|
||
default_model_id = "distilbert-base-cased-distilled-squad" | ||
|
||
def clean_test_folder(TEST_FOLDER): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please take a look at this link - It talks about how to use temporary files, directories with pytest. Do you think, using these fixtures help us?
Also, a helpful stackoverflow post - https://stackoverflow.com/questions/51593595/pytest-auto-delete-temporary-directory-created-with-tmpdir-factory
# max_position_embeddings | ||
|
||
# AutoTokenizer will save tokenizer.json in save_json_folder_name | ||
# DistilBertTokenizer will save it in cache: /Users/faradawn/.cache/huggingface/hub/models/... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These seem to be your notes?
else self.onnx_zip_file_path | ||
) | ||
|
||
# model_zip_file_path = '/Users/faradawn/CS/opensearch-py-ml/opensearch_py_ml/ml_models/question-model-folder/distilbert-base-cased-distilled-squad.zip' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removable comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just reviewed code and hasn't actually tested its workings. Will provide another detailed feedback once i run these locally. Thanks!
Signed-off-by: faradawn <[email protected]>
Signed-off-by: faradawn <[email protected]>
Hi Kalyan, Thank you for the detailed feedback! I have removed the uncessary comments and fixed init.py. Regarding the use of fixture in pytest, I was following sentence_transformer's pytest structure, which used the raw "clean folder" function. I hoped to keep the test files similiar. But would love to learn about using fixture, if it can make a bigger improvement. Thanks, |
@dhrubo-os , can you please approve tests to run |
Signed-off-by: faradawn <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #349 +/- ##
==========================================
+ Coverage 91.53% 91.64% +0.10%
==========================================
Files 42 43 +1
Lines 4395 4547 +152
==========================================
+ Hits 4023 4167 +144
- Misses 372 380 +8 ☔ View full report in Codecov by Sentry. |
@faradawn I think overall this is a great start. Thanks for raising this PR
We are extending the program few more weeks. So please continue on this PR. Thanks, happy coding. |
Hi Dhrubo,
Thanks for checking the PR! Will work on it more by fixing the lint and unit tests!
Thanks,
Faradawn
…________________________________
From: Dhrubo Saha ***@***.***>
Sent: Tuesday, December 5, 2023 7:42:26 PM
To: opensearch-project/opensearch-py-ml ***@***.***>
Cc: Faradawn Yang ***@***.***>; Mention ***@***.***>
Subject: Re: [opensearch-project/opensearch-py-ml] [Feature] Add New Question Answering Model (PR #349)
@faradawn<https://urldefense.com/v3/__https://github.com/faradawn__;!!BpyFHLRN4TMTrA!4jzRdXfMXQCAEir1NmEQ_Im18dmrC9o1jFWQtr2hc8BFIG_s59PkaiTbo4Df31yauZY_ghI8VX_HupX1S67BTLRCdiPh$> I think overall this is a great start. Thanks for raising this PR
1. lint is failing.
2. codecode is already showing some missing coverage tests, please add more unit tests
3. Address PR comments.
We are extending the program few more weeks. So please continue on this PR. Thanks, happy coding.
—
Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https://github.com/opensearch-project/opensearch-py-ml/pull/349*issuecomment-1841938846__;Iw!!BpyFHLRN4TMTrA!4jzRdXfMXQCAEir1NmEQ_Im18dmrC9o1jFWQtr2hc8BFIG_s59PkaiTbo4Df31yauZY_ghI8VX_HupX1S67BTDe928QZ$>, or unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ARNNCKAGDOSVGEK3QSHA5KDYH7EQFAVCNFSM6AAAAABABQCRZCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNBRHEZTQOBUGY__;!!BpyFHLRN4TMTrA!4jzRdXfMXQCAEir1NmEQ_Im18dmrC9o1jFWQtr2hc8BFIG_s59PkaiTbo4Df31yauZY_ghI8VX_HupX1S67BTLDKbXYv$>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
Download the model directly from huggingface, convert model to torch script format, | ||
zip the model file and its tokenizer.json file to prepare to upload to the Open Search cluster | ||
|
||
:param sentences: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it's assigned default value ["today is sunny"] to sentences, so is it still required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to optional.
Required, for example sentences = ['today is sunny'] | ||
:type sentences: List of string [str] | ||
:param model_id: | ||
question answering model id to download model from question answerings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the model_id also optional? it seems to have a default model id assigned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to optional.
Required, for example sentences = ['today is sunny'] | ||
:type sentences: List of string [str] | ||
:param model_id: | ||
question answering model id to download model from question answerings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this going to download from question answerings
or download from huggingface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will download the model with 'model_id' from huggingface.
zip the model file and its tokenizer.json file to prepare to upload to the Open Search cluster | ||
|
||
:param model_id: | ||
question answering model id to download model from question answerings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this going to download from question answerings or download from huggingface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will download from huggingface. Thanks!
Download question answering model directly from huggingface, convert model to onnx format, | ||
zip the model file and its tokenizer.json file to prepare to upload to the Open Search cluster | ||
|
||
:param model_id: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the model_id also optional? it seems to have a default model id assigned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to optional.
…ch-py-ml into feature/question_answering_model
Signed-off-by: faradawn <[email protected]>
Hi @mingshl, Thank you for the careful review. I have made changes to the function descriptions, e.g. optional parameters, accordingly. Hi @dhrubo-os, I have checked CodeCov's result and added unit tests accordingly. The code is ready for a CodeCov test again. Thanks, |
@faradawn let's wrap up the PR? Can you please fix the conflicts? |
Signed-off-by: Faradawn Yang <[email protected]>
Got it, Dhrubo. I will fix the following lint issue.
|
let's add the changelog file. |
Signed-off-by: faradawn <[email protected]>
Hi @dhrubo-os, Thanks for checking! I have added a missing package in requirement-dev.txt, added to CHANGELOG file, and fixed formating issues. On my Mac, I only know Thanks. |
Signed-off-by: faradawn <[email protected]>
Hi @dhrubo-os, the failing integration test is fixed. There are 4 worklows awaiting approval. If there is anything I can do, please let me know. |
Description
Issues Resolved
Check List
Test cases
All tests passed.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.