[Feature] Add New Question Answering Model #349

faradawn · 2023-11-30T16:49:44Z

Description

Added question_answering_model.py to trace the model into TorchScript or ONNX format.
Added a test file that compares the traced model's output with the original model's output.

Issues Resolved

Issue: [FEATURE] Trace Question Answering models to TorchScript and Onnx format #304
Old pull request: [Feature] Add Question Answering Model (old) #329

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Commits are signed per the DCO using --signoff

Test cases

All tests passed.

test_cases = [
    {
        "question": "Who was Jim Henson?",
        "context": "Jim Henson was a nice puppet"
    },
    {
        "question": "Where do I live?",
        "context": "My name is Sarah and I live in London"
    },
    {
        "question": "What's my name?",
        "context": "My name is Clara and I live in Berkeley."
    },
    {
        "question": "Which name is also used to describe the Amazon rainforest in English?",
        "context": "The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain 'Amazonas' in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species."
    }
]
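
As a rough illustration of how test cases like these might drive the comparison, the sketch below checks that two models produce matching outputs within a tolerance. Everything in it is an assumption made for illustration: `outputs_match` and `compare_on_cases` are hypothetical helpers, the "models" are plain callables that return a pair of start and end logit lists, and the tolerance is arbitrary. The PR's actual test file may compare outputs differently.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: a "model" maps (question, context) to (start_logits, end_logits).
Logits = Tuple[Sequence[float], Sequence[float]]
Model = Callable[[str, str], Logits]


def outputs_match(a: Logits, b: Logits, tol: float = 1e-3) -> bool:
    """Return True when both logit pairs have equal lengths and close values."""
    for xs, ys in zip(a, b):
        if len(xs) != len(ys):
            return False
        if not all(math.isclose(x, y, abs_tol=tol) for x, y in zip(xs, ys)):
            return False
    return True


def compare_on_cases(original: Model, traced: Model,
                     cases: List[Dict[str, str]], tol: float = 1e-3) -> List[int]:
    """Return the indices of the cases where the two models disagree."""
    return [
        i for i, case in enumerate(cases)
        if not outputs_match(
            original(case["question"], case["context"]),
            traced(case["question"], case["context"]),
            tol,
        )
    ]


# Stand-in models for demonstration only; a real test would call the
# original model and the traced model here.
def fake_original(question: str, context: str) -> Logits:
    n = len(context.split())
    return [float(i) for i in range(n)], [float(n - i) for i in range(n)]


def fake_traced(question: str, context: str) -> Logits:
    start, end = fake_original(question, context)
    # Simulate the small numerical drift a converted model may show.
    return [x + 1e-5 for x in start], [x - 1e-5 for x in end]


cases = [
    {"question": "Where do I live?",
     "context": "My name is Sarah and I live in London"},
]
mismatches = compare_on_cases(fake_original, fake_traced, cases)
```

Returning the indices of disagreeing cases, rather than a single boolean, makes it easier to see which question and context pair caused a failure.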

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: faradawn <[email protected]>

rawwar · 2023-11-30T17:30:12Z

opensearch_py_ml/ml_models/__init__.py


-__all__ = ["SentenceTransformerModel", "MCorr"]
+__all__ = ["SentenceTransformerModel", "SentenceTransformerModel", "MCorr"]


Is this supposed to be adding QuestionAnsweringModel?
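
The diff above lists "SentenceTransformerModel" twice. A small check like the following could catch repeated entries in an export list. `find_duplicates` is a hypothetical helper, and the "assumed intent" list is only a guess based on the question above, not the PR's final code.

```python
from collections import Counter
from typing import List


def find_duplicates(names: List[str]) -> List[str]:
    """Return the names that appear more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name in counts if counts[name] > 1]


# The list as it appears in the diff under review.
as_submitted = ["SentenceTransformerModel", "SentenceTransformerModel", "MCorr"]

# Assumed intent (a guess): export the new class instead of repeating a name.
assumed_intent = ["SentenceTransformerModel", "QuestionAnsweringModel", "MCorr"]

duplicates_before = find_duplicates(as_submitted)
duplicates_after = find_duplicates(assumed_intent)
```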

rawwar · 2023-11-30T17:30:50Z

opensearch_py_ml/ml_models/question_answering_model.py

+import yaml
+from accelerate import Accelerator, notebook_launcher
+from mdutils.fileutils import MarkDownFile
+# from sentence_transformers import SentenceTransformer


Can we remove commented imports?

rawwar · 2023-11-30T17:34:53Z

tests/ml_models/test_question_answering_pytest.py

+
+default_model_id = "distilbert-base-cased-distilled-squad"
+
+def clean_test_folder(TEST_FOLDER):


Can you please take a look at this link? It explains how to use temporary files and directories with pytest. Do you think using these fixtures would help us?

Also, a helpful Stack Overflow post: https://stackoverflow.com/questions/51593595/pytest-auto-delete-temporary-directory-created-with-tmpdir-factory
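
For reference, here is a minimal sketch of what a fixture-based test might look like. pytest's built-in `tmp_path` fixture hands each test a fresh `pathlib.Path` directory, so the test itself needs no clean-folder helper. The function `write_fake_model_zip` and the file names are hypothetical stand-ins, not code from this PR.

```python
import zipfile
from pathlib import Path


def write_fake_model_zip(folder: Path, name: str = "model.zip") -> Path:
    """Hypothetical stand-in for a model export: writes a small zip file."""
    zip_path = folder / name
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("tokenizer.json", "{}")
    return zip_path


def test_zip_is_created(tmp_path):
    # tmp_path is injected by pytest as a unique temporary directory per test.
    zip_path = write_fake_model_zip(tmp_path)
    assert zip_path.exists()
    with zipfile.ZipFile(zip_path) as archive:
        assert archive.namelist() == ["tokenizer.json"]
```

Because the export function takes the target folder as an argument, the same code works with `tmp_path` in tests and with a real output folder elsewhere.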

rawwar · 2023-11-30T17:37:31Z

opensearch_py_ml/ml_models/question_answering_model.py

+        # max_position_embeddings
+
+        # AutoTokenizer will save tokenizer.json in save_json_folder_name
+        # DistilBertTokenizer will save it in cache: /Users/faradawn/.cache/huggingface/hub/models/...


These seem to be your notes?

rawwar · 2023-11-30T17:38:18Z

opensearch_py_ml/ml_models/question_answering_model.py

+                else self.onnx_zip_file_path
+            )
+
+            # model_zip_file_path = '/Users/faradawn/CS/opensearch-py-ml/opensearch_py_ml/ml_models/question-model-folder/distilbert-base-cased-distilled-squad.zip'


Removable comment?

rawwar

I have only reviewed the code and haven't actually tested how it works. I will provide more detailed feedback once I run it locally. Thanks!

Signed-off-by: faradawn <[email protected]>

faradawn · 2023-11-30T19:13:15Z

Hi Kalyan,

Thank you for the detailed feedback! I have removed the unnecessary comments and fixed __init__.py.

Regarding the use of fixtures in pytest, I was following sentence_transformer's pytest structure, which uses a plain "clean folder" function. I hoped to keep the test files similar, but I would love to learn about using fixtures if they would make a significant improvement.

Thanks,
Faradawn

rawwar · 2023-11-30T20:07:56Z

@dhrubo-os, can you please approve the tests to run?

Signed-off-by: faradawn <[email protected]>

codecov · 2023-12-06T01:31:58Z

Codecov Report

Attention: Patch coverage is 94.77124% with 8 lines in your changes are missing coverage. Please review.

Project coverage is 91.64%. Comparing base (529ee34) to head (2e0b7c5).

Files	Patch %	Lines
...search_py_ml/ml_models/question_answering_model.py	94.70%	8 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #349      +/-   ##
==========================================
+ Coverage   91.53%   91.64%   +0.10%     
==========================================
  Files          42       43       +1     
  Lines        4395     4547     +152     
==========================================
+ Hits         4023     4167     +144     
- Misses        372      380       +8
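
The percentages in this report can be reproduced from the hit and line counts. The short sketch below does that arithmetic; `coverage_pct` is a hypothetical helper, and the only inputs are the numbers shown in the report. Note that the headline patch figure (94.77124%) corresponds to 145 of 153 lines, while the diff table shows a net change of +152 lines and +144 hits; the sketch computes both and does not try to explain the difference.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage: covered lines divided by total lines."""
    return 100.0 * hits / lines


# Figures copied from the diff table above.
base = coverage_pct(4023, 4395)   # main
head = coverage_pct(4167, 4547)   # this PR
delta = head - base

# Net change in the table: +152 lines, +144 hits, +8 misses.
net_patch = coverage_pct(144, 152)

# The headline patch figure matches 145 covered lines out of 153.
reported_patch = coverage_pct(145, 153)

print(f"base={base:.4f}% head={head:.4f}% delta={delta:+.4f}%")
print(f"net_patch={net_patch:.4f}% reported_patch={reported_patch:.5f}%")
```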


dhrubo-os · 2023-12-06T01:42:15Z

@faradawn I think overall this is a great start. Thanks for raising this PR.

1. Lint is failing.
2. Codecov is already showing some missing test coverage; please add more unit tests.
3. Address the PR comments.

We are extending the program for a few more weeks, so please continue on this PR. Thanks, happy coding.

faradawn · 2023-12-06T01:54:35Z

Hi Dhrubo, thanks for checking the PR! I will keep working on it by fixing the lint errors and adding unit tests. Thanks, Faradawn


mingshl · 2023-12-20T21:53:32Z

opensearch_py_ml/ml_models/question_answering_model.py

+        Download the model directly from huggingface, convert model to torch script format,
+        zip the model file and its tokenizer.json file to prepare to upload to the Open Search cluster
+
+        :param sentences:


It looks like sentences is assigned a default value of ["today is sunny"], so is it still required?

Changed to optional.

mingshl · 2023-12-20T21:54:35Z

opensearch_py_ml/ml_models/question_answering_model.py

+            Required, for example  sentences = ['today is sunny']
+        :type sentences: List of string [str]
+        :param model_id:
+            question answering model id to download model from question answerings.


Is model_id also optional? It seems to have a default model id assigned.

Changed to optional.
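
To illustrate the point raised in this thread, the sketch below documents parameters that have defaults as optional, in the same docstring style as the snippet under review. `describe_export` is a hypothetical function that only echoes its arguments; it is not the PR's method. The `None` sentinel for `sentences` is a choice made for this sketch (it avoids sharing one mutable default list between calls) and is not necessarily what the PR does. The default model id is the one shown in the test file earlier in this conversation.

```python
from typing import List, Optional

DEFAULT_MODEL_ID = "distilbert-base-cased-distilled-squad"


def describe_export(
    sentences: Optional[List[str]] = None,
    model_id: str = DEFAULT_MODEL_ID,
) -> dict:
    """
    Hypothetical helper that only echoes the arguments an export would use.

    :param sentences:
        Optional, example input used for tracing.
        Defaults to ['today is sunny'].
    :type sentences: List of string [str]
    :param model_id:
        Optional, Hugging Face model id to download.
        Defaults to 'distilbert-base-cased-distilled-squad'.
    :type model_id: string
    :return: the resolved arguments
    :rtype: dict
    """
    if sentences is None:
        # Build the default inside the function so calls never share one list.
        sentences = ["today is sunny"]
    return {"sentences": sentences, "model_id": model_id}


resolved = describe_export()
```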

mingshl · 2023-12-20T21:58:46Z

opensearch_py_ml/ml_models/question_answering_model.py

+            Required, for example  sentences = ['today is sunny']
+        :type sentences: List of string [str]
+        :param model_id:
+            question answering model id to download model from question answerings.


Is this going to download from "question answerings" or from Hugging Face?

It will download the model with the given 'model_id' from Hugging Face.

mingshl · 2023-12-20T21:59:16Z

opensearch_py_ml/ml_models/question_answering_model.py

+        zip the model file and its tokenizer.json file to prepare to upload to the Open Search cluster
+
+        :param model_id:
+            question answering model id to download model from question answerings.


Is this going to download from "question answerings" or from Hugging Face?

It will download from Hugging Face. Thanks!

mingshl · 2023-12-20T21:59:48Z

opensearch_py_ml/ml_models/question_answering_model.py

+        Download question answering model directly from huggingface, convert model to onnx format,
+        zip the model file and its tokenizer.json file to prepare to upload to the Open Search cluster
+
+        :param model_id:


Is model_id also optional? It seems to have a default model id assigned.

Changed to optional.

…ch-py-ml into feature/question_answering_model

Signed-off-by: faradawn <[email protected]>

faradawn · 2023-12-22T00:24:17Z

Hi @mingshl,

Thank you for the careful review. I have made changes to the function descriptions, e.g. optional parameters, accordingly.

Hi @dhrubo-os,

I have checked CodeCov's result and added unit tests accordingly. The code is ready for a CodeCov test again.

Thanks,
Faradawn

dhrubo-os · 2024-03-20T17:00:48Z

@faradawn let's wrap up the PR? Can you please fix the conflicts?

Signed-off-by: Faradawn Yang <[email protected]>

faradawn · 2024-03-21T05:37:29Z

Got it, Dhrubo. I will fix the following lint issue.

nox > black --check --target-version=py38 setup.py noxfile.py opensearch_py_ml/ utils/ tests/
would reformat /home/runner/work/opensearch-py-ml/opensearch-py-ml/opensearch_py_ml/ml_models/question_answering_model.py
would reformat /home/runner/work/opensearch-py-ml/opensearch-py-ml/tests/ml_models/test_question_answering_pytest.py

dhrubo-os · 2024-03-21T06:33:47Z

Let's update the changelog file as well.

Signed-off-by: faradawn <[email protected]>

faradawn · 2024-03-21T15:34:06Z

Hi @dhrubo-os,

Thanks for checking! I have added a missing package to requirements-dev.txt, added an entry to the CHANGELOG file, and fixed the formatting issues.

On my Mac, I only know how to run pytest and nox -rs test. If there is a more comprehensive test I can run, please let me know!

Thanks.

Signed-off-by: faradawn <[email protected]>

faradawn · 2024-03-22T03:30:48Z

Hi @dhrubo-os, the failing integration test is fixed. There are 4 workflows awaiting approval. If there is anything I can do, please let me know.

add model and pytest file, passed all tests

076221c

Signed-off-by: faradawn <[email protected]>

faradawn requested review from dhrubo-os, greaa-aws, ylwu-amzn, b4sjoo, jngz-es and rbhavna as code owners November 30, 2023 16:49

faradawn mentioned this pull request Nov 30, 2023

[Feature] Add Question Answering Model (old) #329

Closed

5 tasks

rawwar reviewed Nov 30, 2023

rawwar suggested changes Nov 30, 2023

faradawn added 2 commits November 30, 2023 13:08

remove comments and fix init.py

485e361

Signed-off-by: faradawn <[email protected]>

remove model zip path comment

fda3f00

Signed-off-by: faradawn <[email protected]>

add onnxruntime in requirements-dev.txt

907fcdb

Signed-off-by: faradawn <[email protected]>

mingshl reviewed Dec 20, 2023

faradawn added 2 commits December 21, 2023 11:23

Merge branch 'main' of https://github.com/opensearch-project/opensear…

5021136

…ch-py-ml into feature/question_answering_model

add unit tests and fixed optional parameters

eef1ac4

Signed-off-by: faradawn <[email protected]>

Merge branch 'main' into feature/question_answering_model

f2ae84f

Signed-off-by: Faradawn Yang <[email protected]>

add to requirements-dev, add to CHANGELOG, fix format with nox

fc625bc

Signed-off-by: faradawn <[email protected]>

fix sentencetransfoer test of long description

2e0b7c5

Signed-off-by: faradawn <[email protected]>
