Feature: method to run multiple prompts #33
Conversation
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main      #33      +/-   ##
==========================================
+ Coverage   97.17%   97.32%   +0.14%
==========================================
  Files           3        5       +2
  Lines         177      224      +47
==========================================
+ Hits          172      218      +46
- Misses          5        6       +1
```

☔ View full report in Codecov by Sentry.
Looks great! A few minor changes. Also it looks like it's missing unit tests.
src/autora/doc/pipelines/main.py (Outdated)

```diff
@@ -47,6 +48,44 @@ def evaluate_documentation(predictions: List[str], references: List[str]) -> Tup
     return (bleu, meteor)


+@app.command(help="Evaluate a model for code-to-documentation generation for all prompts in the prompts_file")
+def eval_on_prompts_file(
```
The `eval-on-prompts-file` name sounds a little verbose for the CLI. What do you think about something shorter, like `eval_prompts`?
Yes, that makes sense. I've implemented the change.
Looking good, just one final adjustment. Thx!
```diff
@@ -49,7 +49,7 @@ def evaluate_documentation(predictions: List[str], references: List[str]) -> Tup


 @app.command(help="Evaluate a model for code-to-documentation generation for all prompts in the prompts_file")
-def eval_on_prompts_file(
+def eval_prompts(
```
I was about to ask you to add a doc-comment for this function, in particular because it's hard to tell what the `List[Dict[str, str]]` will contain. But I think a better option is to create a type (a dataclass?) for the return type, e.g. an `EvalResult` class.
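For illustration, a minimal sketch of what such a dataclass could look like, assuming its fields mirror the `(predictions, bleu, meteor)` tuple plus the prompt that produced it; the field names here are assumptions, not the actual implementation:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvalResult:
    """Illustrative container for the evaluation output of a single prompt."""

    predictions: List[str]  # generated documentation strings
    prompt: str             # the prompt that produced them
    bleu: float             # BLEU score against the references
    meteor: float           # METEOR score against the references
```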
src/autora/doc/util.py (Outdated)

```python
def get_eval_result_from_prediction(
    prediction: Tuple[List[str], float, float], prompt: str
) -> Dict[str, Any]:
    eval_result = {
```
See the comment above; it would be good to make this strongly typed.
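As a sketch only (not the code in this PR), the helper could then return the typed container directly instead of a `Dict[str, Any]`, assuming the `EvalResult` dataclass sketched above is in scope:

```python
from typing import List, Tuple


def get_eval_result_from_prediction(
    prediction: Tuple[List[str], float, float], prompt: str
) -> EvalResult:
    # Unpack the (predictions, bleu, meteor) tuple into the typed container.
    predictions, bleu, meteor = prediction
    return EvalResult(predictions=predictions, prompt=prompt, bleu=bleu, meteor=meteor)
```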
Change Description
exp: create a method (similar to `eval()`) to run and compare multiple prompts on a single data file input. #31
Solution Description
Created a command-line Typer command named `eval_on_prompts_file()` (renamed to `eval_prompts()` during review) that evaluates a model for code-to-documentation generation for every prompt in the prompts file against a single data file.

Command-line syntax example:

```sh
autodoc eval-on-prompts-file data/autora/data.jsonl data/autora/prompts/all_prompt.json
```
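For readers unfamiliar with Typer, here is a hedged sketch of how such a command could be wired up. The prompts-file format (a JSON list of prompt strings) and the `score_prompt` helper are assumptions for illustration, not the actual pipeline code:

```python
import json
from typing import List

import typer

app = typer.Typer()


def score_prompt(prompt: str, data_file: str) -> float:
    """Placeholder scorer; the real pipeline generates docs and computes BLEU/METEOR."""
    return 0.0


@app.command(help="Evaluate a model for code-to-documentation generation for all prompts in the prompts_file")
def eval_prompts(data_file: str, prompts_file: str) -> None:
    # Assumed format: the prompts file is a JSON list of prompt strings.
    with open(prompts_file, "r", encoding="utf-8") as f:
        prompts: List[str] = json.load(f)
    for prompt in prompts:
        score = score_prompt(prompt, data_file)
        typer.echo(f"{prompt[:40]!r}: {score:.3f}")


if __name__ == "__main__":
    app()
```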
TODO: add tests
Code Quality
Project-Specific Pull Request Checklists
Bug Fix Checklist
New Feature Checklist
Documentation Change Checklist
Build/CI Change Checklist
Other Change Checklist