Feature: method to run multiple prompts #33
Conversation
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main      #33      +/-   ##
==========================================
+ Coverage   97.17%   97.32%   +0.14%
==========================================
  Files           3        5       +2
  Lines         177      224      +47
==========================================
+ Hits          172      218      +46
- Misses          5        6       +1
```

☔ View full report in Codecov by Sentry.
Looks great! A few minor changes. Also it looks like it's missing unit tests.
src/autora/doc/pipelines/main.py (Outdated)

```diff
@@ -47,6 +48,44 @@ def evaluate_documentation(predictions: List[str], references: List[str]) -> Tup
     return (bleu, meteor)


+@app.command(help="Evaluate a model for code-to-documentation generation for all prompts in the prompts_file")
+def eval_on_prompts_file(
```
The `eval-on-prompts-file` name sounds a little verbose for the CLI. What do you think about something shorter, like `eval_prompts`?
Yes, that makes sense. I've implemented the change.
Looking good, just one final adjustment. Thx!
```diff
@@ -49,7 +49,7 @@ def evaluate_documentation(predictions: List[str], references: List[str]) -> Tup


 @app.command(help="Evaluate a model for code-to-documentation generation for all prompts in the prompts_file")
-def eval_on_prompts_file(
+def eval_prompts(
```
I was about to ask you to add a doc-comment for this function, in particular because it's hard to tell what the `List[Dict[str, str]]` will contain. But I think a better option is to create a type (a dataclass?) for the return type, e.g. an `EvalResult` class.
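For illustration, a minimal sketch of what such a dataclass could look like, assuming its fields mirror the `(predictions, bleu, meteor)` tuple plus the prompt that produced it; the field names here are assumptions, not the actual implementation:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvalResult:
    """Illustrative container for the evaluation output of a single prompt."""

    predictions: List[str]  # generated documentation strings
    prompt: str             # the prompt that produced them
    bleu: float             # BLEU score against the references
    meteor: float           # METEOR score against the references
```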
src/autora/doc/util.py (Outdated)

```python
def get_eval_result_from_prediction(
    prediction: Tuple[List[str], float, float], prompt: str
) -> Dict[str, Any]:
    eval_result = {
```
See the comment above; it would be good to make this strongly typed.
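As a sketch only (not the code in this PR), the helper could then return the typed container directly instead of a `Dict[str, Any]`, assuming the `EvalResult` dataclass sketched above is in scope:

```python
from typing import List, Tuple


def get_eval_result_from_prediction(
    prediction: Tuple[List[str], float, float], prompt: str
) -> EvalResult:
    # Unpack the (predictions, bleu, meteor) tuple into the typed container.
    predictions, bleu, meteor = prediction
    return EvalResult(predictions=predictions, prompt=prompt, bleu=bleu, meteor=meteor)
```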
Change Description
exp: create a method (similar to `eval()`) to run and compare multiple prompts on a single data file input. #31
Solution Description
Created a command-line Typer command named `eval_on_prompts_file()` (renamed to `eval_prompts()` during review) that evaluates a model for code-to-documentation generation for every prompt in the prompts file against a single data file.

Command-line syntax example:

```sh
autodoc eval-on-prompts-file data/autora/data.jsonl data/autora/prompts/all_prompt.json
```
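For readers unfamiliar with Typer, here is a hedged sketch of how such a command could be wired up. The prompts-file format (a JSON list of prompt strings) and the `score_prompt` helper are assumptions for illustration, not the actual pipeline code:

```python
import json
from typing import List

import typer

app = typer.Typer()


def score_prompt(prompt: str, data_file: str) -> float:
    """Placeholder scorer; the real pipeline generates docs and computes BLEU/METEOR."""
    return 0.0


@app.command(help="Evaluate a model for code-to-documentation generation for all prompts in the prompts_file")
def eval_prompts(data_file: str, prompts_file: str) -> None:
    # Assumed format: the prompts file is a JSON list of prompt strings.
    with open(prompts_file, "r", encoding="utf-8") as f:
        prompts: List[str] = json.load(f)
    for prompt in prompts:
        score = score_prompt(prompt, data_file)
        typer.echo(f"{prompt[:40]!r}: {score:.3f}")


if __name__ == "__main__":
    app()
```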
TODO: add tests
Code Quality
Project-Specific Pull Request Checklists
Bug Fix Checklist
New Feature Checklist
Documentation Change Checklist
Build/CI Change Checklist
Other Change Checklist