describegpt: add --prompt-file option (resolves #1085) #1120

Merged
4 commits merged into master from describegpt/prompt-file on Jul 11, 2023

Collaborator

@rzmk rzmk commented Jul 10, 2023

🗺 Overview

  • Adds a --prompt-file option to describegpt and a doc file at docs/Describegpt.md.
  • In src/cmd/describegpt.rs, refactors get_dictionary_prompt, get_description_prompt, and get_tags_prompt into the functions get_prompt_file and get_prompt.

Resolves #1085.

Comment on lines 274 to 281
// Get max_tokens from prompt file if --prompt-file is used
let max_tokens = match args.flag_prompt_file.clone() {
    Some(prompt_file) => {
        let prompt_file = get_prompt_file(args)?;
        prompt_file.tokens
    }
    None => args.flag_max_tokens,
};
Collaborator Author

@jqnatividad Since --max-tokens has a default of 50, how can we check if the option itself is being used?

I'm trying to make it so that if --max-tokens is explicitly set then it overrides the prompt file tokens value like you mentioned in the issue.

Collaborator

@jqnatividad jqnatividad Jul 10, 2023

@rzmk, it's a bit hacky, but this is what I'd do:

pub fn run(argv: &[&str]) -> CliResult<()> {
  let args: Args = util::get_args(USAGE, argv)?;
  // simulate invoking describegpt with just a stdin input, and no options set, so we get the defaults
  let argv_defaults = &["describegpt", "-"];
  let args_defaults: Args = util::get_args(USAGE, argv_defaults)?;

So args_defaults.flag_max_tokens will contain the default value specified in the USAGE text, which you can compare against later in the code to see whether it was changed. This approach insulates us from future changes to the USAGE text as defaults change or additional args/options are added.
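The comparison described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the `Args` struct here is a hypothetical stand-in carrying only the `flag_max_tokens` field, and `max_tokens_changed` is an assumed helper name.

```rust
/// Hypothetical stand-in for the parsed `Args` struct; only the field
/// discussed in this thread is modelled.
#[derive(Debug, Clone, PartialEq)]
struct Args {
    flag_max_tokens: u32,
}

/// Returns true when the user's parsed value differs from the value obtained
/// by parsing a bare invocation (the `args_defaults` idea).
fn max_tokens_changed(args: &Args, args_defaults: &Args) -> bool {
    args.flag_max_tokens != args_defaults.flag_max_tokens
}
```

A check like this only detects a value that differs from the default. Passing `--max-tokens 50` explicitly produces the same value as the default of 50, so the two cases look identical to it.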

Collaborator Author

@jqnatividad Getting an error this way:

$ ./target/debug/qsv describegpt NYC_311_SR_2010-2020-sample-1M.csv --tags --max-tokens 100
Invalid arguments.

Usage:
    qsv describegpt [options] [<input>]
    qsv describegpt --help

Two other approaches that may work:

  • Looping through argv for --max-tokens and getting the subsequent value.
  • Making --max-tokens optional so we can use is_some().

Neither approach creates args_defaults, though, which could be useful in the future.
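The two alternatives listed above might look roughly like this. Both functions are hypothetical sketches (the names `value_after_flag` and `resolve_max_tokens` do not come from the PR), and the second assumes the option could be parsed into an `Option<u32>` rather than a plain integer with a default.

```rust
/// Approach 1: scan argv for `flag` and return the element that follows it.
/// Returns None when the flag is absent or is the last element.
fn value_after_flag<'a>(argv: &[&'a str], flag: &str) -> Option<&'a str> {
    argv.iter()
        .position(|arg| *arg == flag)
        .and_then(|i| argv.get(i + 1))
        .copied()
}

/// Approach 2: if the option is modelled as `Option<u32>`, `is_some()` tells
/// us whether it was given, and the default is applied only afterwards.
/// Returns the value to use and whether it was set explicitly.
fn resolve_max_tokens(flag_max_tokens: Option<u32>, default: u32) -> (u32, bool) {
    (flag_max_tokens.unwrap_or(default), flag_max_tokens.is_some())
}
```

For example, `value_after_flag(&["qsv", "describegpt", "data.csv", "--max-tokens", "100"], "--max-tokens")` returns `Some("100")`, which would still need to be parsed into a number.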

Collaborator

My mistake, @rzmk: the first element of argv is the name of the binary:

pub fn run(argv: &[&str]) -> CliResult<()> {
  let args: Args = util::get_args(USAGE, argv)?;
  // simulate invoking describegpt with just a stdin input, and no options set, so we get the defaults
  let argv_defaults = &[argv[0], "describegpt", "-"];
  let args_defaults: Args = util::get_args(USAGE, argv_defaults)?;

Collaborator Author

Still getting the same error @jqnatividad.

Collaborator

Hmm...

It's working for me:

$ cargo t describegpt -F lite

...

$ target/debug/qsvlite describegpt C:\Users\joeln\Downloads\NYC_311_SR_2010-2020-sample-1M.csv --tags --max-tokens 100
Generating stats from \\?\C:\Users\joeln\Downloads\NYC_311_SR_2010-2020-sample-1M.csv using qsv stats --everything...
Generating frequency from \\?\C:\Users\joeln\Downloads\NYC_311_SR_2010-2020-sample-1M.csv using qsv frequency...
Interacting with OpenAI API...

Generating tags from OpenAI API...
Received tags completion.
uniquekey, createddate, closeddate, agency, agencyname, complainttype, descriptor, locationtype, incidentzip, incidentaddress, streetname, crossstreet1, crossstreet2, intersectionstreet1, intersectionstreet2, addressType, city, landmark, facilitytype, status, duedate, resolutiondescription, resolutionactionupdateddate, communityboard, bbl, borough, xcoordinatestateplane, ycoordinatestateplane, opendatachanneltype, parkfacilityname, park

Collaborator Author

@rzmk rzmk Jul 10, 2023

Somehow it works now for me?! 🎉

Collaborator Author

@rzmk rzmk Jul 10, 2023

@jqnatividad What if a user explicitly specifies --max-tokens 50? How can I differentiate that from the default? This is mainly what I'm considering: a user may have 7000 for tokens in their prompt file but also set --max-tokens 50. How can I tell whether --max-tokens was explicitly set, so that I can prioritize it over the prompt file value even when args.flag_max_tokens == args_defaults.flag_max_tokens?

Edit: I'm trying a closure right now.

@rzmk rzmk self-assigned this Jul 10, 2023
Also adds the arg_is_some closure to check if user specified an arg.
@@ -415,6 +469,8 @@ fn run_inference_options(

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;
    // Closure to check if the user gives an argument
    let arg_is_some = |arg: &str| -> bool { argv.contains(&arg) };
Collaborator Author

Here's a closure that can resolve the prioritization issue.
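As a rough sketch of how such a closure could drive the prioritization, the function below picks the prompt file's tokens value unless `--max-tokens` literally appears in argv. The function name `effective_max_tokens` and its parameters are assumptions for illustration, not the PR's actual code; only the closure body mirrors the diff.

```rust
/// Hypothetical helper: choose which max_tokens value to use.
/// `flag_max_tokens` is the parsed CLI value (explicit or default);
/// `prompt_file_tokens` is Some(..) only when a prompt file was supplied.
fn effective_max_tokens(
    argv: &[&str],
    flag_max_tokens: u32,
    prompt_file_tokens: Option<u32>,
) -> u32 {
    // Same check as the closure in the diff: true only if an argv element
    // is exactly equal to `arg`.
    let arg_is_some = |arg: &str| -> bool { argv.contains(&arg) };

    match prompt_file_tokens {
        // Prompt file value wins only when --max-tokens was not typed.
        Some(tokens) if !arg_is_some("--max-tokens") => tokens,
        // Otherwise use the CLI value, whether explicit or defaulted.
        _ => flag_max_tokens,
    }
}
```

With this shape, `--max-tokens 50` on the command line takes priority over a prompt file value of 7000 even though 50 equals the default. One caveat of an exact-match check: if the argument parser also accepts a joined form such as `--max-tokens=100`, that arrives as a single argv element and `contains` would not detect it.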

@rzmk rzmk marked this pull request as ready for review July 10, 2023 23:12
@rzmk rzmk requested a review from jqnatividad July 10, 2023 23:13
@@ -0,0 +1,87 @@
# `describegpt` command
Collaborator

@rzmk ❤️ how thorough describegpt's documentation is.

Collaborator

@jqnatividad jqnatividad left a comment

LGTM! Let's collab on it once it's merged, @rzmk.

@jqnatividad jqnatividad merged commit 16ae7c7 into master Jul 11, 2023
@rzmk rzmk deleted the describegpt/prompt-file branch July 14, 2023 15:46