LLMs are statistical alien interns. They can pattern-match and reach conclusions, but they don't tell you how confident they are in their answers. Maybe today they're wrong 2% of the time. A month from now, they learn new things and start being wrong 5% of the time. Meanwhile, the interns from planet Anthropic have gotten better at sentiment analysis.
Concepts like loss calculation and model drift already exist in traditional machine learning settings. The Waffie CLI makes those concepts accessible to software developers who build on LLM APIs, so they can create the best out-of-this-world solutions possible.
Inspired by TypeChat, we use prompt engineering to direct the LLM to respond in a machine-readable format.
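As a rough illustration of what "machine-readable" buys you, the sketch below parses a raw model reply and checks it against the `Response` shape from the example Waffiefile prompt. The helper `parseResponse` and the type name `SentimentResponse` are hypothetical names for this sketch, not part of waffie's API.

```typescript
// Mirrors the `Response` type from the example Waffiefile prompt. It is renamed
// here only to avoid clashing with the built-in fetch `Response` type.
type Sentiment = "positive" | "negative" | "neutral";

interface SentimentResponse {
  result: Sentiment;
}

const SENTIMENTS: Sentiment[] = ["positive", "negative", "neutral"];

// Hypothetical helper: parse a raw model reply and check it against the type.
// Returns null instead of throwing so a caller can count the row as failed.
function parseResponse(raw: string): SentimentResponse | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // the model replied with something that is not JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const result = (data as { result?: unknown }).result;
  for (const s of SENTIMENTS) {
    if (result === s) return { result: s };
  }
  return null; // valid JSON, but not the shape the prompt asked for
}

console.log(parseResponse('{"result": "positive"}')); // { result: 'positive' }
console.log(parseResponse("Sure! The sentiment is positive.")); // null
```

Because the reply is either a typed value or `null`, a chatty or malformed answer becomes an ordinary test failure rather than a crash.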
Configuration is read from a Waffiefile, where you can specify test files, API providers, and model versions.
Results are returned so you can compare them across models and over time.
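To make the comparison concrete, here is a minimal sketch that turns run summaries into a pass rate and a drift figure. The `count` and `passed` fields match the summary waffie prints. The `model` and `date` labels, the function names, and the numbers are all made up for illustration.

```typescript
// `count` and `passed` match the summary waffie prints at the end of a run.
// `model` and `date` are hypothetical labels added for this sketch.
interface RunSummary {
  model: string;
  date: string;
  count: number;
  passed: number;
}

// Fraction of rows whose output matched the expected value.
function passRate(run: RunSummary): number {
  return run.count === 0 ? 0 : run.passed / run.count;
}

// Change in pass rate between two runs; a negative value means the later run did worse.
function drift(earlier: RunSummary, later: RunSummary): number {
  return passRate(later) - passRate(earlier);
}

// Illustrative numbers only, not real results.
const firstRun: RunSummary = { model: "model-a", date: "run-1", count: 20, passed: 20 };
const secondRun: RunSummary = { model: "model-a", date: "run-2", count: 20, passed: 15 };

console.log(passRate(firstRun)); // 1
console.log(drift(firstRun, secondRun)); // -0.25
```

The same two functions work for comparing providers: pass two summaries from the same test file and different models instead of different dates.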
$ npm install -g waffie
$ waffie COMMAND
running command...
$ waffie (--version)
waffie/0.1.0 darwin-x64 node-v18.9.0
$ waffie --help [COMMAND]
USAGE
$ waffie COMMAND
...
You'll need to set the OPENAI_API_KEY environment variable to your OpenAI API key.
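If you wrap waffie in your own script, you may want to fail fast when the key is missing. The sketch below is one way to do that. `requireEnv` is a hypothetical helper, and how waffie itself reports a missing key is not covered here.

```typescript
// Hypothetical helper: read a required variable from an environment-like object.
function requireEnv(env: { [name: string]: string | undefined }, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(name + " is not set; export it before running waffie");
  }
  return value;
}

// In a Node.js script you would pass process.env. A literal is used here so
// the sketch runs anywhere; the key below is a placeholder, not a real one.
const apiKey = requireEnv({ OPENAI_API_KEY: "sk-placeholder" }, "OPENAI_API_KEY");
console.log(apiKey.length > 0); // true
```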
If you cloned this repo, you can run one of our examples like this:
$ waffie file examples/sentiment-analysis/Waffiefile
{ processedRow: 'positive', expected: 'positive' }
{ processedRow: 'neutral', expected: 'neutral' }
...
{ processedRow: 'negative', expected: 'negative' }
{
file: '/Users/rogerlam/waffie/examples/sentiment-analysis/test/feedback.csv',
count: 22,
passed: 22,
allPassed: true
}
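The summary at the end of that output can be derived from the per-row lines. The sketch below shows one plausible way to do it. The field names come from the output above, while the function `summarize` and the exact matching rule are assumptions.

```typescript
// One line of per-row output, as printed in the example above.
interface RowResult {
  processedRow: string; // label returned by the model
  expected: string; // label from the test CSV
}

// Summary object, using the same fields as the example output.
interface FileSummary {
  file: string;
  count: number;
  passed: number;
  allPassed: boolean;
}

// Assumption: a row passes when the two labels are equal after trimming and
// lowercasing. waffie's real comparison may be stricter or looser.
function summarize(file: string, rows: RowResult[]): FileSummary {
  let passed = 0;
  for (const row of rows) {
    if (row.processedRow.trim().toLowerCase() === row.expected.trim().toLowerCase()) {
      passed += 1;
    }
  }
  return { file: file, count: rows.length, passed: passed, allPassed: passed === rows.length };
}

const sampleRows: RowResult[] = [
  { processedRow: "positive", expected: "positive" },
  { processedRow: "neutral", expected: "neutral" },
  { processedRow: "negative", expected: "neutral" }, // a deliberate mismatch
];

// Prints a summary with count 3, passed 2, and allPassed false.
console.log(summarize("test/feedback.csv", sampleRows));
```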
Example Waffiefile:
version: 0.1
actions:
  sentiment-analysis:
    command: text-completion
    providers:
      - openai
      # - anthropic
    prompt: >
      You will be provided with a tweet, and your task is to classify its sentiment as positive, neutral, or negative.
      The JSON should be compatible with the TypeScript type Response from the following:
      interface Response {
        result: "positive" | "negative" | "neutral";
      }
    test_directory: test
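For readers who think in types, the example Waffiefile can be pictured as the TypeScript structure below. This is a sketch under assumptions: the field names come from the example, but the type names, the nesting of `test_directory` under each action, which fields are required, and the `checkWaffiefile` helper are all guesses, not waffie's actual schema.

```typescript
// Hypothetical types for the example Waffiefile.
interface WaffieAction {
  command: string; // e.g. "text-completion"
  providers: string[]; // e.g. ["openai"]
  prompt: string;
  test_directory: string; // assumed to be relative to the Waffiefile
}

interface WaffiefileConfig {
  version: number;
  actions: { [name: string]: WaffieAction };
}

// Returns a list of problems; an empty list means the config looks usable.
function checkWaffiefile(config: WaffiefileConfig): string[] {
  const problems: string[] = [];
  const names = Object.keys(config.actions);
  if (names.length === 0) problems.push("no actions defined");
  for (const name of names) {
    const action = config.actions[name];
    if (action.providers.length === 0) problems.push(name + ": no providers listed");
    if (action.prompt.trim() === "") problems.push(name + ": prompt is empty");
  }
  return problems;
}

const exampleConfig: WaffiefileConfig = {
  version: 0.1,
  actions: {
    "sentiment-analysis": {
      command: "text-completion",
      providers: ["openai"],
      prompt:
        "You will be provided with a tweet, and your task is to classify its sentiment as positive, neutral, or negative.",
      test_directory: "test",
    },
  },
};

console.log(checkWaffiefile(exampleConfig)); // []
```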
Example test CSV:
Customer Feedback, Sentiment
"I love your product, it's amazing!", positive
"The service was okay, nothing special.", neutral
...
"I feel indifferent about the whole thing.", neutral
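Note that the feedback column is quoted because it can contain commas. The sketch below shows a minimal way to read such a file into input and expected pairs. The names `splitCsvLine`, `parseTestCsv`, and `TestCase` are hypothetical, and waffie may well use a CSV library instead.

```typescript
// A test case: the text sent to the model and the label we expect back.
interface TestCase {
  input: string;
  expected: string;
}

// Minimal parser for one CSV line with optional double-quoted fields.
// It handles commas inside quotes and "" as an escaped quote, but not
// fields that span multiple lines. Fields are trimmed after parsing.
function splitCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"';
        i++; // skip the second quote of the escaped pair
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current.trim());
  return fields;
}

// Assumption: the first line is a header and each row has two columns.
function parseTestCsv(text: string): TestCase[] {
  const lines = text.split(/\r?\n/).filter((l) => l.trim() !== "");
  return lines.slice(1).map((l) => {
    const fields = splitCsvLine(l);
    return { input: fields[0] || "", expected: fields[1] || "" };
  });
}

const sampleCsv = `Customer Feedback, Sentiment
"I love your product, it's amazing!", positive
"The service was okay, nothing special.", neutral`;

console.log(parseTestCsv(sampleCsv).length); // 2
```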
Runs automated tests using the provided Waffiefile
USAGE
$ waffie file FILEPATH [-n <value>] [-f]
ARGUMENTS
FILEPATH Path to Waffiefile
FLAGS
-f, --force
-n, --name=<value> name to print
DESCRIPTION
Runs automated tests using the provided Waffiefile
EXAMPLES
$ waffie file
See code: dist/commands/file.ts
Display help for waffie.
USAGE
$ waffie help [COMMANDS] [-n]
ARGUMENTS
COMMANDS Command to show help for.
FLAGS
-n, --nested-commands Include all nested commands in the output.
DESCRIPTION
Display help for waffie.
See code: @oclif/plugin-help
Compare input and expected output
USAGE
$ waffie file WAFFIEFILE
ARGUMENTS
WAFFIEFILE Path to Waffiefile
FLAGS
DESCRIPTION
Compare input and expected output across different providers and models
EXAMPLES
$ waffie file examples/sentiment-analysis/Waffiefile
{
file: '/Users/rogerlam/waffie/examples/sentiment-analysis/test/feedback.csv',
count: 22,
passed: 22,
allPassed: true
}