Make sure you have logged in to Hugging Face
huggingface-cli login
Set environment variables for benchmarking
export BASE_URL=<BentoCloud Service URL>
export SYSTEM_PROMPT=1  # 1 or 0
Run the benchmark
python benchmark.py --max_users 10 --session_time 300 --ping_correction
max_users is the maximum number of concurrent users to spawn
session_time is the duration of the benchmark session, in seconds
ping_correction is a flag that determines whether ping latency should be deducted from the metrics
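
To make the three options more concrete, here is a minimal Python sketch of how such flags could be parsed and how a ping correction might be applied. It is not the code in benchmark.py: the helper names `build_parser` and `correct_latency`, the defaults, the use of the median ping, and the clamping at zero are all assumptions made for illustration.

```python
import argparse
import statistics


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the three options described above."""
    parser = argparse.ArgumentParser(description="Benchmark option sketch")
    parser.add_argument("--max_users", type=int, default=10,
                        help="maximum number of concurrent users to spawn")
    parser.add_argument("--session_time", type=int, default=300,
                        help="duration of the benchmark session, in seconds")
    parser.add_argument("--ping_correction", action="store_true",
                        help="deduct ping latency from the reported metrics")
    return parser


def correct_latency(measured_s: float, ping_samples_s: list[float]) -> float:
    """Subtract the median ping round trip from a measured latency.

    Assumption of this sketch: the median is used as the ping estimate,
    and the result is clamped at zero so it never goes negative.
    """
    if not ping_samples_s:
        return measured_s  # nothing to deduct without ping samples
    return max(0.0, measured_s - statistics.median(ping_samples_s))


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--max_users", "10", "--session_time", "300", "--ping_correction"]
    )
    measured = 0.250               # made-up request latency, in seconds
    pings = [0.020, 0.030, 0.025]  # made-up ping round trips, in seconds
    reported = correct_latency(measured, pings) if args.ping_correction else measured
    print(f"users={args.max_users} session={args.session_time}s "
          f"reported latency={reported:.3f}s")
```

Deducting the network round trip this way is meant to isolate the time spent by the service itself, which can matter when the benchmark client is far from the deployment.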