Benchmarking - Add tensor_parallel_size arg for multi-gpu benchmarking #66
Conversation
@@ -20,7 +20,7 @@
     "output-len": [
         128
     ],
-    "tensor-parallel-size": [
+    "tensor-parallel-size_": [
Why the underscore? I don't see it for other arguments
The benchmark_throughput.py script is not meant to use this argument directly. It is part of a mutually exclusive arg group:

    tp_group = parser.add_mutually_exclusive_group(required=True)
    tp_group.add_argument("--tensor-parallel-size_", type=int, default=None)
    tp_group.add_argument("--use-all-available-gpus_", action="store_true")

and the script derives the correct tensor_parallel_size using this utility:

    def get_tensor_parallel_size(args: argparse.Namespace) -> int:
        # Use every visible GPU when requested; otherwise take the explicit value.
        tensor_parallel_size = num_available_gpus() \
            if args.use_all_available_gpus_ else args.tensor_parallel_size_
        assert tensor_parallel_size > 0 and \
            tensor_parallel_size <= num_available_gpus()
        return tensor_parallel_size
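For context, num_available_gpus() is not shown in this diff; a minimal sketch of what it might look like, assuming it simply counts the CUDA devices visible to the process via torch:

    import torch

    def num_available_gpus() -> int:
        # Hypothetical sketch, not the actual utility: counts CUDA devices
        # visible to this process (respects CUDA_VISIBLE_DEVICES).
        return torch.cuda.device_count()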
@@ -8,6 +8,7 @@
     "mistralai/Mistral-7B-Instruct-v0.2",
     "NousResearch/Llama-2-7b-chat-hf"
     ],
+    "use_all_available_gpus" : "",
This actually translates to use_all_available_gpus = True in Python. If we want to set it to false, we remove this key.
This is really a pain - I am considering moving to yml or some other format. On top of this, JSON doesn't even let you add comments :(
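A minimal sketch of the behavior being described, assuming the config loader treats key presence as the boolean (the loader shown here is illustrative, not the actual benchmark code):

    import json

    config = json.loads('{"use_all_available_gpus": ""}')
    # The flag is on whenever the key is present; its (empty) value is ignored.
    use_all_available_gpus = "use_all_available_gpus" in config
    print(use_all_available_gpus)  # True; removing the key would yield False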
@@ -27,36 +28,6 @@
     "sharegpt"
     ]
 }
-},
Removed a redundant benchmark.
@@ -34,7 +31,8 @@
     ],
     "dtype": [
         "auto"
-    ]
+    ],
+    "use-all-available-gpus_" : []
Same as above - an empty use-all-available-gpus_ list translates to the bare --use-all-available-gpus_ arg.
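A hypothetical sketch of how a runner might expand these config lists into script arguments - an empty list yields a bare flag, while non-empty lists contribute --key value pairs (the function name and expansion rules are assumptions, not the actual runner code):

    def to_cli_args(params: dict) -> list[str]:
        # Illustrative only: real runners typically sweep the cartesian
        # product of list values; here each list just contributes its values.
        args = []
        for key, values in params.items():
            if not values:
                # Empty list -> bare store_true-style flag.
                args.append(f"--{key}")
            else:
                for value in values:
                    args.extend([f"--{key}", str(value)])
        return args

    print(to_cli_args({"dtype": ["auto"], "use-all-available-gpus_": []}))
    # ['--dtype', 'auto', '--use-all-available-gpus_']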
Looks reasonable, and nice job with the multi-gpu usage.
cool
SUMMARY: Add a benchmarking workflow to run Nightly.
TEST PLAN: Manual benchmark triggers on AWS instances - check #66
---------
Co-authored-by: Varun Sundar Rabindranath <[email protected]>
SUMMARY:
TEST PLAN:
Run manual benchmark jobs
multi-gpu benchmark : https://github.com/neuralmagic/nm-vllm/actions/runs/8086009234/job/22094787943
single-gpu benchmark : https://github.com/neuralmagic/nm-vllm/actions/runs/8086016169/job/22094812742
(The benchmarks didn't run to completion as Hugging Face went down midway, but the artifacts seem reasonable for what did run.)