Skip to content
This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Benchmarking - Add tensor_parallel_size arg for multi-gpu benchmarking #66

Merged

Conversation

varun-sundar-rabindranath
Copy link

@varun-sundar-rabindranath varun-sundar-rabindranath commented Feb 28, 2024

SUMMARY:

  • Add tensor_parallel_size arg for multi-gpu benchmarking.
  • Update the benchmark config files to use all available gpus by default.
  • Other minor changes :
    • Fix script args outer-product enumeration.
    • Limit dataset sample-space when using sharegpt dataset.
    • Remove redundant benchmarks.
    • Standardize benchmarking result JSON.

TEST PLAN:
Run manual benchmark jobs
multi-gpu benchmark : https://github.com/neuralmagic/nm-vllm/actions/runs/8086009234/job/22094787943
single-gpu benchmark : https://github.com/neuralmagic/nm-vllm/actions/runs/8086016169/job/22094812742
(Then benchmarks didn't run to completion as huggingface went down mid-way. but the artifacts seem reasonable for what it did run)

@@ -20,7 +20,7 @@
"output-len": [
128
],
"tensor-parallel-size": [
"tensor-parallel-size_": [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the underscore? I don't see it for other arguments

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benchmark_throughput.py script is not meant to use this argument directly.
it is part of a mutually exclusive arg group,

    tp_group = parser.add_mutually_exclusive_group(required=True)
    tp_group.add_argument("--tensor-parallel-size_", type=int, default=None)
    tp_group.add_argument("--use-all-available-gpus_", action="store_true")

and the script derives the correct tensor_parallel_size using the utility,

def get_tensor_parallel_size(args: argparse.Namespace) -> int:
    tensor_parallel_size = num_available_gpus() \
        if args.use_all_available_gpus_ else args.tensor_parallel_size_
    assert tensor_parallel_size > 0 and \
           tensor_parallel_size <= num_available_gpus()
    return tensor_parallel_size

neuralmagic/benchmarks/datasets_registry.py Show resolved Hide resolved
@varun-sundar-rabindranath varun-sundar-rabindranath marked this pull request as draft February 28, 2024 15:24
@@ -8,6 +8,7 @@
"mistralai/Mistral-7B-Instruct-v0.2",
"NousResearch/Llama-2-7b-chat-hf"
],
"use_all_available_gpus" : "",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually translates to use_all_available_gpus = True in python.. If we want to set it to false, we remove this key.

This is really a pain - I am considering moving to yml or some other format. on top this, JSON doesn't even let you add comments :(

@@ -27,36 +28,6 @@
"sharegpt"
]
}
},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed redundant benchmark.

@@ -34,7 +31,8 @@
],
"dtype": [
"auto"
]
],
"use-all-available-gpus_" : []

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above - an empty use-all-available-gpus_ translates to --use-all-available-gpus_ arg

@varun-sundar-rabindranath varun-sundar-rabindranath marked this pull request as ready for review February 28, 2024 19:04
Copy link
Member

@mgoin mgoin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks reasonable and nice job with the multi-gpu usage

Copy link
Member

@andy-neuma andy-neuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool

@varun-sundar-rabindranath varun-sundar-rabindranath merged commit 0367fc2 into main Feb 28, 2024
2 of 4 checks passed
@varun-sundar-rabindranath varun-sundar-rabindranath deleted the varun/benchmark-add-tensor-parallel-sizes-arg branch February 28, 2024 21:53
varun-sundar-rabindranath added a commit that referenced this pull request Feb 28, 2024
SUMMARY:
Add a benchmarking workflow to run Nightly

TEST PLAN:
Manual benchmark triggers on AWS instances - check
#66

---------

Co-authored-by: Varun Sundar Rabindranath <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants