🐛 Bug
When running the benchmarks for Mixtral-8x7B-v0.1 in eager mode, we get this error:
```
0: [rank0]:   File "/workspace/lightning-thunder/thunder/benchmarks/benchmark_litgpt.py", line 887, in benchmark_main
0: [rank0]:     print(f"Tokens/s: {benchmark.perf_metrics['tokens_per_sec']:.02f}")
0: [rank0]: TypeError: unsupported format string passed to NoneType.__format__
```
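The failing f-string can be reproduced in isolation: when `perf_metrics['tokens_per_sec']` is left as `None` (because throughput collection was skipped), applying a float format spec to it raises exactly this `TypeError`:

```python
# Minimal reproduction: a float format spec cannot be applied to None.
metrics = {"tokens_per_sec": None}  # value left unset when throughput collection is skipped
try:
    print(f"Tokens/s: {metrics['tokens_per_sec']:.02f}")
except TypeError as exc:
    print(exc)  # unsupported format string passed to NoneType.__format__
```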
I see in the log that there was a message:
```
Model Flops/Throughput calculation failed for model Mixtral-8x7B-v0.1. Skipping throughput metric collection.
```
It might be caused by this code in benchmark_litgpt.py:
```python
try:
    # Calculate the model FLOPs
    self.calculate_model_flops()
    # Setup throughput Collection
    self.throughput = Throughput(window_size=self.max_iters - self.warmup_iters, world_size=world_size)
except:
    self.throughput = None
    print(
        f"Model Flops/Throughput calculation failed for model {self.model_name}. Skipping throughput metric collection."
    )
```
Both `self.calculate_model_flops()` and the `Throughput` setup live in the same try/except block. I would keep only `calculate_model_flops()` inside it, but maybe there were problems constructing `Throughput` that I'm simply not aware of.
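A minimal sketch of that first option. The classes here are stand-ins for illustration (the real `Throughput` comes from lightning); only the attribute names mirror the snippet above:

```python
# Hypothetical sketch: guard only the FLOPs calculation, so a failure there
# no longer silently disables throughput collection. Throughput is a stand-in
# class, not the real lightning monitor.
class Throughput:
    def __init__(self, window_size, world_size):
        self.window_size, self.world_size = window_size, world_size

class Benchmark:
    def __init__(self):
        self.model_name = "Mixtral-8x7B-v0.1"
        self.max_iters, self.warmup_iters = 10, 2
        self.model_flops = None

    def calculate_model_flops(self):
        # Simulates the failure seen in the log.
        raise RuntimeError("FLOPs estimation failed for this architecture")

    def setup_metrics(self, world_size):
        try:
            self.model_flops = None
            self.calculate_model_flops()
        except Exception:
            print(f"Model Flops calculation failed for model {self.model_name}. "
                  "Skipping FLOPs metric collection.")
        # Throughput setup no longer dies together with the FLOPs calculation.
        self.throughput = Throughput(window_size=self.max_iters - self.warmup_iters,
                                     world_size=world_size)

bench = Benchmark()
bench.setup_metrics(world_size=64)
print(bench.throughput is not None)  # True: throughput survives the FLOPs failure
```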
Another possible fix is to check whether `tokens_per_sec` is actually set in the dictionary before formatting it.
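And a sketch of that second option, guarding the print in `benchmark_main` instead of the setup code. The dictionary shape and the helper name are assumptions based on the traceback, not copied from the script:

```python
def format_tokens_per_sec(perf_metrics):
    """Return a printable Tokens/s line, tolerating a missing or None metric.

    Hypothetical helper; perf_metrics layout is assumed from the traceback.
    """
    tokens_per_sec = perf_metrics.get("tokens_per_sec")
    if tokens_per_sec is None:
        return "Tokens/s: n/a (throughput collection was skipped)"
    return f"Tokens/s: {tokens_per_sec:.02f}"

print(format_tokens_per_sec({"tokens_per_sec": None}))    # Tokens/s: n/a (throughput collection was skipped)
print(format_tokens_per_sec({"tokens_per_sec": 1234.5}))  # Tokens/s: 1234.50
```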
This is an issue in the benchmark_litgpt.py script itself. I know one possible fix for it, so I can prepare a PR around Wednesday, but it won't recover the missing results from the calculate_model_flops function.
To Reproduce
Please use:
8 node(s), each with 8 GPUs.
Image "INTERNAL_IMAGE:pjnl-20241001"
Training script:
```shell
python /opt/pytorch/lightning-thunder/thunder/benchmarks/benchmark_litgpt.py \
    --model_name Mixtral-8x7B-v0.1 \
    --distributed_mode fsdp \
    --shard_mode zero3 \
    --compile eager \
    --checkpoint_activations True \
    --low_precision_mode none \
    --micro_batch_size 1
```
Expected behavior
We should be able to run the benchmarking script even if we are not able to print a few of the metrics.
Environment
system.device_product_name DGXH100
system.gpu_driver_version 535.129.03
libraries.cuda 12.6.2.004
libraries.pip.lightning 2.4.0.dev20240728
libraries.pip.lightning-thunder 0.2.0.dev0
libraries.pip.lightning-utilities 0.11.7
libraries.pip.litgpt 0.4.11
libraries.pip.nvfuser 0.2.13+git4cbd7a4
libraries.pip.pytorch-lightning 2.4.0
libraries.pip.torch 2.6.0a0+gitd6d9183
libraries.pip.torchmetrics 1.4.2
libraries.pip.torchvision 0.19.0a0+d23a6e1