Use model-cast-to-bfloat16 rather than AMP-to-bfloat16 for inference. #9198

Merged

titu1994 merged 2 commits into NVIDIA:main from galv:dgalvez/fix-autocast-slowness-2

Jun 6, 2024

+120 -46

Commits on Jun 5, 2024