
Rectify Missing Dataloader Preparation Call in PaddingFree Plugin Method #63

Merged

Conversation

@achew010 achew010 commented Aug 6, 2024

Description

This PR fixes a missing dataloader preparation call in the PaddingFree Plugin that caused the number of optimization steps in distributed experiments to be computed incorrectly.
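
For context, here is a minimal sketch of the issue using Hugging Face Accelerate (illustrative only, not the plugin's actual code; the dataset and collator below are placeholders). Until the custom dataloader is passed through `accelerator.prepare()`, its length reflects the full unsharded dataset, so the per-device optimization step count is over-estimated in multi-GPU runs.

```python
import math
from accelerate import Accelerator
from torch.utils.data import DataLoader

# Placeholder dataset and collator, standing in for the real
# tokenized dataset and the padding-free collator.
train_dataset = list(range(20_000))

def collate_fn(batch):
    return batch

accelerator = Accelerator()
dataloader = DataLoader(train_dataset, batch_size=4, collate_fn=collate_fn)

# Without this call, len(dataloader) counts batches over the whole
# dataset; with it, Accelerate shards the dataloader across processes,
# so the length (and hence the computed number of optimization steps)
# is per device.
dataloader = accelerator.prepare(dataloader)

grad_accum = 2
steps_per_epoch = math.ceil(len(dataloader) / grad_accum)
print(f"optimization steps per epoch (this device): {steps_per_epoch}")
```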

With this fix, the relative performance improvement of padding-free over padded data processing approximately matches across both implementations:

  • FMS-Acceleration PaddingFree plugin: 24%
  • Natively supported padding-free in Transformers: 26%

ORCA Math 20K Subset - 1 Epoch

Transformers

| Model | DataProcess | Grad Accum | Per Device Batch Size | Num Devices | Time | Throughput (tokens/s/device) |
|---|---|---|---|---|---|---|
| Mistral-7B | Padded | 2 | 4 | 8 | 812 | 1216 |
| Mistral-7B | Transformers Native PaddingFree | 2 | 4 | 8 | 596 | 1658 |

FMS-Acceleration Padding-Free Plugin

| Model | DataProcess | Grad Accum | Per Device Batch Size | Num Devices | Time | Throughput (tokens/s/device) |
|---|---|---|---|---|---|---|
| Mistral-7B | Padded | 2 | 4 | 8 | 818 | 1208 |
| Mistral-7B | PaddingFree Plugin | 2 | 4 | 8 | 621 | 1568 |

@achew010 achew010 requested a review from fabianlim as a code owner August 6, 2024 09:26
@achew010 achew010 force-pushed the fix-padding-free-collator branch from 9d1cb39 to 8a3e8bf on August 6, 2024 09:33
fabianlim commented Aug 6, 2024

What's the difference between "Padded-Longest" and "Transformers Padded"? If there is no difference, please edit the description to make it consistent.

@fabianlim fabianlim merged commit b159600 into foundation-model-stack:main Aug 6, 2024
6 checks passed