Optimize performance by fuse adding high precision tensor to fp8 tensor #132

tocean · 2023-11-22T10:00:17Z

Description
Optimize performance by fusing the addition of a high-precision tensor to an FP8 tensor into a single kernel.

Major Revision

Add an extension, msamp_arithmetic
Add a fused kernel to the extension for adding a high-precision tensor to an FP8 tensor
Move common header files to msamp/common/include
Add unit tests
Apply the fused kernel in megatron's FP8DistributedDataParallel
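
For readers unfamiliar with the idea, here is a minimal pure-Python sketch of what fusing this addition could look like. It is an illustration under stated assumptions, not the CUDA kernel from this PR: the function names (round_to_e4m3, compute_scale, add_unfused, add_fused), the simplified E4M3 rounding, and the power-of-two scaling policy are all hypothetical and chosen only for the example. The unfused path materializes a dequantized copy and a summed copy before requantizing; the fused path recomputes the sum per element and never stores it.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the E4M3 format


def round_to_e4m3(x):
    """Round x to a simplified E4M3 grid (4 significant bits, saturating).

    Simplification for illustration: NaN handling is omitted.
    """
    if x == 0.0:
        return 0.0
    x = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))
    _, exponent = math.frexp(x)  # x = m * 2**exponent with 0.5 <= |m| < 1
    # Spacing between representable values; the clamp at -5 models subnormals.
    quantum = 2.0 ** (max(exponent, -5) - 4)
    return round(x / quantum) * quantum  # Python's round() is half-to-even


def compute_scale(amax):
    """Power-of-two scale that maps amax into the E4M3 range (assumed policy)."""
    if amax == 0.0:
        return 1.0
    return 2.0 ** math.floor(math.log2(FP8_E4M3_MAX / amax))


def add_unfused(fp8_values, scale, other):
    """Reference path: every step builds a full intermediate list."""
    dequantized = [q / scale for q in fp8_values]
    summed = [a + b for a, b in zip(dequantized, other)]
    amax = max((abs(v) for v in summed), default=0.0)
    new_scale = compute_scale(amax)
    return [round_to_e4m3(v * new_scale) for v in summed], new_scale


def add_fused(fp8_values, scale, other):
    """Fused path: the high-precision sum is recomputed, never stored."""
    scale_inv = 1.0 / scale
    # Pass 1: absolute maximum of the sum, computed on the fly.
    amax = max((abs(q * scale_inv + o) for q, o in zip(fp8_values, other)),
               default=0.0)
    new_scale = compute_scale(amax)
    # Pass 2: dequantize, add, rescale and requantize each element in one step.
    out = [round_to_e4m3((q * scale_inv + o) * new_scale)
           for q, o in zip(fp8_values, other)]
    return out, new_scale


# Stored FP8 values with scale 16 represent [1.0, -0.5, 0.25].
fp8_values, scale = [16.0, -8.0, 4.0], 16.0
other = [0.5, 0.25, -0.125]
fused_out, fused_scale = add_fused(fp8_values, scale, other)
print(fused_out, fused_scale)
```

On a GPU the benefit of this pattern would come from fewer kernel launches and fewer reads and writes of intermediate tensors; the Python version only shows that both paths produce the same quantized values and scale.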

wkcn

Please declare the source and license of msamp/operators/arithmetic/vectorized_pointwise.h, msamp/common/include/concurrency.h and msamp/common/include/poll.h

An example:
Adapted from xxx, xxx license.

tocean · 2023-11-23T07:04:22Z

> Please declare the source and license of msamp/operators/arithmetic/vectorized_pointwise.h, msamp/common/include/concurrency.h and msamp/common/include/poll.h
>
> An example: Adapted from xxx, xxx license.

Good suggestion. Fixed.

msamp/common/include/common.h

tests/operators/test_arithmetic.py

tocean · 2023-11-24T02:56:55Z

Verified correctness using GPT3-345m:

baseline:
validation loss at iteration 1000 | lm loss value: 6.059220E+00 | lm loss PPL: 4.280416E+02 |

experiment:
validation loss at iteration 1000 | lm loss value: 6.059220E+00 | lm loss PPL: 4.280416E+02 |
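
A check like this can be automated. The sketch below parses the two log lines quoted above, confirms that the losses are identical, and confirms that the reported perplexity is consistent with the exponential of the loss. The helper name parse_validation_line and the regular expression are assumptions made for this example; they are not part of Megatron or MS-AMP.

```python
import math
import re

# Matches the loss and perplexity fields of the validation log line.
LOG_PATTERN = re.compile(
    r"lm loss value: (?P<loss>[0-9.E+-]+) \| lm loss PPL: (?P<ppl>[0-9.E+-]+)"
)


def parse_validation_line(line):
    """Return (loss, perplexity) parsed from one validation log line."""
    match = LOG_PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return float(match["loss"]), float(match["ppl"])


baseline = ("validation loss at iteration 1000 | lm loss value: 6.059220E+00 "
            "| lm loss PPL: 4.280416E+02 |")
experiment = ("validation loss at iteration 1000 | lm loss value: 6.059220E+00 "
              "| lm loss PPL: 4.280416E+02 |")

baseline_loss, baseline_ppl = parse_validation_line(baseline)
experiment_loss, experiment_ppl = parse_validation_line(experiment)

# The fused kernel should not change the training result.
losses_match = baseline_loss == experiment_loss
# Perplexity is the exponential of the loss, up to the log's rounding.
ppl_consistent = math.isclose(math.exp(baseline_loss), baseline_ppl, rel_tol=1e-5)
print(losses_match, ppl_consistent)
```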

tocean added 3 commits November 22, 2023 09:58

optimize performance by fuse add high presicion tensor to fp8 tensor

162bd9a

remove product in common.h

a0c784e

fix typos in Makefile

8e0239d

cp5555 mentioned this pull request Nov 22, 2023

V0.4 Release Plan #123

Open

9 tasks

fix lint issues

4438d62

tocean requested review from wkcn and penghouwen November 22, 2023 10:57

This comment was marked as outdated.


wkcn reviewed Nov 22, 2023


wkcn self-requested a review November 22, 2023 11:40

fix bug and comments

d5d1423

wkcn reviewed Nov 23, 2023


msamp/common/include/common.h (outdated)

tests/operators/test_arithmetic.py (outdated)

tests/operators/test_arithmetic.py (outdated)

tocean changed the title from "Optimize performance by fuse add high precision tensor to fp8 tensor" to "Optimize performance by fuse adding high precision tensor to fp8 tensor" Nov 24, 2023

fix comments

e1c8f21

tocean enabled auto-merge (squash) November 24, 2023 03:35

penghouwen approved these changes Nov 24, 2023


tocean merged commit 4480ffa into main Nov 24, 2023
9 checks passed

tocean deleted the yuxiang/perf_opt_ext branch November 24, 2023 09:39
