[Feature] Pipeline Parallelization of Different Stages in RLHF #877

llkn-2 · 2024-07-31T10:07:06Z

Motivation

The RLHF process can be divided into three stages: Generation, Forward, and Train. In the Generation stage, responses are generated using the vLLM. During the Forward stage, the actor, critic, reference, and reward models perform inference. In the Train stage, the actor and critic models undergo training.

During the execution of each stage, the GPUs for the other stages remain idle, leading to resource wastage.

To address this issue, we can optimize the process by leveraging the concept of pipeline parallelism. The batch data is divided into multiple smaller micro-batches. After processing a micro-batch in one stage, the data is immediately passed to the next stage for processing, rather than waiting for the entire batch to be completed. This approach reduces the idle time of GPUs in each stage, thereby improving resource utilization.

Modification

The code has been modified based on PR #736. The entrypoint file is xtuner/rlhf/pipeline.py.

hijkzzz · 2024-08-06T02:42:57Z

选项1：引用 OpenRLHF 声明技术版权来源于 OpenRLHF
选项2：我把这个 MR 抄袭的问题再发几篇知乎和小红书并且附带上海 AI Lab 近期另外几起学术造假事件
选项3：关闭 MR

llkn-2 · 2024-08-06T03:43:36Z

@hijkzzz
首先，为这个PR可能给OpenRLHF与Xtuner开发团队带来的误会说声抱歉，这源于我在基于#736 开发时，没有仔细注意双方开发团队之间的纠纷。

第二，这个PR是我作为第三方开发者、出于个人兴趣开发的，与上海AI Lab、Xtuner开发团队并无联系。

第三，这个PR的侧重点在于实现了RLHF不同阶段之间的Pipeline优化，在OpenRLHF中并无实现参考。如果Xtuner开发团队认为基于#736开发是不合适的，那么后续我可以基于新的code base进行优化。

最后，欢迎及时的沟通交流，减少摩擦和误解。

hijkzzz · 2024-08-06T06:32:15Z

@hijkzzz 首先，为这个PR可能给OpenRLHF与Xtuner开发团队带来的误会说声抱歉，这源于我在基于#736 开发时，没有仔细注意双方开发团队之间的纠纷。

第二，这个PR是我作为第三方开发者、出于个人兴趣开发的，与上海AI Lab、Xtuner开发团队并无联系。

第三，这个PR的侧重点在于实现了RLHF不同阶段之间的Pipeline优化，在OpenRLHF中并无实现参考。如果Xtuner开发团队认为基于#736开发是不合适的，那么后续我可以基于新的code base进行优化。

最后，欢迎及时的沟通交流，减少摩擦和误解。

我无法判断你说的信息的真实性（抱歉上次的讨论就 AI Lab 就严重说谎无法让人信服），一个和上海 AI Lab 没关系的开发者基于一个已经 Closed的 MR 去开发 Pipeline 优化。这说起来有点可疑，谁会无缘无故给上海 AI Lab 贡献这么大的 MR 还把 Closed 的 MR 挖出来用？假设就如上述所说是基于个人兴趣，#736 也涉嫌严重的抄袭问题，不适合用这个 MR 打包进入 XTuner，除非严格引用并且在 README.md 中说明基于 Ray 和 vLLM 的 RLHF 技术方案来源于 OpenRLHF。

llkn-2 · 2024-08-06T08:06:53Z

@hijkzzz 首先，为这个PR可能给OpenRLHF与Xtuner开发团队带来的误会说声抱歉，这源于我在基于#736 开发时，没有仔细注意双方开发团队之间的纠纷。
第二，这个PR是我作为第三方开发者、出于个人兴趣开发的，与上海AI Lab、Xtuner开发团队并无联系。
第三，这个PR的侧重点在于实现了RLHF不同阶段之间的Pipeline优化，在OpenRLHF中并无实现参考。如果Xtuner开发团队认为基于#736开发是不合适的，那么后续我可以基于新的code base进行优化。
最后，欢迎及时的沟通交流，减少摩擦和误解。

我无法判断你说的信息的真实性（抱歉上次的讨论就 AI Lab 就严重说谎无法让人信服），一个和上海 AI Lab 没关系的开发者基于一个已经 Closed的 MR 去开发 Pipeline 优化。这说起来有点可疑，谁会无缘无故给上海 AI Lab 贡献这么大的 MR 还把 Closed 的 MR 挖出来用？假设就如上述所说是基于个人兴趣，#736 也涉嫌严重的抄袭问题，不适合用这个 MR 打包进入 XTuner，除非严格引用并且在 README.md 中说明基于 Ray 和 vLLM 的 RLHF 技术方案来源于 OpenRLHF。

因为我需要找一个Code Base来实现Pipeline优化，只不过Code Base刚好找的是 Xtuner，而不是OpenRLHF。因为我更加熟悉Xtuner，开发时也没注意到你们之间的纠纷。如果后边时间精力允许，我也非常乐意去给OpenRLHF贡献代码。
如果Xtuner开发团队认为基于#736开发不合适，可以关闭MR，我能接受。希望Xtuner开发团队能尽快评估和给出结论。
我无意卷入双方开发团队的纠纷，但这个开发账号 @llkn-2 所提交的代码和优化思路并没有涉及抄袭OpenRLHF，恳请@hijkzzz在社媒上不要攻击这个开发账号。

zhaohui and others added 19 commits June 25, 2024 02:22

add rlhf

0e1453a

add/fix pretrain_loss

14ea454

dataloader, resolve comments

72ede3b

rename,remove

6adc32c

save tokenizer

9599838

rm/move models

bb96eef

fix: zero3 for bigger models

5d6fb35

precommit check

795995d

resume

1ff6ab3

fix resume/pretrain_data

e1c8c61

fix tokenizer bug

89bfe9d

async learn

82a3366

add and rename ppo configs

6531b5d

fix rm/sys promt

7f7bab8

fix vllm dp size>1

ea9c67f

fix/add data_map_fn

7d6e1b3

fix: vllm dp&tp size

1a284ff

optimize rlhf performance using pipeline

b35b719

Merge branch 'rlhf-dev' into rlhf-pipe

9d977a0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] Pipeline Parallelization of Different Stages in RLHF #877

[Feature] Pipeline Parallelization of Different Stages in RLHF #877

llkn-2 commented Jul 31, 2024 •

edited

Loading

hijkzzz commented Aug 6, 2024 •

edited

Loading

llkn-2 commented Aug 6, 2024

hijkzzz commented Aug 6, 2024 •

edited

Loading

llkn-2 commented Aug 6, 2024

[Feature] Pipeline Parallelization of Different Stages in RLHF #877

Are you sure you want to change the base?

[Feature] Pipeline Parallelization of Different Stages in RLHF #877

Conversation

llkn-2 commented Jul 31, 2024 • edited Loading

Motivation

Modification

hijkzzz commented Aug 6, 2024 • edited Loading

llkn-2 commented Aug 6, 2024

hijkzzz commented Aug 6, 2024 • edited Loading

llkn-2 commented Aug 6, 2024

llkn-2 commented Jul 31, 2024 •

edited

Loading

hijkzzz commented Aug 6, 2024 •

edited

Loading

hijkzzz commented Aug 6, 2024 •

edited

Loading