RFC-0001-economic-dataloader.md #69

yoadbs · 2024-09-27T15:15:30Z

A new multiprocessing pipeline design for the DataLoader is proposed. It splits batch generation between two types of workers: item-generating workers and batch-generating workers. The pipeline is designed to significantly reduce random-access memory (RAM) usage without a significant reduction in throughput.
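
A minimal sketch of the two-stage split described above, assuming plain Python `multiprocessing` queues rather than the RFC's actual implementation. All names (`item_worker`, `batch_worker`, `economic_batches`, `max_batches_in_flight`, `start_method`) are hypothetical. The sketch also assumes that the RAM saving comes from capping the number of batches in flight independently of the number of item workers: item workers each load one sample at a time, batch workers only collate.

```python
import multiprocessing as mp
from collections import defaultdict


def item_worker(dataset, index_queue, item_queues):
    # Stage 1 (hypothetical): turn one sample index into one item, then route
    # it to the batch worker that owns its batch (batch_id modulo pool size).
    while True:
        task = index_queue.get()
        if task is None:  # shutdown sentinel
            break
        batch_id, position, batch_len, sample_idx = task
        item = dataset[sample_idx]
        item_queues[batch_id % len(item_queues)].put((batch_id, position, batch_len, item))


def batch_worker(item_queue, batch_queue, collate_fn):
    # Stage 2 (hypothetical): buffer items per batch (they may arrive out of
    # order) and collate a batch as soon as all of its items are present.
    pending = defaultdict(dict)  # batch_id -> {position: item}
    while True:
        msg = item_queue.get()
        if msg is None:  # shutdown sentinel
            break
        batch_id, position, batch_len, item = msg
        pending[batch_id][position] = item
        if len(pending[batch_id]) == batch_len:
            parts = pending.pop(batch_id)
            batch_queue.put((batch_id, collate_fn([parts[i] for i in range(batch_len)])))


def economic_batches(dataset, batch_sampler, num_item_workers=4, num_batch_workers=1,
                     max_batches_in_flight=2, collate_fn=list, start_method="fork"):
    """Yield collated batches in sampler order.

    At most `max_batches_in_flight` batches are submitted but not yet yielded,
    regardless of `num_item_workers`, which bounds how many items sit in RAM.
    'fork' is POSIX-only; with 'spawn' the dataset, collate_fn and the calling
    script must be picklable/importable.
    """
    ctx = mp.get_context(start_method)
    index_queue = ctx.Queue()
    item_queues = [ctx.Queue() for _ in range(num_batch_workers)]
    batch_queue = ctx.Queue()
    item_procs = [ctx.Process(target=item_worker, args=(dataset, index_queue, item_queues))
                  for _ in range(num_item_workers)]
    batch_procs = [ctx.Process(target=batch_worker, args=(q, batch_queue, collate_fn))
                   for q in item_queues]
    for p in item_procs + batch_procs:
        p.start()

    sampler_iter = iter(batch_sampler)
    submitted = 0

    def submit_next():
        # Enqueue every index of the next batch; False once the sampler is exhausted.
        nonlocal submitted
        indices = next(sampler_iter, None)
        if indices is None:
            return False
        for position, sample_idx in enumerate(indices):
            index_queue.put((submitted, position, len(indices), sample_idx))
        submitted += 1
        return True

    completed = False
    try:
        for _ in range(max_batches_in_flight):
            if not submit_next():
                break
        ready, yielded = {}, 0
        while yielded < submitted:
            batch_id, batch = batch_queue.get()
            ready[batch_id] = batch
            while yielded in ready:  # restore sampler order
                yield ready.pop(yielded)
                yielded += 1
                submit_next()  # keep the pipeline full
        completed = True
    finally:
        if completed:  # every item was delivered, so sentinels cannot overtake work
            for _ in item_procs:
                index_queue.put(None)
            for q in item_queues:
                q.put(None)
        else:  # consumer stopped early: stop the workers outright
            for p in item_procs + batch_procs:
                p.terminate()
        for p in item_procs + batch_procs:
            p.join()
```

In a PyTorch setting one would presumably pass `torch.utils.data.default_collate` as `collate_fn`. Note that every item now crosses two process boundaries (item worker to batch worker, then batch worker to main process) instead of one.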

facebook-github-bot · 2024-09-27T15:15:39Z

Hi @yoadbs!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

albanD · 2024-09-27T17:13:02Z

cc @andrewkho

andrewkho · 2024-09-30T22:30:20Z

Hi @yoadbs, thank you for this thoughtful RFC! I've only had a quick look, but this looks like it would be covered by some of our plans in torchdata to allow more modular parallelism: https://github.com/pytorch/data/issues/1318. I know it's long, but I believe it should cover your use case as well; please let me know if it doesn't.

Some general thoughts on this, which apply both to your RFC and to pytorch/data's #1318:

This is going to be more relevant for large batch_size and larger data, e.g. HD video.

Introducing more IPC between worker pools might slow things down.

It also requires tuning multiple worker pools/prefetch buffers.
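
To make the batch_size and data-size point concrete, here is a rough back-of-envelope memory model. It is a sketch under stated assumptions: items are assumed to stay resident until their batch is collated, only items in flight are counted, and the function names and example numbers are hypothetical. The only documented behaviour it relies on is that `torch.utils.data.DataLoader` has each worker prefetch `prefetch_factor` batches (default 2 when `num_workers > 0`).

```python
def standard_loader_items(num_workers: int, prefetch_factor: int, batch_size: int) -> int:
    # DataLoader: each worker assembles whole batches and prefetches up to
    # `prefetch_factor` of them, so about num_workers * prefetch_factor
    # batches can be in flight at once.
    return num_workers * prefetch_factor * batch_size


def split_pipeline_items(max_batches_in_flight: int, batch_size: int) -> int:
    # Hypothetical split pipeline: the in-flight batch count is capped
    # independently of how many item workers are running.
    return max_batches_in_flight * batch_size


def peak_ram_mb(items_in_flight: int, item_mb: float) -> float:
    # Ignores collation copies, pinned-memory buffers and interpreter overhead.
    return items_in_flight * item_mb


# Illustrative numbers only (not measurements): 16 workers, batch_size=64,
# and a hypothetical 6 MB per decoded item.
standard = peak_ram_mb(standard_loader_items(16, 2, 64), 6.0)
split = peak_ram_mb(split_pipeline_items(2, 64), 6.0)
print(standard, split)  # 12288.0 768.0
```

Under this model the standard loader's footprint grows linearly with `num_workers`, while the split pipeline's grows only with the in-flight cap. The price is the extra IPC hop and the additional pool sizes and buffer depths to tune, which are the concerns raised in the comment above.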

Yoad Bar-Shean added 30 commits September 14, 2024 20:25

add dataloader-echonomic

5930560

Yoad Bar-Shean added 27 commits September 27, 2024 09:43