Use pytorch data loader for training #15

Open
smguo opened this issue Feb 2, 2021 · 2 comments

@smguo
Collaborator

smguo commented Feb 2, 2021

Dynamorph currently loads all training data into memory at once and samples each mini-batch with a single process. Training could potentially be sped up using the PyTorch data loader, which supports multiprocessing and data augmentation.
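
For concreteness, a minimal sketch of what a map-style dataset plus `DataLoader` could look like; `TrainingDataset`, the tensor shapes, and the augmentation hook are hypothetical placeholders for this example, not Dynamorph's actual data structures:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class TrainingDataset(Dataset):
    """Hypothetical map-style dataset; Dynamorph's real data
    structures would replace the in-memory tensor used here."""
    def __init__(self, images, transform=None):
        self.images = images        # e.g. an (N, C, H, W) tensor
        self.transform = transform  # optional augmentation callable

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        sample = self.images[idx]
        if self.transform is not None:
            sample = self.transform(sample)
        return sample

# num_workers > 0 runs loading/augmentation in worker processes.
dataset = TrainingDataset(torch.randn(256, 2, 128, 128))
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)

for batch in loader:
    pass  # batch has shape (32, 2, 128, 128); training step goes here
```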

One issue with adopting the data loader is that the current matching loss implementation requires batches to be sampled in a certain order. This could possibly be achieved using iterable-style datasets.
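
A rough sketch of the iterable-style idea, assuming the ordering requirement means consecutive frames of the same trajectory should land in the same batch (my guess for the example, not the repo's actual constraint):

```python
import torch
from torch.utils.data import IterableDataset, DataLoader

class OrderedTrajectoryDataset(IterableDataset):
    """Hypothetical iterable-style dataset that yields frames in a
    fixed order (consecutive frames of each trajectory), so batch
    composition stays predictable for the matching loss."""
    def __init__(self, trajectories):
        self.trajectories = trajectories  # list of (T, C, H, W) tensors

    def __iter__(self):
        for traj in self.trajectories:
            for frame in traj:  # DataLoader batches preserve this order
                yield frame

# Caveat: with num_workers > 0, each worker gets a full copy of the
# iterator, so the trajectory list would need to be sharded per worker.
trajs = [torch.randn(10, 2, 128, 128) for _ in range(4)]
loader = DataLoader(OrderedTrajectoryDataset(trajs), batch_size=8)
```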

@bryantChhun
Contributor

@smguo Without an established CLI for VQ-VAE training (of dynamorph data) it's a little hard to insert a data loader. There are a couple of candidate locations I could try. Let me know what you think:

  1. Here in Michael's VQ-VAE training
  2. or here in your CM training

In the meantime, I will try a generalized loader and test it against your CM data.

@smguo
Collaborator Author

smguo commented Feb 3, 2021

@bryantChhun Yes, I agree that the data loader should be built on the training CLI. I believe the version in the master branch is outdated. We should merge @miaecle's current version with mine before working on the data loader.

The data loader loads the dataset (image files, not pickle files) from the hard drive on the fly during training, so the whole dataset-loading block and data structure would need to be rewritten:
https://github.com/czbiohub/dynamorph/blob/6269c55b95834603070fc139d71d615e2656fb51/run_training.py#L1281-L1309
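
For illustration, an on-the-fly dataset might look roughly like the sketch below; the `.npy` file layout and naming pattern are assumptions made for the example, not the actual Dynamorph data format:

```python
from pathlib import Path
import numpy as np
import torch
from torch.utils.data import Dataset

class OnTheFlyImageDataset(Dataset):
    """Hypothetical dataset that reads one image file from disk per
    sample, instead of unpickling the whole dataset up front."""
    def __init__(self, root, pattern="*.npy"):
        self.paths = sorted(Path(root).glob(pattern))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        arr = np.load(self.paths[idx])  # single file read per sample
        return torch.from_numpy(arr).float()
```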

The train function would also need to be rewritten. One tricky part is making the matching loss work with the data loader, as I mentioned in the previous comment.
