wip: dataloader first draft #17

ssenan · 2022-10-16T07:07:44Z

No description provided.

mateibejan1 · 2022-10-16T13:31:25Z

Could you just change the name of the file to sequence_dataloader.py, so we can differentiate from future possible loaders?

lucapinello · 2022-10-16T14:24:42Z

Thanks guys!

Two comments:

I am fine with separate files, @mateibejan1. However, we can also have different classes in this module and keep all the related classes and functions in a small number of files. I envision doing the same for the metrics, e.g. a single file called metrics that we can import from.
@ssenan I think it may be better not to use one-hot encoding for the components and to use a simple integer instead, so we can keep memory under control when we load the entire 3.5M sequences!
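
The integer-label suggestion above can be sketched as follows. This is a minimal illustration, not the code in this PR: the component names, `build_index`, `encode`, and `one_hot` are all hypothetical, and it assumes there are at most 256 distinct components so each label fits in one byte. With K components, a stored one-hot vector costs K values per sequence, while an integer id costs one.

```python
from array import array


def build_index(labels):
    """Map each distinct label to a stable integer id (sorted for determinism)."""
    return {name: i for i, name in enumerate(sorted(set(labels)))}


def encode(labels, index):
    """Store one unsigned byte per sample instead of a full one-hot vector."""
    if len(index) > 256:
        raise ValueError("more than 256 classes; use a wider typecode")
    return array("B", (index[name] for name in labels))


def one_hot(class_id, num_classes):
    """Expand a single integer id to a one-hot list on demand, e.g. per batch."""
    vec = [0] * num_classes
    vec[class_id] = 1
    return vec


# Hypothetical component names, used only for illustration.
labels = ["comp_b", "comp_a", "comp_c", "comp_a"]
index = build_index(labels)
ids = encode(labels, index)
```

If a model still needs one-hot input, the expansion can happen per item or per batch (for example inside `__getitem__`), so the full one-hot matrix never has to sit in memory.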

mateibejan1 · 2022-10-16T15:24:11Z

I understand your perspective, @lucapinello. I would only like to apply this separation-by-class to diffusion code, denoising networks and dataloaders. The reasoning is that the code for these types of files tends to get repetitive, and once you have 3-4 modules in the same file it becomes tiring to keep track of which section you've debugged or changed. I am OK with keeping all metrics, utils etc. in a small number of files in their designated directories.

SauravMaheshkar · 2022-10-16T17:03:24Z

models/vanilla_diffusion/dataloader.py

+        self.num_workers = num_workers
+
+
+    def prepare_data(self):


We should probably drop this redundant function definition.

SauravMaheshkar · 2022-10-16T17:08:16Z

models/vanilla_diffusion/dataloader.py

+import pytorch_lightning as pl 
+from torch.utils.data import Dataset, DataLoader 
+
+class SequenceDataset(Dataset):


Proposition: We could also separate this script into a dataloader.py and a dataset.py.

The dataset.py would contain the SequenceDataset Class and the dataloader.py can have the SequenceDataModule Class (or a BaseDataModule and further subclasses such as SequenceDataModule as suggested by @lucapinello)

ssenan · 2022-10-16T21:16:40Z

@mateibejan1 Thanks for the suggestion. Could you give an example of the file naming breakdown you had in mind for the entire project?

My thought process was more in line with Luca's, where all loaders are kept in a single dataloader.py file and can then be imported in a main script using something like "from dataloader import SequenceDataModule". This matches some other implementations using medical image data that I have seen.

I'm also available on Discord @ssenan

IhabBendidi · 2022-10-16T22:54:04Z

There is currently a discussion on the folder structure of the codebase in #19. Referencing it here so that you can check it and add your ideas.

wip: dataloader first draf

5029a64

ssenan linked an issue Oct 16, 2022 that may be closed by this pull request

Create a Data Loader Class with Pytorch Lightning #12

Closed

Fixing train, val, and test path

0f499f9

SauravMaheshkar suggested changes Oct 16, 2022

rename file, remove one-hot encode

7817725

ssenan merged commit ce9b3bb into pinellolab:codebase Oct 17, 2022

IhabBendidi added the codebase label Oct 17, 2022

This was referenced Oct 17, 2022

Revert "wip: dataloader first draft" #21

Merged

Dataloader Draft #24

Merged
