Dataloader Draft #24
Conversation
src/data/sequence_dataloader.py
Outdated
from torch.utils.data import Dataset, DataLoader


class SequenceDatasetBase(Dataset):
    def __init__(self, data_path, transform=None):
Maybe add sequence_length=200 here so we have flexibility later.
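A minimal sketch of the suggested signature (a stand-in for illustration; in the real module the class subclasses `torch.utils.data.Dataset` and loads data from `data_path`, which is omitted here):

```python
class SequenceDatasetBase:  # subclasses torch.utils.data.Dataset in the real module
    def __init__(self, data_path, sequence_length=200, transform=None):
        self.data_path = data_path
        # Storing the window size lets later code use self.sequence_length
        # instead of a hard-coded 200.
        self.sequence_length = sequence_length
        self.transform = transform
```

Callers that need a different window size can then pass it at construction time.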
src/data/sequence_dataloader.py
Outdated
# Iterating through DNA sequences from dataset and one-hot encoding all nucleotides
current_seq = self.data["raw_sequence"][index]
if 'N' not in current_seq:
    X_seq = np.array(self.one_hot_encode(current_seq, ['A','C','T','G'], 200))
Here we can replace the hard-coded 200 with self.sequence_length.
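With the constructor storing the value, the call site can drop the literal. A minimal sketch of the change (the `one_hot_encode` stub and the output shape here are hypothetical stand-ins, not the PR's actual encoder):

```python
import numpy as np

class SequenceDatasetBase:
    def __init__(self, data, sequence_length=200):
        self.data = data  # assumed dict-like with a "raw_sequence" column
        self.sequence_length = sequence_length

    def one_hot_encode(self, seq, alphabet, sequence_length):
        # Stand-in encoder: returns a zero matrix of the target shape.
        return np.zeros((len(alphabet), sequence_length))

    def __getitem__(self, index):
        current_seq = self.data["raw_sequence"][index]
        if 'N' not in current_seq:
            # self.sequence_length replaces the literal 200
            return np.array(self.one_hot_encode(
                current_seq, ['A', 'C', 'T', 'G'], self.sequence_length))
```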
src/data/sequence_dataloader.py
Outdated
        return X_seq, X_cell_type

    # Function for one-hot encoding each line of the sequence dataset
    def one_hot_encode(self, seq, alphabet, max_seq_len):
Replace max_seq_len with sequence_length for consistency.
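After the rename, the encoder might look like this (a sketch only; the padding/truncation behaviour is an assumption, since the body of the original function is not shown in the diff):

```python
import numpy as np

def one_hot_encode(seq, alphabet, sequence_length):
    """One-hot encode seq over alphabet, padded/truncated to sequence_length."""
    encoding = np.zeros((len(alphabet), sequence_length))
    for i, base in enumerate(seq[:sequence_length]):
        # One row per alphabet symbol, one column per sequence position;
        # positions beyond len(seq) remain all-zero padding.
        encoding[alphabet.index(base), i] = 1.0
    return encoding
```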
The dataloader is now up to date with all changes regarding one-hot encoding of components and has been renamed to fit our new folder structure.
See #17 for earlier discussion.