Dataloader Draft #24

Merged: 9 commits into pinellolab:codebase, Oct 19, 2022


@ssenan ssenan commented Oct 17, 2022

The dataloader is now up to date with all changes to the one-hot encoding of components, and it has been renamed to fit our new folder structure.

See #17 for earlier discussion.

@ssenan ssenan changed the title Codebase Dataloader Draft Oct 17, 2022
@IhabBendidi IhabBendidi linked an issue Oct 17, 2022 that may be closed by this pull request
from torch.utils.data import Dataset, DataLoader

class SequenceDatasetBase(Dataset):
    def __init__(self, data_path, transform=None):

Maybe add sequence_length=200 here so we have flexibility later.
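A minimal sketch of what this suggestion could look like, assuming the new argument is a keyword with a default of 200 that is stored on the instance. The class name SequenceDatasetSketch is hypothetical, and the torch Dataset base class and the data loading from data_path are left out so the snippet runs on its own; it is not the PR's actual implementation.

```python
class SequenceDatasetSketch:
    """Hypothetical stand-in for SequenceDatasetBase, constructor only."""

    def __init__(self, data_path, transform=None, sequence_length=200):
        self.data_path = data_path
        self.transform = transform
        # Stored once here so later methods can read self.sequence_length
        # instead of repeating the literal 200.
        self.sequence_length = sequence_length


# Existing callers keep the old behaviour through the default value.
default_ds = SequenceDatasetSketch("sequences.txt")
# New callers can choose a different length without touching the class.
long_ds = SequenceDatasetSketch("sequences.txt", sequence_length=500)
```

Keeping 200 as the default means existing call sites do not need to change.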

# Iterating through DNA sequences from dataset and one-hot encoding all nucleotides
current_seq = self.data["raw_sequence"][index]
if 'N' not in current_seq:
    X_seq = np.array(self.one_hot_encode(current_seq, ['A','C','T','G'], 200))

here we can replace 200 with self.sequence_length

return X_seq, X_cell_type

# Function for one hot encoding each line of the sequence dataset
def one_hot_encode(self, seq, alphabet, max_seq_len):

replace max_seq_len with sequence_length
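For illustration, a rough sketch of the encoder with the renamed parameter. The body of one_hot_encode is not shown in this excerpt, so everything inside the function is an assumption: one row per position, one column per alphabet letter, all-zero rows for padding and for letters outside the alphabet, and truncation of sequences longer than sequence_length. It is written as a plain function returning nested lists so it runs without NumPy.

```python
def one_hot_encode(seq, alphabet, sequence_length):
    """Return sequence_length rows, each a one-hot list over alphabet (assumed layout)."""
    index_of = {letter: i for i, letter in enumerate(alphabet)}
    encoded = [[0] * len(alphabet) for _ in range(sequence_length)]
    # Positions beyond sequence_length are dropped; missing positions stay zero.
    for position, letter in enumerate(seq[:sequence_length]):
        column = index_of.get(letter)
        # Letters outside the alphabet (for example 'N') stay as all-zero rows.
        if column is not None:
            encoded[position][column] = 1
    return encoded


alphabet = ["A", "C", "T", "G"]  # same ordering as in the diff above
encoded = one_hot_encode("ACGT", alphabet, sequence_length=6)
truncated = one_hot_encode("ACGT", alphabet, sequence_length=2)
```

Inside the dataset, the call site would then pass self.sequence_length rather than the literal 200, and the result can still be wrapped in np.array as in the diff.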

@ssenan ssenan merged commit 57e9bf5 into pinellolab:codebase Oct 19, 2022
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Create a Data Loader Class with Pytorch Lightning
4 participants