How can I generate the training datasets? #14

Open
llfzllfz opened this issue Sep 19, 2022 · 1 comment

Comments

@llfzllfz

I've downloaded the RNAStralign dataset from mxfold2, and it has 8 subfolders. The code in process_data_newdataset.py only calls os.listdir(), which lists a single directory and does not descend into subfolders. What should I do to generate the training datasets?
Thanks.

@sperfu
Contributor

sperfu commented Sep 21, 2022

Hi there,

It depends on how you would like to handle the data. In our work, we merged all the files in the RNAStralign dataset into one folder and used the whole dataset for training. If you want to check the performance on individual species, you may need to keep the subfolders separate, as illustrated in the e2efold paper.
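
The merge step described above could be sketched roughly as follows. This is only a minimal illustration, not the script used in our work: the function name merge_subfolders and both path arguments are hypothetical, and it assumes the destination folder lies outside the source tree. It walks every subfolder with os.walk() and copies each file into one flat folder, prefixing the file name with its subfolder path so that files with the same name in different subfolders do not overwrite each other.

```python
import os
import shutil


def merge_subfolders(src_root, dst_dir):
    """Copy every file under src_root (recursively) into the flat folder dst_dir.

    Hypothetical helper for illustration; dst_dir is assumed to be
    outside src_root so that copied files are not walked again.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Path of the current folder relative to the dataset root,
        # e.g. "subfolder/nested" ("." for the root itself).
        rel = os.path.relpath(dirpath, src_root)
        for name in sorted(filenames):
            # Prefix with the subfolder path to avoid name collisions.
            if rel == ".":
                flat_name = name
            else:
                flat_name = rel.replace(os.sep, "_") + "_" + name
            shutil.copy2(os.path.join(dirpath, name),
                         os.path.join(dst_dir, flat_name))
            copied.append(flat_name)
    return copied
```

After running something like this, the flat output folder contains only files, so a plain os.listdir() call over it will see the whole dataset. To evaluate per species instead, the same walk could be run on one subfolder at a time.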

Thanks.
