Storage usage of PEMA #65

Open
savvas-paragkamian opened this issue Dec 25, 2023 · 1 comment
Comments

@savvas-paragkamian
Contributor

This is more of a question about how PEMA uses storage for each run. For my project I have 140 samples with PE sequences, amounting to 14 GB of data.

14G ./my data
196G /pema215_otu

Is it possible to reduce the storage needed for a PEMA run, or is all of the output required?

For example, I have two all_samples.fasta files (one in mainOutput and one in the PEMA folder) and one final_all_samples.fasta. Are all of them necessary?

Also, some intermediate folders, such as linearizedSequences and mergedSequences, take up about as much space as the mydata folder.

I am raising this because in large-scale projects this can lead to exceeding the disk quota.
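
To see which folders account for most of the space, something like the following could help. It is only a minimal sketch using the Python standard library; the function names `dir_size_bytes` and `largest_subdirs` are made up for illustration and are not part of PEMA, and the path in the usage comment is just the output folder from the listing above.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def largest_subdirs(output_dir, top=10):
    """Return (name, size_in_bytes) for immediate subfolders, largest first."""
    sizes = [
        (entry.name, dir_size_bytes(entry.path))
        for entry in os.scandir(output_dir)
        if entry.is_dir(follow_symlinks=False)
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


# Example usage (the path is an assumption taken from the listing above):
# for name, size in largest_subdirs("/pema215_otu"):
#     print(f"{size / 1e9:8.2f} GB  {name}")
```

This counts file sizes rather than allocated disk blocks, so the totals may differ slightly from what `du` reports.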

@hariszaf
Owner

Hi @savvas-paragkamian. Thanks for the points.

The all_samples.fasta should be removed from the top output folder.

In general, a feature could be added so that files no longer needed by the current step or any later one are removed on the fly.

At the moment PEMA returns everything so the user can validate the filtering parameters and their effect.

However, an option to remove intermediate files could be useful in such cases.
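
Until such an option exists, a manual cleanup after a finished run could look roughly like this. It is a sketch under stated assumptions, not a PEMA feature: the helper `remove_intermediates` is hypothetical, the folder names are the ones mentioned in the question, and it assumes they sit directly under the output folder and are no longer needed once the run has been validated.

```python
import shutil
from pathlib import Path

# Assumption: folder names taken from the question above. Whether they are
# safe to delete for a given run has to be checked by the user.
INTERMEDIATE_DIRS = ("linearizedSequences", "mergedSequences")


def remove_intermediates(output_dir, names=INTERMEDIATE_DIRS, dry_run=True):
    """Delete the named subfolders of output_dir and return the matched paths.

    With dry_run=True (the default) nothing is deleted; the matching paths
    are only reported so they can be reviewed first.
    """
    matched = []
    for name in names:
        target = Path(output_dir) / name
        # Only real directories are considered; symlinks are left alone.
        if target.is_dir() and not target.is_symlink():
            matched.append(target)
            if not dry_run:
                shutil.rmtree(target)
    return matched


# Example usage (the path is an assumption taken from the listing above):
# print(remove_intermediates("/pema215_otu"))         # review what would go
# remove_intermediates("/pema215_otu", dry_run=False)  # actually delete
```

The dry run is the default on purpose, since deleted intermediate files can only be recovered by rerunning the pipeline.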
