[Question] Is there a way to reduce number of files generated during indexing? #1651

Open
shreyas-a-s opened this issue Aug 23, 2024 · 1 comment

Comments

@shreyas-a-s

I am deploying JBrowse 1 and JBrowse 2 instances as part of a biological website, and we are using AWS S3 as the storage provider.

The issue is that generate-names.pl creates a lot of small files, close to 60,000 for some of the genome data, so the upload to S3 costs a lot, since I think S3 charges by the number of PUT requests.

In comparison, I am seeing very few files created by the JB2 text-index command as part of indexing.

So my question is: is it possible to reduce the number of files generated by generate-names.pl, or is there another method I can use?

Thanks in advance.

@cmdcolin
Contributor

One of the motivations for JBrowse 2 was to avoid the many small files, so it is indeed a bit better for this case.

For JBrowse 1, you can try passing e.g. --hashBits 4 to generate-names to reduce the number of files generated, but the strategies in JBrowse 1 are generally designed to just make a lot of files.
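
As a rough illustration of why a smaller --hashBits value should help, here is a minimal Python sketch. It assumes the name store writes at most one file per hash bucket, so the file count is bounded by 2^hashBits. That bound is an assumption made for illustration, not something confirmed in this thread, and `max_bucket_files` is a hypothetical helper, not part of JBrowse.

```python
def max_bucket_files(hash_bits: int) -> int:
    """Upper bound on name-store files.

    Assumption: one file per hash bucket, so at most 2**hash_bits files.
    """
    if hash_bits < 0:
        raise ValueError("hash_bits must be non-negative")
    return 2 ** hash_bits


# Compare a few candidate --hashBits values.
for bits in (16, 12, 8, 4):
    print(f"--hashBits {bits}: at most {max_bucket_files(bits)} bucket files")
```

Under that assumption, fewer buckets means fewer but larger files, so each name lookup may need to download more data. It is probably worth trying a few values rather than jumping straight to the smallest.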
