[Question] Is there a way to reduce number of files generated during indexing? #1651

shreyas-a-s · 2024-08-23T04:31:47Z

I am deploying JBrowse 1 and JBrowse 2 instances as part of a biological website, and we are using AWS S3 as the storage provider.

The issue is that generate-names.pl creates a lot of small files (close to 60,000 for some of the genome data), so the upload to S3 costs a lot, since I think S3 charges by the number of PUT requests.

Also, I see that the JB2 text-index command creates far fewer files when indexing.

So my question is: is it possible to reduce the number of files generated by generate-names.pl, or is there another method I can use?

Thanks in advance.

cmdcolin · 2024-08-23T13:33:03Z

One of the motivations for JBrowse 2 was to avoid the many small files, so it is indeed a bit better for this case.

For JBrowse 1, you can try passing e.g. --hashBits 4 to generate-names.pl to reduce the number of files generated, but the strategies in JBrowse 1 are generally designed to just make a lot of files.
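As a rough illustration of why a smaller --hashBits value should yield fewer files: in a hash-bucketed index, each name is hashed and only a fixed number of bits of that hash select the bucket file, so the file count cannot exceed 2^bits. The sketch below is not JBrowse's actual code. The use of CRC32, the choice of the low bits, and the made-up `gene00000`-style names are all assumptions for illustration only.

```python
import zlib


def bucket_for(name: str, hash_bits: int) -> int:
    """Map a name to one of 2**hash_bits buckets.

    Assumption: CRC32 and its low bits stand in for whatever hash
    and bit selection the real index uses.
    """
    digest = zlib.crc32(name.lower().encode("utf-8"))
    return digest & ((1 << hash_bits) - 1)


def count_buckets(names, hash_bits: int) -> int:
    """Count distinct buckets (i.e. files) the names would occupy."""
    return len({bucket_for(n, hash_bits) for n in names})


# Hypothetical feature names, sized like the ~60,000-file case above.
names = [f"gene{i:05d}" for i in range(60000)]

for bits in (4, 8, 12, 16):
    used = count_buckets(names, bits)
    print(f"hashBits={bits}: {used} buckets used, at most {2 ** bits} possible")
```

The trade-off is that fewer buckets means each file holds more names and is larger, so a lookup downloads more data per request.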
