Transcriptomics data on the NeMO archive are often stored as ASCII text files (FASTQ, FASTA, MEX) that are sometimes tarballed and sometimes gzipped. I have also found tarballed BAM files (binary).
You can index the files in a tarball by byte range using the per-member tar headers. Supposedly you can also index gzipped files and decompress byte ranges of those as well.
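A minimal sketch of the tarball indexing idea, assuming an uncompressed `.tar` and using only Python's standard `tarfile` module. The helper names (`index_tar_members`, `read_member`, `make_demo_tar`) and the demo file contents are made up for illustration, and the sketch relies on the `offset_data` attribute that CPython's `TarInfo` objects carry. It is not how LINDI or NeMO do anything today.

```python
import io
import json
import tarfile


def index_tar_members(fileobj):
    """Map each regular file in an uncompressed tar to its payload byte range."""
    index = {}
    # mode="r:" means "read, no compression"; a .tar.gz would need a different approach
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = {
                    "offset": member.offset_data,  # first byte of the file's data
                    "size": member.size,           # payload length in bytes
                }
    return index


def read_member(fileobj, entry):
    """Read one member using only its indexed byte range."""
    fileobj.seek(entry["offset"])
    return fileobj.read(entry["size"])


def make_demo_tar():
    """Build a tiny in-memory tarball (hypothetical contents) to index."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in [
            ("barcodes.tsv", b"AAAC-1\nAAAG-1\n"),
            ("reads.fastq", b"@r1\nACGT\n+\nIIII\n"),
        ]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    tar_bytes = make_demo_tar()
    index = index_tar_members(tar_bytes)
    print(json.dumps(index, indent=2))
    print(read_member(tar_bytes, index["reads.fastq"]))
```

For a remote file, the same `offset` and `size` pair would translate into an HTTP request with a `Range: bytes=<offset>-<offset + size - 1>` header, provided the server honors range requests. Building the index still requires one pass over the tar headers, so the saving comes on later reads.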
Example BICCN data:
https://data.nemoarchive.org/biccn/grant/u01_lein/lein/transcriptome/sncell/10x_v3/
https://data.nemoarchive.org/biccn/grant/u01_lein/linnarsson/transcriptome/sncell/10x_v2/human/processed/CellRanger5/
https://data.nemoarchive.org/biccn/grant/u19_huang/arlotta/transcriptome/sncell/10x_v2/mouse/processed/align/
Some of these data files can be very large, and a user may want to access only particular elements of the data file without having to download the entire file. I wonder if we can use LINDI to create an efficient JSON index of specific data elements within a NeMO-hosted dataset for streaming and local access. Just an idea right now as we brainstorm for the grant proposal.
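To make "a JSON index of specific data elements" concrete, here is a rough sketch for an uncompressed FASTQ file. It assumes strict four-line records (no wrapped sequence lines) and unique read IDs; the function name `index_fastq_records` and the demo reads are hypothetical. It does not use LINDI's API, it only illustrates the kind of offset table such an index might hold.

```python
import io
import json


def index_fastq_records(fileobj):
    """Map read ID -> [byte offset, byte length] for 4-line FASTQ records."""
    index = {}
    offset = 0
    while True:
        # One record = header, sequence, separator, qualities
        lines = [fileobj.readline() for _ in range(4)]
        if not lines[0]:
            break  # end of file
        length = sum(len(line) for line in lines)
        # Read ID is the header up to the first whitespace, without the leading "@"
        read_id = lines[0][1:].split()[0].decode("ascii")
        index[read_id] = [offset, length]
        offset += length
    return index


if __name__ == "__main__":
    demo = b"@r1 desc\nACGT\n+\nIIII\n@r2\nGGCC\n+\nFFFF\n"
    index = index_fastq_records(io.BytesIO(demo))
    print(json.dumps(index))

    # Fetch a single record by slicing its byte range
    start, length = index["r2"]
    print(demo[start:start + length])
```

For real files with millions of reads, a per-read index could itself become large, so indexing every Nth record or indexing by chunk may be the more practical design.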
BDBags can be used to index and download particular files of a dataset, but I don't know whether this works within a tarball or within a FASTQ file.