Hi Brad,
Quick question. The commit history shows "improved support for data on AWS". Could you elaborate a bit on this?
We're looking into decentralizing all of our data to (self-hosted) S3 repos powered by MinIO and CephFS RADOS gateway.
This means all fastq data and all reference data (e.g. the complete genomes dir) are hosted at an S3 URL. What's the best way to configure bcbio to leverage this?
How do we configure S3 fastq input and, if possible, S3-hosted reference data?
Thanks for the help
Cheers
M
chapmanb added a commit to bcbio/bcbio-nextgen that referenced this issue on Jan 30, 2019
Documents work in progress for AWS Batch support with Cromwell. It's not
yet working pending improvements to Cromwell, but documents setup and
current status. bcbio/bcbio-nextgen-vm#172
Matthias;
Thanks for looking into this. We're working on supporting CWL runs on AWS Batch using Cromwell, but it's not yet functional. Here is the work in progress documentation so you can see what we've got in place:
Practically, it sounds like you don't need AWS Batch and would instead just want to build inputs from S3-like buckets and then run them on your own infrastructure. This should work with the current CWL and Cromwell. You'd create an s3: configuration block in your input bcbio_system.yaml as described in the docs, and it should then stage files down from there for running on your local cluster and shared filesystem.
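To make "staging down" concrete, here is a rough Python sketch of what fetching an object from a self-hosted S3-compatible endpoint onto a shared filesystem could look like. This is not bcbio's own staging code and does not show the s3: block from the docs. The helper names (`split_s3_url`, `local_path_for`, `stage_down`), the bucket, the staging directory, and the endpoint URL are all made up for illustration. The download step assumes boto3 is installed and that credentials are available in the environment.

```python
import os
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/path/to/file' into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"Not a valid s3:// object URL: {url}")
    return parsed.netloc, parsed.path.lstrip("/")


def local_path_for(url, staging_dir):
    """Map an s3:// URL to a path under staging_dir, keeping bucket and key layout."""
    bucket, key = split_s3_url(url)
    return os.path.join(staging_dir, bucket, *key.split("/"))


def stage_down(url, staging_dir, endpoint_url):
    """Download one object from an S3-compatible endpoint (hypothetical helper).

    endpoint_url points at the self-hosted service instead of AWS, for
    example a MinIO or RADOS gateway address chosen by the site admin.
    """
    import boto3  # imported lazily so the path helpers work without boto3

    bucket, key = split_s3_url(url)
    dest = local_path_for(url, staging_dir)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    client = boto3.client("s3", endpoint_url=endpoint_url)
    client.download_file(bucket, key, dest)
    return dest
```

For example, `stage_down("s3://my-bucket/fastq/sample1_R1.fastq.gz", "/shared/staging", "https://s3.example.org")` would place the file under `/shared/staging/my-bucket/fastq/`, where the values shown are placeholders.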
I'd definitely welcome feedback and reports if you test this out. Thanks again.