remote data on S3 #172

Open
matthdsm opened this issue Jan 29, 2019 · 1 comment

matthdsm commented Jan 29, 2019

Hi Brad,

Quick question. The commit history shows "improved support for data on AWS". Could you elaborate a bit on this?

We're looking into decentralizing all of our data to (self-hosted) S3 repos powered by MinIO and CephFS RADOS gateway.

This means all fastq data and all reference data (e.g. the complete genomes dir) are hosted at an S3 URL. What's the best way to configure bcbio to leverage this?
How do we configure S3 fastq input and S3-hosted reference data, if that is possible?

Thanks for the help
Cheers
M

chapmanb added a commit to bcbio/bcbio-nextgen that referenced this issue Jan 30, 2019
Documents work in progress for AWS Batch support with Cromwell. It's not
yet working pending improvements to Cromwell, but documents setup and
current status. bcbio/bcbio-nextgen-vm#172
@chapmanb

Matthias;
Thanks for looking into this. This is still a work in progress, but we're working on supporting CWL runs on AWS Batch using Cromwell. It's not yet functional, but here is the work-in-progress documentation so you can see what we've got in place:

https://bcbio-nextgen.readthedocs.io/en/latest/contents/cloud.html#amazon-web-services-aws-batch

Practically, it sounds like you don't need AWS Batch and would instead just want to build inputs from S3-like buckets and then run them on your own infrastructure. This should work with the current CWL and Cromwell support. You'd create an s3: configuration block in your input bcbio_system.yaml, as described in the docs, and the files should then be staged down from there for running on your local cluster and shared filesystem.
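
To make the "stage down" idea concrete, here is a minimal Python sketch of how an S3-style URL could be mapped to a path on a shared filesystem. It is an illustration only, not bcbio's or Cromwell's actual staging code: the helper names, the bucket name, and the `<staging_dir>/<bucket>/<key>` layout are all assumptions made for this example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/path/to/key' into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def local_staging_path(url, staging_dir):
    """Map an S3 URL to a local path.

    Hypothetical layout (an assumption for this sketch):
    <staging_dir>/<bucket>/<key>
    """
    bucket, key = split_s3_url(url)
    return str(PurePosixPath(staging_dir) / bucket / key)


if __name__ == "__main__":
    # Hypothetical bucket and key, chosen only for illustration.
    url = "s3://my-bucket/fastq/sample_R1.fastq.gz"
    print(split_s3_url(url))
    print(local_staging_path(url, "/scratch/staging"))
```

For a self-hosted store, the actual download would also need the endpoint of your own gateway rather than the AWS default; how bcbio picks that up is not covered by this sketch.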

I'd definitely welcome feedback and reports if you test this out. Thanks again.
