The grid-computing-tools repo collects scripts and recipes for common tasks that fall under the category of "simple for a few files, hard for many files." Examples include:
- I have many VCFs in Cloud Storage that I need to (de)compress
- I have many VCFs in Cloud Storage that have something wrong with the header
- I have many BAMs in Cloud Storage for which I need to compute index files

The primary components of the grid-computing-tools examples are:
- Google Cloud Storage - location of source input files and destination for output files
- Google Compute Engine - virtual machines in the cloud
- Grid Engine - job scheduling software to distribute commands across a cluster of virtual machines

The approach here is intended to provide a familiar environment to computational scientists who are accustomed to using Grid Engine to submit jobs to fixed-size clusters available at their research institution.
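
As a rough illustration of how these components fit together, the sketch below shows what a single Grid Engine array task might do for the BAM indexing case: pick one input from a list, copy it from Cloud Storage, index it, and copy the result back. This is not one of the repo's tools. The function names, the `INPUT_LIST` and `OUTPUT_PREFIX` variables, and the file layout are assumptions made for the example; it also assumes `gsutil` and `samtools` are installed on each cluster node.

```shell
#!/bin/bash
# Hypothetical sketch: one Grid Engine array task indexes one BAM file.
# INPUT_LIST and OUTPUT_PREFIX are assumed names, not part of this repo.

# Print the Nth line (1-based) of a file that lists one
# Cloud Storage path per line.
nth_input() {
  local list_file="$1" task_id="$2"
  sed -n "${task_id}p" "${list_file}"
}

# Download one BAM, compute its index, and upload the index file.
index_one_bam() {
  local input_url="$1" output_prefix="$2"
  local workdir bam
  workdir="$(mktemp -d)"
  bam="${workdir}/$(basename "${input_url}")"
  gsutil cp "${input_url}" "${bam}"
  samtools index "${bam}"                  # writes ${bam}.bai by default
  gsutil cp "${bam}.bai" "${output_prefix}/"
  rm -rf "${workdir}"
}

# Grid Engine sets SGE_TASK_ID for each task of an array job.
# Outside an array job it is unset or "undefined", so nothing runs.
if [[ -n "${SGE_TASK_ID:-}" && "${SGE_TASK_ID}" != "undefined" ]]; then
  index_one_bam "$(nth_input "${INPUT_LIST}" "${SGE_TASK_ID}")" "${OUTPUT_PREFIX}"
fi
```

Saved as, say, `index_bam.sh`, a script like this could be submitted as an array job with one task per input file, for example `qsub -t 1-100 -v INPUT_LIST=inputs.txt,OUTPUT_PREFIX=gs://my-bucket/output index_bam.sh`, where the task count, file name, and bucket path are placeholders.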
Documentation for the tools in this repo can be found at http://googlegenomics.readthedocs.org/

The following tools are available: