Reference server implementation of the htsget API protocol for securely streaming genomic data. For more information about htsget, see the paper or specification.
A GA4GH-hosted instance of this server is running at https://htsget.ga4gh.org/. To use it, see the OpenAPI documentation.
We suggest running the reference server as a docker container, as the image comes pre-installed with all dependencies.
With docker installed, run:
docker image pull ga4gh/htsget-refserver:${TAG}
to pull the image, and:
docker container run -d -p 3000:3000 ga4gh/htsget-refserver:${TAG}
to spin up a containerized server. Custom config files can also be passed to the application by first mounting the directory containing the config, then specifying the path to the config file in the run command:
docker container run -d -p ${PORT}:${PORT} -v /directory/to/config:/usr/src/app/config ga4gh/htsget-refserver:${TAG} ./htsget-refserver -config /usr/src/app/config/config.json
Additional BAM/CRAM/VCF/BCF directories you wish to serve via htsget can also be mounted into the container. See the Configuration section below for instructions on how to serve custom datasets.
The full list of tags/versions is available on the Docker Hub repository page.
To run and/or develop the server natively on your OS, the following dependencies are required:
- Golang and language tools (tested on version 1.13)
- samtools (tested on version 1.9)
- bcftools (tested on version 1.10.2)
- htsget-refserver-utils (1.0.0+)
This project uses Go modules to manage packages and dependencies.
With the above dependencies installed, run:
git clone https://github.com/ga4gh/htsget-refserver.git
cd htsget-refserver
to clone and enter the repository, and:
go build -o ./htsget-refserver ./cmd
to build the application binary. To start, run:
./htsget-refserver
A custom config file can also be specified with -config:
./htsget-refserver -config /path/to/config.json
The htsget web service can be configured with runtime parameters via a JSON config file, specified with -config. For example:
./htsget-refserver -config /path/to/config.json
Examples of valid JSON config files are available in this repository:
- ga4gh instance config - used to run the GA4GH-hosted instance at https://htsget.ga4gh.org
- local development config - used to run the local instance for development
- integration tests config - used for integration testing on Travis CI builds
- example 0 config
- empty config
In the JSON file, the root object must have a single "htsgetConfig" property containing all sub-properties, i.e.:
{
  "htsgetConfig": {}
}
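The root-object rule can be checked with a short script before starting the server. The sketch below is not part of the server: load_htsget_config is a hypothetical helper name, and it verifies only the single-property rule described above, not the contents of the config.

```python
import json


def load_htsget_config(text):
    """Parse config JSON and return the object under "htsgetConfig".

    Hypothetical helper: it enforces only the rule that the root object
    has exactly one property, named "htsgetConfig".
    """
    root = json.loads(text)
    if not isinstance(root, dict) or list(root.keys()) != ["htsgetConfig"]:
        raise ValueError('root object must contain only "htsgetConfig"')
    return root["htsgetConfig"]


# An empty htsgetConfig object is the minimal valid shape.
config = load_htsget_config('{"htsgetConfig": {}}')
print(config)
```

A config whose root uses any other property name, or more than one property, raises ValueError here.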
Under the htsgetConfig property, the props object overrides application-wide settings. The following table lists the attributes of props and the settings they affect.
Name | Description | Default Value |
---|---|---|
port | the port on which the service will run | 3000 |
host | web service hostname. The JSON ticket returned by the server references other endpoints, using this hostname/base URL to build complete URLs. | http://localhost:3000/ |
docsDir | path to a static file directory containing server documentation (e.g. OpenAPI). The server serves its contents at the /docs/ endpoint. | NONE |
tempDir | directory to which temporary files used in request processing are written | . |
logFile | file to which application logs are written | htsget-refserver.log |
corsAllowedOrigins | origins allowed to make CORS requests. Separate multiple origins with commas. | http://localhost |
corsAllowedMethods | HTTP methods allowed in CORS requests | GET, POST, PUT, DELETE, OPTIONS |
corsAllowedHeaders | request headers allowed in CORS requests | * |
corsAllowCredentials | whether CORS requests may include credentials | false |
corsMaxAge | how long, in seconds, a CORS preflight response may be cached | 300 |
awsAssumeRole | turns on the awsAssumeRole middleware. See the Private Bucket section below. | false |
Example props object:
{
  "htsgetConfig": {
    "props": {
      "port": "80",
      "host": "https://htsget.ga4gh.org/",
      "tempDir": "/tmp/",
      "logFile": "/usr/src/app/htsget-refserver.log",
      "corsAllowedOrigins": "https://portal.ga4gh.org, http://intranet.ga4gh.org"
    }
  }
}
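To see which settings end up in effect, it can help to overlay a props object on the documented defaults. The following is an illustrative sketch only: DEFAULT_PROPS copies the defaults from the table, effective_props and split_origins are hypothetical helpers, and the value types (for instance port as a string, as in the example) and the whitespace trimming are assumptions. The server's own merging logic is not shown in this document and may differ.

```python
# Defaults copied from the props table; docsDir has no default and is omitted.
# Value types are assumptions made for this sketch.
DEFAULT_PROPS = {
    "port": "3000",
    "host": "http://localhost:3000/",
    "tempDir": ".",
    "logFile": "htsget-refserver.log",
    "corsAllowedOrigins": "http://localhost",
    "corsAllowedMethods": "GET, POST, PUT, DELETE, OPTIONS",
    "corsAllowedHeaders": "*",
    "corsAllowCredentials": False,
    "corsMaxAge": 300,
    "awsAssumeRole": False,
}


def effective_props(props):
    """Return the documented defaults overlaid with the values in props."""
    merged = dict(DEFAULT_PROPS)
    merged.update(props)
    return merged


def split_origins(value):
    """Split a comma-separated origins string, trimming surrounding spaces."""
    return [origin.strip() for origin in value.split(",") if origin.strip()]


props = {
    "port": "80",
    "corsAllowedOrigins": "https://portal.ga4gh.org, http://intranet.ga4gh.org",
}
settings = effective_props(props)
print(settings["port"], settings["tempDir"])
print(split_origins(settings["corsAllowedOrigins"]))
```

Any property left out of props keeps its default, so a config only needs to list the settings it changes.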
Under the htsgetConfig property, the reads object overrides settings for reads-related data and endpoints. The following properties can be set:
- enabled (boolean): if true, the server will set up reads-related routes (i.e. /reads/{id}, /reads/service-info). True by default.
- dataSourceRegistry (object): allows the server to serve alignment data from multiple cloud or local storage sources by mapping request object id patterns to registered data sources. A single sources property contains an array of data sources. For each data source, the following properties are required:
  - pattern: a regex pattern that the id in /reads/{id} is matched against. If an id matches the pattern, the server will attempt to load data from the specified source. The pattern should make use of named capture group(s) to populate the path to the file.
  - path: the path template (either a URL or a local file path) to alignment files matching the pattern. The path must indicate how named capture groups in the pattern will populate the path to the file.
- serviceInfo (object): specifies the attribute values returned in the Service Info response from /reads/service-info. Default attributes are supplied if not provided by config. The following properties from the Service Info specification can be modified: id, name, description, organization, contactUrl, documentationUrl, createdAt, updatedAt, environment, version.
Example reads object:
{
  "htsgetConfig": {
    "reads": {
      "enabled": true,
      "dataSourceRegistry": {
        "sources": [
          {
            "pattern": "^tabulamuris\\.(?P<accession>10X.*)$",
            "path": "https://s3.amazonaws.com/czbiohub-tabula-muris/10x_bam_files/{accession}_possorted_genome.bam"
          },
          {
            "pattern": "^tabulamuris\\.(?P<accession>.*)$",
            "path": "https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/{accession}.mus.Aligned.out.sorted.bam"
          }
        ]
      },
      "serviceInfo": {
        "id": "demo.reads",
        "name": "htsget demo reads",
        "description": "serve alignment data via htsget",
        "organization": {
          "name": "Example Org",
          "url": "https://exampleorg.com"
        },
        "contactUrl": "mailto:[email protected]",
        "documentationUrl": "https://htsget.exampleorg.com/docs",
        "createdAt": "2021-01-01T09:00:00Z",
        "updatedAt": "2021-01-01T09:00:00Z",
        "environment": "test",
        "version": "1.0.0"
      }
    }
  }
}
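The way a requested id is turned into a file path can be sketched as follows. This is a rough illustration, not the server's code: it uses Python's re module, which accepts the same (?P<name>...) named-group syntax as Go's regexp package, and it assumes that sources are tried in order with the first match winning. resolve_path is a hypothetical helper and the ids are illustrative only.

```python
import re

# Sources copied from the example reads object. The JSON string "\\." becomes
# the regex \. once decoded, written here as a raw string.
SOURCES = [
    {
        "pattern": r"^tabulamuris\.(?P<accession>10X.*)$",
        "path": "https://s3.amazonaws.com/czbiohub-tabula-muris/10x_bam_files/{accession}_possorted_genome.bam",
    },
    {
        "pattern": r"^tabulamuris\.(?P<accession>.*)$",
        "path": "https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/{accession}.mus.Aligned.out.sorted.bam",
    },
]


def resolve_path(object_id, sources):
    """Return the path for the first source whose pattern matches, else None.

    Assumption: sources are tried in order and the first match wins.
    """
    for source in sources:
        match = re.match(source["pattern"], object_id)
        if match is None:
            continue
        path = source["path"]
        # Substitute each named capture group into its {name} placeholder.
        for name, value in match.groupdict().items():
            if value is not None:
                path = path.replace("{" + name + "}", value)
        return path
    return None


print(resolve_path("tabulamuris.10X_P4_0", SOURCES))
print(resolve_path("tabulamuris.A1", SOURCES))
print(resolve_path("unknown.id", SOURCES))
```

Under the first-match assumption, the order of the two sources matters: every id matching the 10X pattern also matches the more general one, so the specific source has to be listed first.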
Under the htsgetConfig property, the variants object overrides settings for variants-related data and endpoints. The following properties can be set:
- enabled (boolean): if true, the server will set up variants-related routes (i.e. /variants/{id}, /variants/service-info). True by default.
- dataSourceRegistry (object): allows the server to serve variant data from multiple cloud or local storage sources by mapping request object id patterns to registered data sources. A single sources property contains an array of data sources. For each data source, the following properties are required:
  - pattern: a regex pattern that the id in /variants/{id} is matched against. If an id matches the pattern, the server will attempt to load data from the specified source. The pattern should make use of named capture group(s) to populate the path to the file.
  - path: the path template (either a URL or a local file path) to variant files matching the pattern. The path must indicate how named capture groups in the pattern will populate the path to the file.
- serviceInfo (object): specifies the attribute values returned in the Service Info response from /variants/service-info. Default attributes are supplied if not provided by config. The following properties from the Service Info specification can be modified: id, name, description, organization, contactUrl, documentationUrl, createdAt, updatedAt, environment, version.
Example variants object:
{
  "htsgetConfig": {
    "variants": {
      "enabled": true,
      "dataSourceRegistry": {
        "sources": [
          {
            "pattern": "^1000genomes\\.(?P<accession>.*)$",
            "path": "https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase1/analysis_results/integrated_call_sets/{accession}.vcf.gz"
          }
        ]
      },
      "serviceInfo": {
        "id": "demo.variants",
        "name": "htsget demo variants",
        "description": "serve variant data via htsget",
        "organization": {
          "name": "Example Org",
          "url": "https://exampleorg.com"
        },
        "contactUrl": "mailto:[email protected]",
        "documentationUrl": "https://htsget.exampleorg.com/docs",
        "createdAt": "2021-01-01T09:00:00Z",
        "updatedAt": "2021-01-01T09:00:00Z",
        "environment": "test",
        "version": "1.0.0"
      }
    }
  }
}
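Because each path template depends on the named capture groups of its pattern, a data source entry can be checked for consistency before it is deployed. The sketch below is one possible check, not something the server is documented to perform. check_source is a hypothetical helper, and Python's re module stands in for the server's regex engine, so a pattern accepted here is not guaranteed to be accepted by the server.

```python
import re

# Matches {name} placeholders in a path template.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def check_source(source):
    """Return a list of problems found in one data source entry."""
    problems = []
    for key in ("pattern", "path"):
        if key not in source:
            problems.append("missing required property: " + key)
    if problems:
        return problems
    try:
        groups = set(re.compile(source["pattern"]).groupindex)
    except re.error as err:
        return ["pattern does not compile: " + str(err)]
    # Every placeholder in the path needs a named group of the same name.
    for name in PLACEHOLDER.findall(source["path"]):
        if name not in groups:
            problems.append("placeholder {" + name + "} has no named group")
    return problems


source = {
    "pattern": r"^1000genomes\.(?P<accession>.*)$",
    "path": "https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase1/analysis_results/integrated_call_sets/{accession}.vcf.gz",
}
print(check_source(source))
```

An empty list means the entry passed these checks. A path such as /data/{sample}.vcf.gz paired with a pattern that defines no sample group would be reported.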
- Turn on the awsAssumeRole middleware request interceptor to load AWS Assume Role temporary security credentials for access to a private S3 bucket.
- When it is not configured, awsAssumeRole defaults to false, and the execution environment is expected to access the private S3 bucket through the standard AWS mechanisms. In that case, see the htslib AWS S3 plugin for credential loading requirements.
Suppose you have data in a private bucket as follows:
s3://my-primary-data-prod/Project/PID00115/WGS/PID00115-final.bam
Example configuration:
{
  "htsgetConfig": {
    "props": {
      ...
      "awsAssumeRole": true
    },
    "reads": {
      ...
      "dataSourceRegistry": {
        "sources": [
          ...
          {
            "pattern": "^my-primary-data(?P<accession>.*)$",
            "path": "s3://my-primary-data{accession}"
          }
        ]
      },
      "serviceInfo": {
        ...
      }
    },
    "variants": {
      ...
      "dataSourceRegistry": {
        "sources": [
          ...
          {
            "pattern": "^my-primary-data(?P<accession>.*)$",
            "path": "s3://my-primary-data{accession}"
          }
        ]
      },
      "serviceInfo": {
        ...
      }
    }
  }
}
Then you can call htsget as follows:
curl -s http://localhost:3000/reads/my-primary-data-prod/Project/PID00115/WGS/PID00115-final.bam | jq
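The same request can be assembled programmatically. The sketch below builds a /reads/{id} URL with the optional referenceName, start and end query parameters from the htsget specification, and pulls the data block URLs out of a ticket. build_reads_url and ticket_urls are hypothetical helpers, the chr1 region is an arbitrary example, and the sample ticket is made up to show the shape of the response rather than copied from a real server.

```python
from urllib.parse import quote, urlencode


def build_reads_url(base_url, object_id, reference_name=None, start=None, end=None):
    """Build a GET URL for /reads/{id} with optional region parameters."""
    # Keep "/" unescaped so ids that contain a path, as above, stay readable.
    url = base_url.rstrip("/") + "/reads/" + quote(object_id, safe="/")
    params = {}
    if reference_name is not None:
        params["referenceName"] = reference_name
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    return url + ("?" + urlencode(params) if params else "")


def ticket_urls(ticket):
    """Return the data block URLs listed in an htsget ticket, in order."""
    return [entry["url"] for entry in ticket["htsget"]["urls"]]


url = build_reads_url(
    "http://localhost:3000",
    "my-primary-data-prod/Project/PID00115/WGS/PID00115-final.bam",
    reference_name="chr1",
    start=0,
    end=1000,
)
print(url)

# Made-up ticket illustrating the response shape defined by the htsget spec.
sample_ticket = {
    "htsget": {
        "format": "BAM",
        "urls": [
            {"url": "http://localhost:3000/example/header"},
            {"url": "http://localhost:3000/example/body"},
        ],
    }
}
print(ticket_urls(sample_ticket))
```

A client would fetch each URL from the ticket in order and concatenate the responses to obtain the requested slice of the file.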
To execute unit and end-to-end tests on the entire package, run:
go test ./... -coverprofile=cp.out
The Go coverage report will be available at ./cp.out. To execute tests for a specific package (for example, the htsrequest package), run:
go test ./internal/htsrequest -coverprofile=cp.out
v1.5.0
- Supports configurable CORS headers
v1.4.0
- Server supports experimental POST method for endpoint /reads/{id}. Multiple genomic regions can be requested in a single request. See hts-specs PR #285 for more info.
v1.3.0
- Server supports reads and/or variants service-info endpoints. The attributes of the service-info response can be specified via the config file independently for each datatype
v1.2.0
- Server supports htsget /variants/{id} endpoint, streams VCFs via htsget protocol using bcftools dependency
v1.1.0
- Added support for configurable data sources via a data source registry specified in config file
- server can stream reads data via htsget protocol from any url or local file specified via config
v1.0.0
- Initial release
- Implement POST request functionality
- Jeremy Adams (jb-adams) [email protected]
- David Liu (xngln)
Bugs and issues can be submitted via the GitHub Issue Tracker.