Handle local datasets
kzollove committed Oct 21, 2024
1 parent 9376e69 commit 3fb5058
Showing 6 changed files with 35 additions and 8 deletions.
6 changes: 3 additions & 3 deletions R/stageData.R
@@ -39,8 +39,8 @@ getStaged <- function(rec, storageConfig = readStorageConfig()) {
# ONLY HANDLES FILES (NO API YET)
# TODO there has to be a different way to change timeout without changing options

- isPersisted <- storageConfig$`offline-storage`$`persist-data`
- storageDir <- file.path(storageConfig$`offline-storage`$directory, rec$dataset_name)
+ isPersisted <- storageConfig$offline_storage$persist_data
+ storageDir <- file.path(storageConfig$offline_storage$directory, rec$dataset_name)
gisTempDir <- file.path(tempdir(), 'gaia')

if (!dir.exists(gisTempDir)) {
@@ -135,7 +135,7 @@ getStaged <- function(rec, storageConfig = readStorageConfig()) {
#'

readStorageConfig <- function() {
- yaml::read_yaml(system.file('config.yml', package = 'gaiaCore'))
+ yaml::read_yaml(system.file('config/storage.yml', package = 'gaiaCore'))
}


11 changes: 11 additions & 0 deletions README.md
@@ -38,6 +38,17 @@ docker run -itd --rm -e USER="ohdsi" -e PASSWORD="mypass" --network gaia -p 8787
docker run -itd --rm -e POSTGRES_PASSWORD=SuperSecret -e POSTGRES_USER=postgres --network gaia -p 5432:5432 --name gaia-db gaia-db
```

## Load "local" datasets
Local datasets can be loaded into gaia-db by placing them in the "offline storage directory" of the gaia-core container. Set this directory in inst/config/storage.yml before building the image (/opt/data by default).

Each dataset file must be named after the `download_url` of its data_source record and stored in a subdirectory named after that record's `dataset_name`:
```sh
# Create the dataset subdirectory under the offline_storage directory set in storage.yml
docker exec -it gaia-core bash -c "mkdir -p /opt/data/annual_measurement_2024"
# Copy the file into that subdirectory, using the file name given in the data_source record's download_url
docker cp /path/to/local/shpfile.zip gaia-core:/opt/data/annual_measurement_2024/shpfile.zip
```
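
The same two steps can be generalized to any data_source record. The sketch below is only an illustration: `gaia_stage_cmds`, `STORAGE_DIR`, and `CONTAINER` are hypothetical names that are not part of gaiaCore, and the defaults are assumptions taken from the values used in this README. It prints the commands rather than running them, so they can be reviewed first.

```sh
#!/bin/sh
# Hypothetical helper (not part of gaiaCore): print the docker commands that
# stage one local file for a data_source record.
#   $1 dataset_name, $2 download_url (the file name), $3 path to the local file
# STORAGE_DIR and CONTAINER are assumed defaults based on this README.
gaia_stage_cmds() {
  dataset_name=$1
  download_url=$2
  local_file=$3
  storage_dir=${STORAGE_DIR:-/opt/data}
  container=${CONTAINER:-gaia-core}
  target_dir="$storage_dir/$dataset_name"
  # Step 1: create the dataset subdirectory inside the container.
  printf 'docker exec %s mkdir -p %s\n' "$container" "$target_dir"
  # Step 2: copy the file in under the name the record expects.
  printf 'docker cp %s %s:%s/%s\n' "$local_file" "$container" "$target_dir" "$download_url"
}

gaia_stage_cmds annual_measurement_2024 shpfile.zip /path/to/local/shpfile.zip
```

The printed commands are not quoted, so paths containing spaces would need quoting before being run.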

## Using gaiaCore
The gaia-core container provides an R and RStudio environment with the R Package `gaiaCore` alongside the OHDSI HADES R Packages. `gaiaCore` provides the functionality for loading cataloged geospatial datasets into gaia-db and generating "exposures" by linking geospatial data to patient addresses.

4 changes: 2 additions & 2 deletions docker/gaia-db/init.sql
@@ -22,9 +22,9 @@ CREATE TABLE data_source (
download_subtype varchar(100) NOT NULL,
download_data_standard varchar(100) NOT NULL,
download_filename varchar(100) NOT NULL,
- download_url varchar(100) NOT NULL,
+ download_url varchar(255) NOT NULL,
download_auth varchar(100) NULL,
- documentation_url varchar(120) NULL );
+ documentation_url varchar(255) NULL );


CREATE TABLE variable_source (
3 changes: 3 additions & 0 deletions inst/config/storage.yml
@@ -0,0 +1,3 @@
offline_storage:
  directory: /opt/data
  persist_data: false
4 changes: 2 additions & 2 deletions inst/ddl/001/gaiadb_001_ddl.sql
@@ -13,9 +13,9 @@ CREATE TABLE backbone.data_source (
download_subtype varchar(100) NOT NULL,
download_data_standard varchar(100) NOT NULL,
download_filename varchar(100) NOT NULL,
- download_url varchar(120) NOT NULL,
+ download_url varchar(255) NOT NULL,
download_auth varchar(100) NULL,
- documentation_url varchar(100) NULL );
+ documentation_url varchar(255) NULL );
CREATE TABLE backbone.variable_source (
variable_source_id serial4 NOT NULL,
geom_dependency_uuid int4 NULL,
15 changes: 14 additions & 1 deletion rmd/ht-local-dataset.Rmd
@@ -10,4 +10,17 @@ output:

# **Using a Local Dataset**

TODO
The Gaia maintainer must do two things to enable offline storage:
1. Create or edit the storage.yml file in the inst/config directory so that it points to the offline storage directory where dataset files can be found
```yml
offline_storage:
  directory: /opt/data
  persist_data: false
```
2. Author the data_source record in the Gaia database so that it specifies a local source and points to the local file
   - Set the `download_method` field to 'local'
   - Set the `download_url` field to the name of the file

Each dataset file must be named after the `download_url` of its data_source record and stored in a subdirectory named after that record's `dataset_name`, e.g. `/opt/data/annual_measurement_2024/shpfile.zip`

With these two steps completed and the dataset properly named and stored, the Gaia API can serve the dataset from the local file as it does for any other data_source.
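
Before relying on a local source, it can help to confirm that the file sits where the naming rule says it should. The following is a minimal sketch, not part of gaiaCore: `expected_dataset_path` and `check_dataset` are hypothetical helper names, and the demonstration uses a temporary directory as a stand-in for the offline storage directory.

```sh
#!/bin/sh
# Hypothetical check (not part of gaiaCore): build the path at which a local
# dataset is expected and report whether a file exists there.
#   $1 offline storage directory, $2 dataset_name, $3 download_url (file name)
expected_dataset_path() {
  # ${1%/} drops a single trailing slash so the joined path has no "//".
  printf '%s/%s/%s\n' "${1%/}" "$2" "$3"
}

check_dataset() {
  path=$(expected_dataset_path "$1" "$2" "$3")
  if [ -f "$path" ]; then
    echo "found: $path"
  else
    echo "missing: $path" >&2
    return 1
  fi
}

# Demonstration against a throwaway directory standing in for /opt/data.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/annual_measurement_2024"
: > "$demo_dir/annual_measurement_2024/shpfile.zip"
check_dataset "$demo_dir" annual_measurement_2024 shpfile.zip
rm -rf "$demo_dir"
```

Against a real deployment, the same check would be run inside the container that holds the offline storage directory, with `/opt/data` (or the configured directory) as the first argument.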
