From 64e584ed3a776eb6c706990c5c19a215f912720d Mon Sep 17 00:00:00 2001
From: Kyle Zollo-Venecek
Date: Fri, 1 Mar 2024 13:28:06 -0500
Subject: [PATCH] Fix userGuidnce, etlConventions docs

---
 docs/gaia-datamodels.html      | 126 ++++++++++++++++++---------------
 docs/gaiaCore/pkgdown.yml      |   2 +-
 inst/csv/gaia001fieldLevel.csv |   6 +-
 rmd/gaia-datamodels.Rmd        |   9 +--
 4 files changed, 79 insertions(+), 64 deletions(-)

diff --git a/docs/gaia-datamodels.html b/docs/gaia-datamodels.html
index 786dde9..6956525 100644
--- a/docs/gaia-datamodels.html
+++ b/docs/gaia-datamodels.html
@@ -478,15 +478,9 @@

data_source

web-hosted entities. All source data in gaiaDB must be referenced in this table.

User Guide

-All records in this table are sources of geospatial data. They can be
-sources of geometry data, such as point, line, or polygon, or they can
-be attribute data with an identifier that relates them to geometry data,
-such as a FIPS code or GEOID.
+NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

ETL Conventions

-All sources of data that should be included in gaiaDB must have an
-entry in this table. Geometry data sources require a “geom_spec”: a
-lightweight transformation from the source data to the standardized
-format, written in R and serialized as JSON.
+NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

@@ -930,22 +924,9 @@

variable_source

data source enabling downstream data integrations. All variables from attribute source data must be catalogued in this table.

User Guide

-All records in this table describe distinct variables from attribute
-source data. For example, consider a weather dataset that is being added
-to gaiaDB. First, the entire dataset is catalogued in the data_source
-table. Then, the distinct variables of that dataset (temperature in
-fahrenheit, temperature in celsius, inches of rain, wind direction,
-etc.) each become a single record in this table. All records in this
-table are related back to their parent source dataset via a foreign key
-relationship to the data_source table. Many variable_source records can
-be related to a single data_source record.
+NA NA NA NA NA NA

ETL Conventions

-Every individual variable from a source dataset must have an entry in
-this table. Likewise, any source attribute dataset that gets included in
-the data_source table will likely have many “children” in this table.
-All records in this table contain an “attr_spec”: a lightweight
-transformation of a single variable into the standardized table
-format.
+NA NA NA NA NA NA

@@ -1140,10 +1121,9 @@

attr_index

A programmatically derived index table of all the attribute source datasets included in the data_source table.

User Guide

-This table can be (re)generated after new entries are added to the
-data_source table by running the gaiaCore createIndices() function.
+NA NA NA NA NA NA

ETL Conventions

-Run the createIndices() function to (re)generate this table.
+NA NA NA NA NA NA

@@ -1339,10 +1319,9 @@

geom_index

A programmatically derived index table of all the geometry source datasets included in the data_source table.

User Guide

-This table can be (re)generated after new entries are added to the
-data_source table by running the gaiaCore createIndices() function.
+NA NA NA NA NA NA NA NA NA

ETL Conventions

-Run the createIndices() function to (re)generate this table.
+NA NA NA NA NA NA NA NA NA

@@ -1612,10 +1591,9 @@

attr_template

This table is a template for the standardized attribute table that get created.

User Guide

-No action necessary. This table must simply exist (with no entries)
-in the backbone schema.
+NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

ETL Conventions

-No action necessary.
+NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

@@ -2065,10 +2043,9 @@

geom_template

This table is a template for the standardized geometry tables that get created.

User Guide

-No action necessary. This table must simply exist (with no entries)
-in the backbone schema.
+NA NA NA NA NA NA NA

ETL Conventions

-No action necessary.
+NA NA NA NA NA NA NA

@@ -2289,12 +2266,9 @@

geom_omop_location

This table contains identifier and text address from OMOP Location table records along with their associated geocoded, point geometry.

User Guide

-Populate this table from the OMOP Location table to facilitate
-creation of CDM Extension tables.
+NA NA NA

ETL Conventions

-Use the geocodeAddresses() function as outlined in https://ohdsi.github.io/GIS/ht-geocode.html
+NA NA NA

@@ -2412,11 +2386,9 @@

omop_location_history

Table Description

This table is a copy of the OMOP Location_History table.

User Guide

-Copy the OMOP Location_History table to Gaia to facilitate creation
-of CDM Extension tables
+NA NA NA NA NA NA

ETL Conventions

-This table should be an exact duplicate of the OMOP Location_History
-table.
+NA NA NA NA NA NA

@@ -2616,9 +2588,53 @@

exposure_occurrence

transformations of data from Gaia and to interface with ATLAS and OHDSI tool stack from an OMOP CDM database.

User Guide

-NA
+The unique key given to a social or environmental exposure for a
+Person The LOCATION_ID of the Person for whom the exposure is
+associated. The PERSON_ID of the Person for whom the exposure is
+associated. The EXPOSURE_CONCEPT_ID field is recommended for primary use
+in analyses, and must be used for network studies. This is the standard
+concept mapped from the source value which represents a exposure. Use
+this date to determine the start date of the exposure. NA Use this date
+to determine the end date of the exposure. NA This field identifies the
+origin of the exposure record (e.g. Census, EHR, Environmental data,
+Geospatial data, Satellite imagery, GIS mapping, Sensor network, Mobile
+device geolocation, LiDAR) This field can be used to determine the
+spatiotemporal relationship between the source Exposure and the Person
+This field can be used to determine the original source of place-based
+exposure data This field houses the verbatim name of the original source
+of place-based exposure data. NA NA NA NA The meaning of
+Concept?4172703?for ?=? is identical to omission of a
+OPERATOR_CONCEPT_ID value. Since the use of this field is rare, it?s
+important when devising analyses to not to forget testing for the
+content of this field for values different from =. This is the numerical
+value of the Exposure, if available. If the raw data gives a categorial
+result for exposures those values are captured and mapped to standard
+concepts in the ?Exposure Value? domain. UNIT_SOURCE_VALUES should be
+mapped to a Standard Concept in the Unit domain that best represents the
+unit as given in the source data.

ETL Conventions

-NA
+Each derived instance of an exposure should be assigned this unique
+key. NA NA The CONCEPT_ID to which the source exposure is mapped. This
+mapping should be integrated into the variable_source record and
+automatically populated in this record. The date range of the exposure
+should represent the temporal overlap between the place-based exposure
+data point and the LOCATION_ID’s location_history record. NA The date
+range of the exposure should represent the temporal overlap between the
+place-based exposure data point and the LOCATION_ID’s location_history
+record. NA The CONCEPT_ID to which the exposure’s data source type is
+mapped. This mapping should be integrated into the data_source record
+and automatically populated in this record. The CONCEPT_ID to which the
+relationship between the Exposure and the Person is mapped. This mapping
+should be automatically populated in this record. The CONCEPT_ID to
+which the exposure’s data source is mapped. This mapping should be
+integrated into the data_source record and automatically populated in
+this record. This name is mapped to a Standard Exposure Source Concept
+and the original name is stored here for reference. NA NA NA NA NA This
+value should be integrated into the variable_source record and
+automatically populated in this record. This mapping should be
+integrated into the variable_source record and automatically populated
+in this record. This mapping should be integrated into the
+variable_source record and automatically populated in this record.

@@ -2878,9 +2894,9 @@

exposure_occurrence

exposure_type_concept_id @@ -3162,8 +3177,7 @@

exposure_occurrence

-This field can be used to determine the provenance of the Exposure
-record, as in whether the exposure was from an ___________ or other
-sources.
+This field identifies the origin of the exposure record (e.g. Census,
+EHR, Environmental data, Geospatial data, Satellite imagery, GIS
+mapping, Sensor network, Mobile device geolocation, LiDAR)
 The CONCEPT_ID to which the exposure’s data source type is mapped. This
@@ -3102,11 +3118,10 @@

exposure_occurrence

operator_concept_id
-The meaning of Concept4172703for <91>=<92> is
-identical to omission of a OPERATOR_CONCEPT_ID value. Since the use of
-this field is rare, it<92>s important when devising analyses to
-not to forget testing for the content of this field for values different
-from =.
+The meaning of Concept?4172703?for ?=? is identical to omission of a
+OPERATOR_CONCEPT_ID value. Since the use of this field is rare, it?s
+important when devising analyses to not to forget testing for the
+content of this field for values different from =.
 If the raw data gives a categorial result for exposures those values are
-captured and mapped to standard concepts in the <91>Exposure
-Value<92> domain.
+captured and mapped to standard concepts in the ?Exposure Value? domain.
 This mapping should be integrated into the variable_source record and
diff --git a/docs/gaiaCore/pkgdown.yml b/docs/gaiaCore/pkgdown.yml
index ded0628..1ee4fb5 100644
--- a/docs/gaiaCore/pkgdown.yml
+++ b/docs/gaiaCore/pkgdown.yml
@@ -2,5 +2,5 @@ pandoc: '2.18'
 pkgdown: 2.0.7
 pkgdown_sha: ~
 articles: {}
-last_built: 2024-03-01T14:55Z
+last_built: 2024-03-01T16:56Z
diff --git a/inst/csv/gaia001fieldLevel.csv b/inst/csv/gaia001fieldLevel.csv
index cc67ca0..c9cb329 100644
--- a/inst/csv/gaia001fieldLevel.csv
+++ b/inst/csv/gaia001fieldLevel.csv
@@ -76,7 +76,7 @@ exposure_occurrence,exposure_start_date,Yes,date,Use this date to determine the
 exposure_occurrence,exposure_start_datetime,No,datetime,,,No,No,,,,,
 exposure_occurrence,exposure_end_date,Yes,date,Use this date to determine the end date of the exposure.,The date range of the exposure should represent the temporal overlap between the place-based exposure data point and the LOCATION_ID's location_history record.,No,No,,,,,
 exposure_occurrence,exposure_end_datetime,No,datetime,,,No,No,,,,,
-exposure_occurrence,exposure_type_concept_id,Yes,integer,"This field can be used to determine the provenance of the Exposure record, as in whether the exposure was from an ___________ or other sources.",The CONCEPT_ID to which the exposure's data source type is mapped. This mapping should be integrated into the data_source record and automatically populated in this record.,No,Yes,concept,concept_id,Type Concept,,
+exposure_occurrence,exposure_type_concept_id,Yes,integer,"This field identifies the origin of the exposure record (e.g. Census, EHR, Environmental data, Geospatial data, Satellite imagery, GIS mapping, Sensor network, Mobile device geolocation, LiDAR)",The CONCEPT_ID to which the exposure's data source type is mapped. This mapping should be integrated into the data_source record and automatically populated in this record.,No,Yes,concept,concept_id,Type Concept,,
 exposure_occurrence,exposure_relationship_concept_id,Yes,integer,This field can be used to determine the spatiotemporal relationship between the source Exposure and the Person,The CONCEPT_ID to which the relationship between the Exposure and the Person is mapped. This mapping should be automatically populated in this record.,No,Yes,concept,concept_id,,,
 exposure_occurrence,exposure_source_concept_id,No,integer,This field can be used to determine the original source of place-based exposure data,The CONCEPT_ID to which the exposure's data source is mapped. This mapping should be integrated into the data_source record and automatically populated in this record.,No,Yes,concept,concept_id,,,
 exposure_occurrence,exposure_source_value,No,varchar(50),This field houses the verbatim name of the original source of place-based exposure data.,This name is mapped to a Standard Exposure Source Concept and the original name is stored here for reference.,No,No,,,,,
@@ -84,7 +84,7 @@ exposure_occurrence,exposure_relationship_source_value,No,varchar(50),,,No,No,,,
 exposure_occurrence,dose_unit_source_value,No,varchar(50),,,No,No,,,,,
 exposure_occurrence,quantity,No,integer,,,No,No,,,,,
 exposure_occurrence,modifier_source_value,No,varchar(50),,,No,No,,,,,
-exposure_occurrence,operator_concept_id,No,integer,"The meaning of Concept4172703for = is identical to omission of a OPERATOR_CONCEPT_ID value. Since the use of this field is rare, its important when devising analyses to not to forget testing for the content of this field for values different from =.",,No,Yes,concept,concept_id,,,
+exposure_occurrence,operator_concept_id,No,integer,"The meaning of Concept?4172703?for ?=? is identical to omission of a OPERATOR_CONCEPT_ID value. Since the use of this field is rare, it?s important when devising analyses to not to forget testing for the content of this field for values different from =.",,No,Yes,concept,concept_id,,,
 exposure_occurrence,value_as_number,No,float,"This is the numerical value of the Exposure, if available.",This value should be integrated into the variable_source record and automatically populated in this record.,No,No,,,,,
-exposure_occurrence,value_as_concept_id,No,integer,If the raw data gives a categorial result for exposures those values are captured and mapped to standard concepts in the Exposure Value domain.,This mapping should be integrated into the variable_source record and automatically populated in this record.,No,Yes,concept,concept_id,,,
+exposure_occurrence,value_as_concept_id,No,integer,If the raw data gives a categorial result for exposures those values are captured and mapped to standard concepts in the ?Exposure Value? domain.,This mapping should be integrated into the variable_source record and automatically populated in this record.,No,Yes,concept,concept_id,,,
 exposure_occurrence,unit_concept_id,No,integer,UNIT_SOURCE_VALUES should be mapped to a Standard Concept in the Unit domain that best represents the unit as given in the source data.,This mapping should be integrated into the variable_source record and automatically populated in this record.,No,Yes,concept,concept_id,Unit,,
diff --git a/rmd/gaia-datamodels.Rmd b/rmd/gaia-datamodels.Rmd
index 8cb1332..d7f9612 100644
--- a/rmd/gaia-datamodels.Rmd
+++ b/rmd/gaia-datamodels.Rmd
@@ -65,12 +65,13 @@ for(tb in tables) {
 tableInfo <- subset(tableSpecs, gaiaTableName == tb)
 cat("**Table Description**\n\n",tableInfo[,"tableDescription"][[1]], "\n\n")
- if(!isTRUE(tableInfo[,"userGuidance"][[1]]=="")){
- cat("**User Guide**\n\n",tableInfo[,"userGuidance"][[1]],"\n\n")
+ fieldInfo <- subset(cdmSpecs, gaiaTableName == tb)
+ if(!isTRUE(fieldInfo[,"userGuidance"][[1]]=="")){
+ cat("**User Guide**\n\n",fieldInfo[,"userGuidance"][[1]],"\n\n")
 }
- if(!isTRUE(tableInfo[,"etlConventions"][[1]]=="")){
- cat("**ETL Conventions**\n\n",tableInfo[,"etlConventions"][[1]],"\n\n")
+ if(!isTRUE(fieldInfo[,"etlConventions"][[1]]=="")){
+ cat("**ETL Conventions**\n\n",fieldInfo[,"etlConventions"][[1]],"\n\n")
 }
 loopTable <- subset(gaiaSpecsClean, `Gaia Table` == tb)