How to run the ETL for OMOP CDM 5.3 on Windows in Python (v. 3.2 onwards) using PostgreSQL
First time setup:
- If you don't already have it, download the latest version of Python 3.x from https://www.python.org/downloads/ and add the directory containing the Python executables to your PATH environment variable.
- Open a new command prompt as ADMINISTRATOR and navigate to the root of the aurum_etl module.
- Install the Python PostgreSQL client with: pip install psycopg2
- Install the Python sqlparse package with: pip install sqlparse
- If you don't already have it, request an API KEY to use with the ATHENA vocabularies from: https://uts.nlm.nih.gov/uts/umls/home
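Once the two pip commands above have finished, a quick check can confirm that both packages are visible to the interpreter you will use for the ETL. The snippet below is only a minimal sketch: the helper name missing_modules is hypothetical and is not part of the aurum_etl module.

```python
import importlib.util

# Packages installed in the first-time setup steps above.
REQUIRED_MODULES = ("psycopg2", "sqlparse")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If a package is reported as missing, re-run the matching pip install command from the same command prompt.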
Download ATHENA Vocabularies:
- Request vocabularies from: https://athena.ohdsi.org/vocabulary/list
- Download the zipped file to an appropriate directory, unzip it and follow the ATHENA instructions
- Run "java -Dumls-apikey=YOUR_API_KEY -jar cpt4.jar 5"
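If you would rather not type the key on the command line, the java command quoted above could be assembled from Python. This is a sketch under assumptions: the environment variable name UMLS_API_KEY and the helper build_cpt4_command are hypothetical, and the script is assumed to run from the unzipped vocabulary directory, where cpt4.jar lives.

```python
import os
import subprocess


def build_cpt4_command(api_key, jar_path="cpt4.jar"):
    """Build the argument list for the CPT4 step as quoted in the list above."""
    if not api_key:
        raise ValueError("an API key is required")
    return ["java", f"-Dumls-apikey={api_key}", "-jar", jar_path, "5"]


if __name__ == "__main__":
    # UMLS_API_KEY is an assumed variable name, not something ATHENA defines.
    key = os.environ.get("UMLS_API_KEY", "")
    if key:
        # Run from the directory that contains cpt4.jar.
        subprocess.run(build_cpt4_command(key), check=True)
    else:
        print("Set UMLS_API_KEY before running this step.")
```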
Run the ETL:
- In PostgreSQL, create a database
- Move the file __postgres_db_conf.py to the <full project_directory>, open it with a plain-text editor and customise it to suit your particular ETL
- Open a new command prompt where you have deployed the Python code
- if db_type == 'gold': Run py 0_manage_gold_files.py -F<full project_directory>
  A full project_directory (e.g. D:\dir1\dir2\dir3\dir4\hesapc) has at least the following subfolders: data, lookups, source_to_concept_map, vocabulary
- Run py 1_load_source_data.py -F<full project_directory>
- Run py 2_load_lookup.py -F<full project_directory>
- Run py 3_load_cdm_vocabulary.py -F<full project_directory>
- if db_type == 'gold': Run CREATE FILE FOR DAYSUPPLY
- if db_type == 'gold': Run C# (Teen), else Run py 4_map_source_in_chunk.py -F<full project_directory>
- Run py 5_build_cdm_pk_idx.py -F<full project_directory>
- Run py 6_build_cdm_era_tbl.py -F<full project_directory>
- Run py 7_count_cdm_records.py -F<full project_directory>
- If needed, Run py 8_load_source_denominator.py -F<full project_directory>
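The order of the steps above, and the subfolders a full project_directory is expected to contain, can be written down as a small dry-run helper. This is only a sketch: the function names are hypothetical, it prints the commands rather than running them, and it leaves out the manual 'gold' steps (the day-supply file and the C# step), which have to be done by hand between scripts 3 and 5.

```python
from pathlib import Path

# Subfolders a full project_directory must contain, as listed above.
REQUIRED_SUBFOLDERS = ("data", "lookups", "source_to_concept_map", "vocabulary")


def missing_subfolders(project_dir):
    """Return the required subfolders that do not exist under project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_SUBFOLDERS if not (root / name).is_dir()]


def planned_commands(project_dir, db_type, with_denominator=False):
    """Return the py commands for the scripted steps, in the order listed above."""
    scripts = []
    if db_type == "gold":
        scripts.append("0_manage_gold_files.py")
    scripts += ["1_load_source_data.py", "2_load_lookup.py", "3_load_cdm_vocabulary.py"]
    if db_type != "gold":
        # For 'gold' this step is replaced by the manual C# step.
        scripts.append("4_map_source_in_chunk.py")
    scripts += ["5_build_cdm_pk_idx.py", "6_build_cdm_era_tbl.py", "7_count_cdm_records.py"]
    if with_denominator:
        scripts.append("8_load_source_denominator.py")
    return [["py", script, f"-F{project_dir}"] for script in scripts]


if __name__ == "__main__":
    project_dir = r"D:\dir1\dir2\dir3\dir4\hesapc"  # example path from the text above
    print("Missing subfolders:", missing_subfolders(project_dir))
    for command in planned_commands(project_dir, db_type="gold"):
        print(" ".join(command))
```

Each printed command can be pasted into the command prompt, or passed to subprocess.run(command, check=True) once the plan looks right.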
Check data quality:
- Run Achilles
- Run DQD