GitHub - oxford-pharmacoepi/etl_ndorms

How to run the ETL for OMOP CDM 5.3 on Windows in Python (v. 3.2 onwards) using PostgreSQL

First time setup:

If you don't already have it, download the latest version of Python 3.x from https://www.python.org/downloads/ and add the path to the directory containing the Python executables to your environment variables.
Open a new command prompt as ADMINISTRATOR and navigate to the root of the aurum_etl module.
Install the Python PostgreSQL client with pip install psycopg2.
Install the Python sqlparse library with pip install sqlparse.
If you don't already have it, request a UMLS API KEY to use with the ATHENA vocabularies from: https://uts.nlm.nih.gov/uts/umls/home
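
A quick way to confirm that the setup steps above worked is to check that both packages can be found by the interpreter. The following is a minimal sketch, not part of the repository; the function name `missing_modules` is hypothetical.

```python
import importlib.util
import sys

# Modules installed with pip in the setup steps above.
REQUIRED_MODULES = ("psycopg2", "sqlparse")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Install them with: pip install " + " ".join(missing))
    else:
        print("All required modules are installed.")
```

Using `importlib.util.find_spec` only looks the modules up without importing them, so the check stays fast and has no side effects.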

Download ATHENA Vocabularies:

Request vocabularies from: https://athena.ohdsi.org/vocabulary/list
Download the zipped file to an appropriate directory, unzip it and follow the ATHENA instructions
Run "java -Dumls-apikey=YOUR_API_KEY -jar cpt4.jar 5"
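
If you prefer to launch the CPT4 step from Python rather than typing it, the command above can be built as an argument list. This is only a sketch under assumptions: the helper names `cpt4_command` and `run_cpt4` and the parameter `vocabulary_dir` are hypothetical, and `java` is assumed to be on your PATH.

```python
import subprocess


def cpt4_command(api_key, jar="cpt4.jar"):
    """Build the argument list for the java command shown above."""
    if not api_key:
        raise ValueError("a UMLS key is required")
    return ["java", f"-Dumls-apikey={api_key}", "-jar", jar, "5"]


def run_cpt4(api_key, vocabulary_dir):
    """Run the command from inside the unzipped vocabulary directory."""
    # check=True raises CalledProcessError if java exits with a non-zero status.
    subprocess.run(cpt4_command(api_key), cwd=vocabulary_dir, check=True)
```

Passing the arguments as a list avoids shell quoting problems if the key or the directory contains special characters.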

Run the ETL:

In PostgreSQL, create a database
Move the file __postgres_db_conf.py to the <full project_directory>, open it with a plain-text editor and customise it to suit your particular ETL
Open a new command prompt where you have deployed the Python code
If db_type == 'gold': run py 0_manage_gold_files.py -F<full project_directory>

A full project_directory (e.g. D:\dir1\dir2\dir3\dir4\hesapc) has at least the following subfolders: data, lookups, source_to_concept_map, vocabulary
Run py 1_load_source_data.py -F<full project_directory>
Run py 2_load_lookup.py -F<full project_directory>
Run py 3_load_cdm_vocabulary.py -F<full project_directory>
If db_type == 'gold': run CREATE FILE FOR DAYSUPPLY
If db_type == 'gold': run C# (Teen)
Otherwise: run py 4_map_source_in_chunk.py -F<full project_directory>
Run py 5_build_cdm_pk_idx_fk.py -F<full project_directory>
Run py 6_build_cdm_era_tbl.py -F<full project_directory>
Run py 7_count_cdm_records.py -F<full project_directory>
If needed, run py 8_load_source_denominator.py -F<full project_directory>
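
Because every step takes the same -F<full project_directory> argument, it can help to check the project directory once and build each command the same way. The sketch below is an illustration, not part of the repository: the helper names `missing_subfolders`, `step_command` and `run_step` are hypothetical, and it assumes the scripts are run with the Windows `py` launcher from the directory that contains them.

```python
import subprocess
from pathlib import Path

# Subfolders that a full project directory must contain, as listed above.
REQUIRED_SUBFOLDERS = ("data", "lookups", "source_to_concept_map", "vocabulary")


def missing_subfolders(project_dir):
    """Return the required subfolders that do not exist under project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_SUBFOLDERS if not (root / name).is_dir()]


def step_command(script, project_dir):
    """Build the command for one step, with -F joined directly to the path."""
    return ["py", script, f"-F{project_dir}"]


def run_step(script, project_dir):
    """Check the project directory, then run one ETL script."""
    missing = missing_subfolders(project_dir)
    if missing:
        raise FileNotFoundError(
            f"{project_dir} is missing subfolders: {', '.join(missing)}"
        )
    # check=True raises CalledProcessError if the script exits with an error,
    # so a loop over several steps stops at the first failure.
    subprocess.run(step_command(script, project_dir), check=True)
```

For example, calling `run_step("1_load_source_data.py", project_dir)` followed by the same call for `2_load_lookup.py` and `3_load_cdm_vocabulary.py` would run the three loading steps in order. The steps that depend on db_type are left for you to choose.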

Check data quality:

Repository contents:

docs
sql_scripts
stcm
.gitignore
0_manage_gold_files.py
10_merge_db_linked.py
1_load_source_data.py
2_load_lookup.py
3_load_cdm_vocabulary.py
4_load_mapped_data.py
4_map_ons.py
4_map_source_in_chunk.py
5_build_cdm_pk_idx_fk.py
6_build_cdm_era_tbl.py
7_count_cdm_records.py
8_load_source_denominator.py
9_load_achilles_dqd.py
README.md
__postgres_db_conf.py
mapping_util.py
