This repository contains an input file, a Python script, and a bash script. When executed, the scripts read the project IDs in the input file, query the NIH RePORTER API with those IDs, and write four files:
- `heal_awards.csv`: a table of the HEAL awards and associated information from the NIH RePORTER API
- `heal_awards_pubs.csv`: a table of publications associated with HEAL awards
- `projects_not_in_reporter.txt`: a list of project numbers that do not return information from the NIH RePORTER API
- `projects_with_missing_nums.txt`: a list of project titles that have no project number and therefore cannot be queried with the NIH RePORTER API
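The split between rows that can be queried and rows that end up in `projects_with_missing_nums.txt` can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes a CSV input whose ID and title columns default to `project_id` and `project_title` (the real script lets you rename them), and the helper name `partition_rows` and the sample rows are hypothetical.

```python
import csv
import io
from typing import List, Tuple


def partition_rows(csv_text: str,
                   id_col: str = "project_id",
                   title_col: str = "project_title") -> Tuple[List[str], List[str]]:
    """Split input rows into queryable IDs and titles that lack an ID.

    Hypothetical helper; column names are configurable assumptions.
    """
    queryable, missing_titles = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        project_id = (row.get(id_col) or "").strip()
        if project_id:
            queryable.append(project_id)
        else:
            # No ID means the API cannot be queried; keep the title so it
            # can be reported in projects_with_missing_nums.txt.
            missing_titles.append((row.get(title_col) or "").strip())
    return queryable, missing_titles


# Illustrative sample input, not real HEAL data.
sample = "project_id,project_title\n1U2CDA050098-01,Study A\n,Study B\n"
ids, titles = partition_rows(sample)
print(ids)
print(titles)
```

Blank IDs are detected after stripping whitespace, so a cell containing only spaces is treated the same as an empty one.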
The code has been updated to optionally pull all `project_num`s associated with the input `appl_id`s. Note: `appl_id`s uniquely identify records, whereas `project_num`s do not. For example, for a center grant such as the MAARC grant, the `project_num` (1U2CDA050098-01) is identical to those of its Survey, Data, Methods, and Administrative Core grants.
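One way to see this ambiguity in query results is to group records by `project_num` and flag any number that maps to more than one `appl_id`. The sketch below assumes each record is a dict with `project_num` and `appl_id` keys, as in RePORTER project results; the function name is hypothetical and the `appl_id` values are placeholders, not real NIH IDs.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping, Set


def ambiguous_project_nums(records: Iterable[Mapping[str, Any]]) -> Dict[str, Set[Any]]:
    """Return each project_num that is shared by more than one appl_id."""
    by_num = defaultdict(set)
    for rec in records:
        by_num[rec["project_num"]].add(rec["appl_id"])
    # Keep only project numbers that do not identify a single record.
    return {num: ids for num, ids in by_num.items() if len(ids) > 1}


# Hypothetical records: appl_id values are placeholders.
records = [
    {"project_num": "1U2CDA050098-01", "appl_id": 1},
    {"project_num": "1U2CDA050098-01", "appl_id": 2},
    {"project_num": "PLACEHOLDER-01", "appl_id": 3},
]
print(ambiguous_project_nums(records))
```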
This repository has been updated to accept input IDs that are either `appl_id`s or `project_num`s. Use `appl_id`s wherever possible, because `project_num` naming conventions are ambiguous.
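Supporting both ID types mostly comes down to which search criterion the request uses. The sketch below assumes the RePORTER v2 project search accepts `appl_ids` or `project_nums` inside a `criteria` object, alongside `offset` and `limit`; the `id_type` spellings, the helper name, and the 500-record page size are assumptions, not taken from this repository.

```python
import json
from typing import Any, Dict, List

# Assumed mapping from the script's id_type setting to a RePORTER criterion.
CRITERIA_KEY = {"appl_id": "appl_ids", "project_num": "project_nums"}


def build_project_query(ids: List[str], id_type: str,
                        offset: int = 0, limit: int = 500) -> Dict[str, Any]:
    """Build a hypothetical search payload for a batch of IDs."""
    if id_type not in CRITERIA_KEY:
        raise ValueError(f"unsupported id_type: {id_type!r}")
    # appl_ids are numeric; project_nums are strings such as 1U2CDA050098-01.
    values = [int(i) for i in ids] if id_type == "appl_id" else list(ids)
    return {"criteria": {CRITERIA_KEY[id_type]: values},
            "offset": offset, "limit": limit}


print(json.dumps(build_project_query(["10000001"], "appl_id")))
```

Rejecting unknown `id_type` values early avoids sending a request with an empty or misspelled criterion.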
The repository now also exposes the following as parameters: the `id_type`, the input file (and its corresponding `project_id` and `project_title` column names), the output path, and the output prefix for the files. There is also an option to remove non-UTF-8 characters with the `--replace-non-utf` flag.
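A command-line interface exposing these parameters might look like the sketch below. Only `--replace-non-utf` is named in this README; every other flag name, default, and the `clean_text` helper are hypothetical. The sketch drops undecodable bytes (`errors="ignore"`) when the flag is set; the real script may substitute characters instead.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical flags except --replace-non-utf, which this README names.
    p = argparse.ArgumentParser(description="Query NIH RePORTER for HEAL projects")
    p.add_argument("--id-type", choices=["appl_id", "project_num"], default="appl_id")
    p.add_argument("--input-file", required=True)
    p.add_argument("--id-column", default="project_id")
    p.add_argument("--title-column", default="project_title")
    p.add_argument("--output-path", default=".")
    p.add_argument("--output-prefix", default="")
    p.add_argument("--replace-non-utf", action="store_true",
                   help="drop characters that are not valid UTF-8")
    return p.parse_args(argv)


def clean_text(raw: bytes, replace_non_utf: bool) -> str:
    """Decode bytes as UTF-8, dropping invalid bytes only when asked to."""
    # errors="strict" raises UnicodeDecodeError on invalid input.
    return raw.decode("utf-8", errors="ignore" if replace_non_utf else "strict")


args = parse_args(["--input-file", "input.csv", "--replace-non-utf"])
print(args.id_type, args.replace_non_utf)
```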
To run a query, set the script parameters in the `query_nih_reporter.sh` bash script and execute it.
This script can take a list of `appl_id`s or `project_num`s. The list of `appl_id`s was generated by NIH.
A list of HEAL projects (along with their `project_num`s) can be downloaded from the funding awarded website.
Information about the API we use in this project can be found at the NIH RePORTER API website.
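A paginated search against the API can be sketched with the standard library alone. This assumes the v2 endpoint `https://api.reporter.nih.gov/v2/projects/search`, a JSON response with `results` and `meta.total`, a page size of 500, and a pause of about one second between requests; check these against the NIH RePORTER API documentation. The function names are hypothetical, and `fetch` is injectable so the paging logic can be tested offline.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Iterator

# Assumed RePORTER API v2 project search endpoint.
SEARCH_URL = "https://api.reporter.nih.gov/v2/projects/search"


def post_json(url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST a JSON payload and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def search_projects(
    criteria: Dict[str, Any],
    limit: int = 500,
    fetch: Callable[[str, Dict[str, Any]], Dict[str, Any]] = post_json,
    pause: float = 1.0,
) -> Iterator[Dict[str, Any]]:
    """Yield every project record matching `criteria`, one page at a time."""
    offset = 0
    while True:
        page = fetch(SEARCH_URL, {"criteria": criteria, "offset": offset, "limit": limit})
        results = page.get("results") or []
        yield from results
        offset += limit
        total = page.get("meta", {}).get("total", 0)
        if not results or offset >= total:
            break
        time.sleep(pause)  # assumed rate-limit courtesy between requests
```

For example, `list(search_projects({"project_nums": ["1U2CDA050098-01"]}))` would collect all matching records across pages.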
- When possible, use `appl_id`s instead of `project_num`s.
- The `project_num` input data we work with here was last updated in January 2022.
- Some of the project numbers in the input data CSV are blank, so we cannot get information for those projects from the NIH RePORTER API.
- Some project numbers are valid but do not appear to be in NIH RePORTER.
- There does not seem to be a hook within NIH RePORTER for looking up HEAL studies, so we rely on the input list described above.
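Valid IDs that RePORTER does not return can be found with a set difference between the requested IDs and the IDs present in the results. This minimal sketch assumes result records carry the ID under `project_num` (or `appl_id`); the helper name is hypothetical, and it compares IDs as exact strings, so differently formatted numbers would be reported as missing.

```python
from typing import Any, Iterable, List, Mapping


def ids_not_in_reporter(requested: Iterable[Any],
                        results: Iterable[Mapping[str, Any]],
                        id_field: str = "project_num") -> List[str]:
    """Return requested IDs that never appear in the API results."""
    found = {str(r.get(id_field)) for r in results}
    # Compare as strings so numeric appl_ids match IDs read from a CSV.
    return sorted({str(i) for i in requested} - found)


missing = ids_not_in_reporter(["A-01", "B-01"], [{"project_num": "A-01"}])
print(missing)
```

The returned list is what would be written, one ID per line, to `projects_not_in_reporter.txt`.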
It is assumed that those running this script are using a Debian-flavored Linux distribution and have the following installed:
- Python 3.6+
- pip
- git
- venv
- bash
- Clone the repository and change into its directory: `git clone https://github.com/jcheadle-rti/heal_segmentation.git`, then `cd heal_segmentation`
- Create and activate the virtual environment: `python3 -m venv venv`, then `source venv/bin/activate`
- Update pip and install the required packages: `pip install --upgrade pip`, then `pip install -r requirements.txt`
- Review the bash script (`query_nih_reporter.sh`) to confirm the parameters are accurate.
- At the command prompt, run `bash query_nih_reporter.sh`.