oohtmeel1/AI-Data-Readiness-Challenge-for-the-NCI-Cancer-Research-Data-Commons

Project to Assess the AI Data Readiness of the NLST

This repository contains my work on a project for the NCI Cancer Research Data Commons (CRDC), carried out on the Seven Bridges platform.

- Quantitative_assessment_data - A Jupyter notebook with a quantitative assessment of the NLST dataset.
- Final_submission_preprocessing_and_cleanup_only - A notebook illustrating how the data was loaded, cleaned, and organized.
- Ml_model - A notebook showing the basic model architecture and some resulting metrics.
- submission_documentation.pdf - The report discussing findings and recommendations.

I performed this project for the NIH as part of the AI Data Readiness Challenge. The purpose of the report was to assess the AI readiness of the publicly available National Lung Screening Trial (NLST) dataset, specifically by using the available data to train a machine learning model to identify cancerous lung nodules without annotated slides for reference. The data was obtained and then transformed into a format that a machine learning model can readily use. Training and testing were performed, and metrics were quantified. Qualitative inferences about the data were made first, and quantitative inferences were then attempted.
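As a rough illustration of the "metrics were quantified" step, the sketch below computes common binary classification metrics from true and predicted labels. It is not taken from the Ml_model notebook: the function name `binary_metrics`, the label convention (1 = cancerous, 0 = not cancerous), and the choice of metrics are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute basic metrics for 0/1 labels (assumed: 1 = cancerous)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    # Count the four cells of the confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Fall back to 0.0 when a denominator is zero.
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not NLST results.
    print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

In a screening setting, recall (sensitivity) is usually reported alongside accuracy, because a missed cancer and a false alarm carry very different costs.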

Background: The National Lung Screening Trial (NLST) was a large study involving more than 50,000 participants over several years. Participants were randomly assigned to receive either low-dose helical CT scans or standard chest X-rays and were offered three annual screenings. This was done to assess whether those screened with low-dose CT had better outcomes than those screened with X-rays. [1] The study indicated that participants who received low-dose helical CT scans had a lower risk of dying from lung cancer than those who received chest X-rays. [2]

Results and Conclusion: This project was a great learning experience, and I am glad I attempted it. I got to work in a cloud environment and see the interesting projects taking place there. I worked with complex data that was new to me, faced issues I had never seen before, and learned to overcome them. I did not have much experience with PyTorch, and this was a great reason to gain some.

References:

The model can be found here:

https://computational.cancer.gov/model/aidr-challenge-tier-1-mcfarlin

BIBLIOGRAPHY

NCI Imaging Data Commons landing page: https://datacommons.cancer.gov/repository/imaging-data-commons

NLST Landing page - the Cancer Data Access System. (n.d.). https://cdas.cancer.gov/nlst/ [1]

National Lung Screening Trial (NLST). (2014, September 8). National Cancer Institute. https://www.cancer.gov/types/lung/research/nlst [2]
