Pathology Image Analysis Pipeline

This repository contains a pipeline for analyzing pathology images using various AI models (GPT-4, Claude-3, Claude-3.5) and evaluating their performance with different types of image labels/watermarks.

Pipeline Overview

The pipeline consists of four main steps:

Image Processing (image-processing-notebook_watermark.ipynb)
Model Inference (inference_llms_watermark.ipynb, inference_llms_promptengineering.ipynb)
Score Combination (combine_score_results.ipynb, combine_score_results_watermark.ipynb)
Visualization and Analysis (PathoPrompt Results.Rmd)

Prerequisites

Python Environment

Python 3.11+

Required packages:

pandas
pillow
cairosvg
numpy
tqdm
anthropic
openai
python-dotenv

R Environment

R 4.0+

Required packages:

ggplot2
readxl
dplyr
tidyr
gridExtra
FSA
rstatix
scales
RColorBrewer
openxlsx
svglite
networkD3
htmlwidgets
webshot2
rsvg

Step-by-Step Usage

1. Image Processing

Run image-processing-notebook_watermark.ipynb to:

Process original pathology images
Add labels/watermarks based on metadata
Generate modified images for analysis

Key parameters:

label_width_pct: Width of label as percentage of image
label_height_pct: Height of label as percentage of image
alpha: Transparency of label (0-100)

2. Model Inference

Run the inference notebooks in this order:

a. inference_llms_watermark.ipynb:

Processes images with standard prompts
Supports multiple AI models (GPT-4, Claude-3, Claude-3.5)
Handles both labeled and unlabeled images

b. inference_llms_promptengineering.ipynb:

Similar to watermark inference but with engineered prompts
Tests model behavior with different prompt strategies

Required environment variables:

API_KEY_CLAUDE: Anthropic API key
OPENAI_API_KEY: OpenAI API key

3. Score Combination

Run combine_score_results_watermark.ipynb to:

Combine results from different models
Calculate accuracy scores
Process diagnostic outputs
Generate combined analysis files

4. Visualization and Analysis

Run PathoPrompt Results.Rmd to:

Generate visualization plots
Perform statistical analysis
Create summary tables
Export results to Excel

Directory Structure

project/
│
├── notebooks/
│   ├── image-processing-notebook_watermark.ipynb
│   ├── inference_llms_watermark.ipynb
│   ├── inference_llms_promptengineering.ipynb
│   └── combine_score_results_watermark.ipynb
│
├── analysis/
│   └── PathoPrompt Results.Rmd
│
├── data/
│   ├── input/
│   │   └── Patient_Metadata_long.xlsx
│   └── output/
│       └── combined_analysis_results.xlsx
│
└── figures/
    └── [generated plots and visualizations]

Output Files

combined_analysis_results.xlsx: Contains combined model predictions and accuracy metrics
combined_analysis_results_watermark.xlsx: Results specific to watermark experiments
Generated figures in SVG format in the figures directory

Notes

Images must be under 4MB for API processing
Batch processing is used to handle rate limits
Set limit_items=True for testing with smaller datasets
Chrome and Adobe Reader should be closed when generating Sankey diagrams

Troubleshooting

If API calls fail, check rate limits and API keys
For image processing errors, verify file paths and image sizes
For R visualization issues, ensure all required packages are installed

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.gitignore		.gitignore
LICENSE		LICENSE
PathoPrompt Results.Rmd		PathoPrompt Results.Rmd
README.md		README.md
combine_score_results.ipynb		combine_score_results.ipynb
combine_score_results_watermark.ipynb		combine_score_results_watermark.ipynb
image-processing-notebook_watermark.ipynb		image-processing-notebook_watermark.ipynb
inference_llms_promptengineering.ipynb		inference_llms_promptengineering.ipynb
inference_llms_watermark.ipynb		inference_llms_watermark.ipynb
table_wide_to_long.py		table_wide_to_long.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pathology Image Analysis Pipeline

Pipeline Overview

Prerequisites

Python Environment

R Environment

Step-by-Step Usage

1. Image Processing

2. Model Inference

3. Score Combination

4. Visualization and Analysis

Directory Structure

Output Files

Notes

Troubleshooting

About

Releases

Packages

Contributors 2

Languages

License

KatherLab/patho_prompt_injection

Folders and files

Latest commit

History

Repository files navigation

Pathology Image Analysis Pipeline

Pipeline Overview

Prerequisites

Python Environment

R Environment

Step-by-Step Usage

1. Image Processing

2. Model Inference

3. Score Combination

4. Visualization and Analysis

Directory Structure

Output Files

Notes

Troubleshooting

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages