Welcome to the `analysis/` directory! This folder contains various analysis implementations for LLM360 models. Each subfolder is an independent, self-contained module with its own setup instructions, relying solely on the code within that subfolder.
- Data memorization (`memorization/`) evaluates model memorization of the training data (a rough scoring sketch follows this list).
- LLM Unlearning (`unlearn/`) implements machine unlearning methods to remove an LLM's hazardous knowledge.
- Safety360 (`safety360/`) contains modules to measure model safety:
  - `bold/` provides sentiment analysis with the BOLD dataset.
  - `toxic_detection/` measures the model's capability to identify toxic text.
  - `toxigen/` evaluates the model's toxicity in text generation.
  - `wmdp/` evaluates the model's hazardous knowledge (a multiple-choice scoring sketch follows this list).
- Mechanistic Interpretability (`mechinterp/`) contains packages for visualizing the algorithms LLMs execute during inference.
- Evaluation metrics (`metrics/`) contains modules for model evaluation:
  - `harness/` provides instructions to evaluate models following the Open LLM Leaderboard.
  - `ppl/` evaluates the model's per-token perplexity (a minimal sketch follows this list).
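For a rough picture of what a memorization check looks like, the sketch below prompts a causal LM with the first tokens of a training sequence and measures how much of the true continuation greedy decoding reproduces verbatim. The model name, prompt length, and continuation length here are illustrative assumptions, not the exact configuration used in `memorization/`.

```python
# Minimal sketch of a prefix/continuation memorization check.
# Assumptions: a HuggingFace causal LM; the 32-token prompt and
# 32-token continuation are illustrative, not the memorization/ setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LLM360/Amber"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def memorization_score(text: str, prompt_len: int = 32, cont_len: int = 32) -> float:
    """Fraction of the true continuation reproduced by greedy decoding."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    prompt = ids[:prompt_len]
    target = ids[prompt_len:prompt_len + cont_len]
    with torch.no_grad():
        out = model.generate(
            prompt.unsqueeze(0),
            max_new_tokens=cont_len,
            do_sample=False,  # greedy decoding
        )
    generated = out[0, prompt_len:prompt_len + cont_len]
    n = min(len(generated), len(target))
    matches = (generated[:n] == target[:n]).sum().item()
    return matches / len(target)

print(memorization_score("an example training sequence " * 20))
```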
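WMDP is a multiple-choice benchmark; a common way to score such benchmarks (the exact method in `wmdp/` may differ) is to pick the answer choice to which the model assigns the highest log-likelihood. The sketch below assumes that tokenizing the question alone and the question plus a choice aligns at the boundary, which holds for most tokenizers; the example question is made up.

```python
# Minimal sketch of log-likelihood scoring for one multiple-choice item.
# The question and choices are invented; wmdp/ may score differently.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LLM360/Amber"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def choice_loglik(question: str, choice: str) -> float:
    """Summed log-probability of the choice tokens given the question."""
    q_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
    full = tokenizer(question + " " + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full).logits
    # Log-probs of each token given its preceding context.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full[0, 1:]
    token_lp = logprobs[torch.arange(len(targets)), targets]
    # Keep only the positions belonging to the answer choice.
    return token_lp[q_len - 1:].sum().item()

question = "Q: Which gas makes up most of Earth's atmosphere? A:"
choices = ["Nitrogen", "Oxygen", "Carbon dioxide", "Argon"]
print(max(choices, key=lambda c: choice_loglik(question, c)))
```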
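Per-token perplexity is the exponential of the mean per-token negative log-likelihood. The sketch below computes it in a single forward pass; `ppl/` may instead slide a window over long documents, so treat this as an illustration of the metric rather than the module's implementation.

```python
# Minimal sketch of per-token perplexity: exp of the mean per-token
# negative log-likelihood. Assumes the text fits in one forward pass.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LLM360/Amber"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, HuggingFace causal LMs shift internally
        # and return the mean cross-entropy over predicted tokens.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

print(perplexity("The quick brown fox jumps over the lazy dog."))
```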