Welcome to the `analysis/` directory! This folder contains various analysis implementations for LLM360 models. Each subfolder is an independent, self-contained module with its own setup instructions, relying solely on the code within that subfolder.
- Data memorization (`memorization/`) evaluates model memorization of the training data (a rough scoring sketch follows this list).
- LLM Unlearning (`unlearn/`) implements machine unlearning methods to remove an LLM's hazardous knowledge.
- Safety360 (`safety360/`) contains modules to measure model safety:
  - `bold/` provides sentiment analysis with the BOLD dataset.
  - `toxic_detection/` measures the model's capability to identify toxic text.
  - `toxigen/` evaluates the model's toxicity in text generation.
  - `wmdp/` evaluates the model's hazardous knowledge (a multiple-choice scoring sketch follows this list).
- Mechanistic Interpretability (`mechinterp/`) contains packages for visualizing the algorithms LLMs execute during inference.
- Evaluation metrics (`metrics/`) contains modules for model evaluation:
  - `harness/` provides instructions to evaluate models following the Open LLM Leaderboard.
  - `ppl/` evaluates the model's per-token perplexity (a minimal sketch follows this list).
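For a rough picture of what a memorization check looks like, the sketch below prompts a causal LM with the first tokens of a training sequence and measures how much of the true continuation greedy decoding reproduces verbatim. The model name, prompt length, and continuation length here are illustrative assumptions, not the exact configuration used in `memorization/`.

```python
# Minimal sketch of a prefix/continuation memorization check.
# Assumptions: a HuggingFace causal LM; the 32-token prompt and
# 32-token continuation are illustrative, not the memorization/ setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LLM360/Amber"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def memorization_score(text: str, prompt_len: int = 32, cont_len: int = 32) -> float:
    """Fraction of the true continuation reproduced by greedy decoding."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    prompt = ids[:prompt_len]
    target = ids[prompt_len:prompt_len + cont_len]
    with torch.no_grad():
        out = model.generate(
            prompt.unsqueeze(0),
            max_new_tokens=cont_len,
            do_sample=False,  # greedy decoding
        )
    generated = out[0, prompt_len:prompt_len + cont_len]
    n = min(len(generated), len(target))
    matches = (generated[:n] == target[:n]).sum().item()
    return matches / len(target)

print(memorization_score("an example training sequence " * 20))
```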
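WMDP is a multiple-choice benchmark; a common way to score such benchmarks (the exact method in `wmdp/` may differ) is to pick the answer choice to which the model assigns the highest log-likelihood. The sketch below assumes that tokenizing the question alone and the question plus a choice aligns at the boundary, which holds for most tokenizers; the example question is made up.

```python
# Minimal sketch of log-likelihood scoring for one multiple-choice item.
# The question and choices are invented; wmdp/ may score differently.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LLM360/Amber"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def choice_loglik(question: str, choice: str) -> float:
    """Summed log-probability of the choice tokens given the question."""
    q_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
    full = tokenizer(question + " " + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full).logits
    # Log-probs of each token given its preceding context.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full[0, 1:]
    token_lp = logprobs[torch.arange(len(targets)), targets]
    # Keep only the positions belonging to the answer choice.
    return token_lp[q_len - 1:].sum().item()

question = "Q: Which gas makes up most of Earth's atmosphere? A:"
choices = ["Nitrogen", "Oxygen", "Carbon dioxide", "Argon"]
print(max(choices, key=lambda c: choice_loglik(question, c)))
```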
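Per-token perplexity is the exponential of the mean per-token negative log-likelihood. The sketch below computes it in a single forward pass; `ppl/` may instead slide a window over long documents, so treat this as an illustration of the metric rather than the module's implementation.

```python
# Minimal sketch of per-token perplexity: exp of the mean per-token
# negative log-likelihood. Assumes the text fits in one forward pass.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LLM360/Amber"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, HuggingFace causal LMs shift internally
        # and return the mean cross-entropy over predicted tokens.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

print(perplexity("The quick brown fox jumps over the lazy dog."))
```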