Directory Overview

Welcome to the analysis/ directory! This folder contains various analysis implementations for LLM360 models. Each subfolder is an independent, self-contained module with its own setup instructions, relying solely on the code within that subfolder.

  1. Data memorization (memorization/) evaluates model memorization of the training data.
  2. LLM Unlearning (unlearn/) implements machine unlearning methods to remove an LLM's hazardous knowledge.
  3. Safety360 (safety360/) contains modules to measure model safety:
    • bold/ provides sentiment analysis with the BOLD dataset.
    • toxic_detection/ measures the model's capability to identify toxic text.
    • toxigen/ evaluates the model's toxicity in text generation.
    • wmdp/ evaluates the model's hazardous knowledge.
  4. Mechanistic Interpretability (mechinterp/) contains packages for visualizing the algorithms executed by LLMs during inference.
  5. Evaluation metrics (metrics/) contains modules for model evaluation: