Rusheb Shah (rusheb)

26 followers · 31 following

rusheb/README.md

Hi there, I'm Rusheb!

I am currently working on LLM Evaluations at Apollo Research.

Past OSS contributions:

I contributed to the mechanistic interpretability library TransformerLens. Most notably, I added support for BERT to the library.
I worked on MazeDataset, a library for generation, filtering, solving, visualizing, and processing of mazes for training ML systems.

Research:

I co-authored a NeurIPS workshop paper, Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation, in which we used language models to automate the generation of narrative-based jailbreaks on GPT-4 and other SOTA models.

Pinned

TransformerLensOrg/TransformerLens (Public)

A library for mechanistic interpretability of GPT-style language models

Python · 1.8k stars · 318 forks

arena-hackathon-attribution-patching (Public)

A novel automated circuit discovery algorithm based on attribution patching. First-prize winner of ARENA Interpretability Hackathon.

Python · 3 stars · 1 fork

understanding-search/maze-transformer (Public)

This repo facilitates the training and analysis of autoregressive transformers on maze-solving tasks.

Jupyter Notebook · 25 stars · 6 forks

chat (Public)

A basic async terminal chatroom app that I built to help me learn asynchronous programming with asyncio.

Python

coursera-machine-learning (Public)

My solutions to the exercises from Andrew Ng's Machine Learning Course (Coursera).

MATLAB

cs50 (Public)

My problem set solutions for CS50 2018.

C