On my desk

Papers I'm reading, books I'm going through, things I'm learning, and where I'm headed next.

Research Papers

What I'm reading

Foundational

Attention Is All You Need

Vaswani et al., 2017 — The paper that introduced the transformer architecture, the foundation of most modern NLP models and LLMs.

Adversarial ML

Universal and Transferable Adversarial Attacks on Aligned Language Models

Zou et al., 2023 — Demonstrates automated methods to generate adversarial suffixes that jailbreak aligned LLMs.

AI Safety

Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training

Hubinger et al., 2024 — Shows that backdoor behaviors can persist through standard safety training, including supervised fine-tuning, reinforcement learning, and adversarial training.

Books

Textbook

Deep Learning

Ian Goodfellow, Yoshua Bengio, Aaron Courville — A comprehensive reference for deep learning fundamentals.

Currently Reading

Influence: The Psychology of Persuasion

Robert Cialdini — The science behind why people say yes, and how to apply these principles ethically.

Tutorials & Learning

Course

Stanford CS231n: CNNs for Visual Recognition

Working through the lecture series and assignments on convolutional neural networks and computer vision.

Hands-on

Adversarial Robustness Toolbox (ART) with PyTorch

Experimenting with adversarial attack and defense implementations for research projects.

Lately

— Mechanistic interpretability — what's actually happening inside models at the circuit level

— Writing more, thinking out loud about AI safety in long form

— Red-teaming toolkits and open-source auditing frameworks

Coming Soon

A Substack where I write about AI safety, paper breakdowns, adversarial ML, and lessons from the lab.

Subscribe when it's live