
On my desk

Papers I'm reading, books I'm going through, things I'm learning, and where I'm headed next.

What I'm reading

Foundational
Attention Is All You Need
Vaswani et al., 2017 — The paper that introduced the transformer architecture, foundational to everything in modern NLP and LLMs.
Adversarial ML
Universal and Transferable Adversarial Attacks on Aligned Language Models
Zou et al., 2023 — Demonstrates automated methods to generate adversarial suffixes that jailbreak aligned LLMs.
AI Safety
Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training
Hubinger et al., 2024 — Shows that backdoor behaviors can persist through standard safety fine-tuning methods.

Longer reads

Textbook
Deep Learning
Ian Goodfellow, Yoshua Bengio, Aaron Courville — The comprehensive reference for deep learning fundamentals.
Currently Reading
Influence: The Psychology of Persuasion
Robert Cialdini — The science behind why people say yes, and how to apply these principles ethically.

Going through

Course
Stanford CS231n: CNNs for Visual Recognition
Working through the lecture series and assignments on convolutional neural networks and computer vision.
Hands-on
Adversarial Robustness Toolbox (ART) with PyTorch
Experimenting with adversarial attack and defense implementations for research projects.

Loose threads

Mechanistic interpretability — what's actually happening inside models at the circuit level
Writing more, thinking out loud about AI safety in long form
Red-teaming toolkits and open-source auditing frameworks

Coming Soon

Thoughts from the edge

A Substack where I'll write about AI safety, paper breakdowns, adversarial ML, and lessons from the lab.

Subscribe when it's live