AI Safety Researcher

How can you trust a model you can't fully see inside?

In search of an answer to this question, I test AI systems against real threats: backdoor detection in LLMs, adversarial attacks on vision-language models, and algorithmic auditing.

Aditya Bansal

A bit about me

I'm Aditya Bansal, a CS undergrad at BITS Pilani working on AI control, model evaluations, and cross-architecture threat modeling. I'm currently building towards predoc and research fellowship roles before a PhD.

AI Alignment · Mechanistic Interpretability · LLM Safety

Work so far

View all →

What I've been up to

Jan 2026 – Present
Undergraduate Student Researcher
BITS Pilani

Adversarial ML and AI safety research.

Jun – Aug 2025
Research Intern
IIT BHU (Varanasi)

Dark pattern detection in ride-hailing applications.

Aug – Dec 2024
Founding Engineer
Stealth Startup

Early-stage product engineering.

On my desk

Paper
Attention Is All You Need
Vaswani et al. — The transformer architecture paper
Paper
Sleeper Agents
Hubinger et al. — Deceptive LLMs that persist through safety training
Book
Deep Learning
Goodfellow, Bengio, and Courville — Foundations of modern deep learning
Coming Soon

Thoughts from the edge

I'm starting a Substack where I'll write about AI safety research, paper breakdowns, technical commentary on my projects, and the occasional life lesson.

Subscribe when it's live

Let's talk

Always open to discussing research, collaboration, or new ideas.