AI Safety Researcher

How can you trust a model you can't fully see inside?

In search of an answer, I test AI systems against real threats: backdoor detection in LLMs, adversarial attacks on vision-language models, and algorithmic auditing.

About

A bit about me

I'm Aditya Bansal, a CS undergrad at BITS Pilani working on AI control, model evaluations, and cross-architecture threat modeling. I'm currently building toward predoctoral and research fellowship roles before a PhD.

AI Alignment · Mechanistic Interpretability · LLM Safety

Projects

Work so far

View all →

Dark Pattern Detection in Ride-Hailing Platforms

Auditing framework to detect and classify dark patterns in ride-hailing apps.

Poisoning LLMs via Small Datasets

Replication study on data poisoning attacks under constrained hardware.

White-Box Auditing of ML Models

Internal auditing methods for inspecting model weights and decision boundaries.

Quantum Communication Protocols

Exploring quantum key distribution and secure communication channels.

Adversarial Attacks on Vision-Language Agents

Studying robustness of VLAs against targeted adversarial perturbations.

Journey

What I've been up to

View all →

Jan 2026 – Present

Undergraduate Student Researcher

BITS Pilani

Adversarial ML and AI safety research.

Jun 2025 – Aug 2025

Research Intern

IIT BHU (Varanasi)

Dark pattern detection in ride-hailing applications.

Aug 2024 – Dec 2024

Founding Engineer

Stealth Startup

Early-stage product engineering.

Currently

On my desk

See everything →

Paper

Attention Is All You Need

Vaswani et al. — The transformer architecture paper

Paper

Sleeper Agents

Hubinger et al. — Deceptive LLMs that persist through safety training

Book

Deep Learning

Goodfellow, Bengio, and Courville — Foundations of modern deep learning

Coming Soon

Thoughts from the edge

I'm starting a Substack where I'll write about AI safety research, paper breakdowns, technical commentary on my projects, and random life lessons.

Subscribe when it's live

Get in Touch

Let's talk

Always open to discussing research, collaboration, or new ideas.

aditya8.workspace@gmail.com

GitHub · LinkedIn · Twitter · ORCID