Embodied AI · Large Language Models · Robot Learning

Caleb Wilson

I'm a graduate student in the Robotics Department at the University of Texas at Austin, where I'm part of the Grounded Embodied Learning (GEL) Lab. I work on giving language models a body — connecting the reasoning of LLMs to robots that perceive, plan, and act in the messy physical world.

Originally from San Angelo, Texas 🤠 — raised on big West Texas skies, the Concho River, and good Tex-Mex, now somewhere between a robotics building and too many tabs of arXiv.


📍 Austin, TX
🏙️ Home: San Angelo, TX
🎓 M.S. → Ph.D., Robotics
✦ Embodied AI & LLMs
☕ Currently: a lot of coffee

About

I grew up in San Angelo, Texas, where my first "robot" was a Roomba I kept reprogramming to avoid the cat. That curiosity turned into a degree in Computer Science, and eventually into research on how machines can understand and act in the real world. Today I'm a graduate student at the University of Texas at Austin, in the Grounded Embodied Learning (GEL) Lab, where I split my time between training large vision-language-action models and debugging why the robot arm keeps knocking over my coffee.

My broad interest is embodied intelligence: I believe the next leap for AI won't come from text alone, but from agents that are grounded in perception, action, and consequence. I care about making models that are not just capable, but reliable and sample-efficient enough to learn from a handful of demonstrations rather than millions.

Outside of research, you'll find me hunting for the best hand-pulled noodles in town, playing pickup basketball, or taking the long way home so I can listen to one more podcast episode.

Research

I want robots that can be told what to do in plain language and figure out the rest. My work sits at the intersection of three threads:

◆

Vision-Language-Action Models

Training end-to-end policies that map camera pixels and a natural language instruction directly to robot actions, and studying how to scale them without scaling the demonstration budget.

🧠

LLMs as Planners

Using large language models as high-level planners that decompose long-horizon tasks into grounded subgoals — and keeping them honest with closed-loop feedback from the environment.

🔁

Learning from Few Demonstrations

Imitation and reinforcement learning methods that generalize from a handful of human demonstrations, with a focus on robustness to distribution shift in the real world.

What I'm Working On

I'm early in grad school, so most of this is in progress rather than published: these are research questions I'm actively chasing. I'll add papers here as they come together.

Closed-loop LLM planning for long-horizon manipulation

Ongoing project · manuscript in preparation

Studying how to keep an LLM planner grounded when a multi-step task drifts off course, using environment feedback to re-plan instead of failing silently.

Data-efficient vision-language-action policies

Ongoing project

Can a manipulation policy generalize from a handful of demonstrations? Exploring pretraining and augmentation tricks that stretch a small demo budget.

Calibrated uncertainty for embodied agents

Course project · exploratory

When should a robot say "I'm not sure" and ask for help? A side project on getting policies to know the limits of their own competence.

I'll keep this page updated as projects mature — nothing here is peer-reviewed yet.

Projects & Open Source

tinyVLA

A minimal, hackable PyTorch implementation of a vision-language-action policy — built for teaching and quick experiments. ~1.8k stars.

GitHub →

armbench-lite

Lightweight simulation benchmarks for tabletop manipulation that run on a single laptop GPU. Used in two grad courses at U-M.

GitHub →

concho-bot

A weekend project: a tiny LLM agent that texts me the day's West Texas weather and Concho River level before I head out for a run.

GitHub →

Timeline

2024 – present
Graduate Student, Robotics · University of Texas at Austin
Research on embodied AI and LLM-driven robot policies.

Summer 2025
Research Intern · Embodied AI startup
Scaled vision-language-action pretraining on a fleet of mobile manipulators.

2020 – 2024
B.S. in Computer Science · Stony Brook University
Graduated with honors; undergrad thesis on sim-to-real transfer.

Before that
Central High School · San Angelo, TX
Where the Roomba experiments began.

Get in touch

I'm always happy to chat about embodied AI, grad school, or where to find good dumplings in Austin. The best way to reach me is on GitHub:

github.com/jaywme
