Vision-Language-Action Models
Embodied AI · Large Language Models · Robot Learning
I'm a graduate student in the Robotics Department at the University of Texas at Austin, where I'm part of the Grounded Embodied Learning (GEL) Lab. I work on giving language models a body — connecting the reasoning of LLMs to robots that perceive, plan, and act in the messy physical world.
Originally from San Angelo, Texas 🤠 — raised on big West Texas skies, the Concho River, and good Tex-Mex, now somewhere between a robotics building and too many tabs of arXiv.
I grew up in San Angelo, Texas, where my first "robot" was a Roomba I kept reprogramming to avoid the cat. That curiosity turned into a degree in Computer Science, and eventually into research on how machines can understand and act in the real world. Today I'm a graduate student at the University of Texas at Austin, in the Grounded Embodied Learning (GEL) Lab, where I split my time between training large vision-language-action models and debugging why the robot arm keeps knocking over my coffee.
My broad interest is embodied intelligence: I believe the next leap for AI won't come from text alone, but from agents that are grounded in perception, action, and consequence. I care about making models that are not just capable, but reliable and sample-efficient enough to learn from a handful of demonstrations rather than millions.
Outside of research, you'll find me hunting for the best hand-pulled noodles in town, playing pickup basketball, or taking the long way home so I can listen to one more podcast episode.
I want robots that can be told what to do in plain language and figure out the rest. My work sits at the intersection of three threads:
Training end-to-end policies that map camera pixels and a natural language instruction directly to robot actions, and studying how to scale them without scaling the demonstration budget.
Using large language models as high-level planners that decompose long-horizon tasks into grounded subgoals — and keeping them honest with closed-loop feedback from the environment.
Imitation and reinforcement learning methods that generalize from a handful of human demonstrations, with a focus on robustness to distribution shift in the real world.
I'm early in grad school, so most of this is in progress rather than published: these are research questions I'm actively chasing. I'll add papers here as they come together.
Ongoing project · manuscript in preparation
Ongoing project
Course project · exploratory
I'll keep this page updated as projects mature — nothing here is peer-reviewed yet.
A minimal, hackable PyTorch implementation of a vision-language-action policy — built for teaching and quick experiments. ~1.8k stars.
Lightweight simulation benchmarks for tabletop manipulation that run on a single laptop GPU. Used in two grad courses at U-M.
A weekend project: a tiny LLM agent that texts me the day's West Texas weather and Concho River level before I head out for a run.
Research on embodied AI and LLM-driven robot policies.
Scaled vision-language-action pretraining on a fleet of mobile manipulators.
Graduated with honors; undergrad thesis on sim-to-real transfer.
Where the Roomba experiments began.
I'm always happy to chat about embodied AI, grad school, or where to find good dumplings in Austin. The best way to reach me is GitHub: