I am a PhD candidate at the University of Southern California, advised by Gaurav Sukhatme in the Robotics Embedded Systems Laboratory (RESL). My research interests are in biologically inspired machine learning algorithms for creating generalist agents and, more philosophically, in algorithms that help us understand the nature of biological intelligence. In the past, I've worked on Reinforcement Learning (RL), improving exploration in RL via Quality Diversity / Novelty Search methods, and generative models for robot planning and control. I'm currently working on biologically plausible learning algorithms, such as theories of predictive coding as generative world models for future prediction and active inference.
Over the past two summers, I've had the immense privilege of working with the Autonomous Vehicles team at NVIDIA as a research scientist intern, working on diffusion generative models for automatic scenario generation and implementing simulation pipelines for Reinforcement Learning. Before that, I was an intern at the National Institute of Standards and Technology (NIST), where I worked on Generative Adversarial Networks for generating realistic 4G LTE signals.
PhD in AI and Robotics, in progress
University of Southern California
BSc in Computer Science, Minor in Applied Mathematics, 2020
University of Colorado Boulder
Training generally capable agents that thoroughly explore their environment and learn new and diverse skills is a long-term goal of robot learning. Quality Diversity Reinforcement Learning (QD-RL) is an emerging research area that blends the best aspects of both fields – Quality Diversity (QD) provides a principled form of exploration and produces collections of behaviorally diverse agents, while Reinforcement Learning (RL) provides a powerful performance improvement operator enabling generalization across tasks and dynamic environments. Existing QD-RL approaches have been constrained to sample-efficient, deterministic off-policy RL algorithms and/or evolution strategies, and struggle with highly stochastic environments. In this work, we, for the first time, adapt on-policy RL, specifically Proximal Policy Optimization (PPO), to the Differentiable Quality Diversity (DQD) framework and propose additional improvements over prior work that enable efficient optimization and discovery of novel skills on challenging locomotion tasks. Our new algorithm, Proximal Policy Gradient Arborescence (PPGA), achieves state-of-the-art results, including a 4x improvement in best reward over baselines on the challenging humanoid domain.
Recent progress in Quality Diversity Reinforcement Learning (QD-RL) has enabled learning a collection of behaviorally diverse, high-performing policies. However, these methods typically involve storing thousands of policies, which results in high space complexity and poor scaling to additional behaviors. Condensing the archive into a single model while retaining the performance and coverage of the original collection of policies has proved challenging. In this work, we propose using diffusion models to distill the archive into a single generative model over policy parameters. We show that our method achieves a compression ratio of 13x while recovering 98% of the original rewards and 89% of the original coverage. Further, the conditioning mechanism of diffusion models allows for flexibly selecting and sequencing behaviors, including using language.
We demonstrate the possibility of learning drone swarm controllers that are zero-shot transferable to real quadrotors via large-scale multi-agent end-to-end reinforcement learning. We train policies parameterized by neural networks that are capable of controlling individual drones in a swarm in a fully decentralized manner. Our policies, trained in simulated environments with realistic quadrotor physics, demonstrate advanced flocking behaviors, perform aggressive maneuvers in tight formations while avoiding collisions with each other, break and re-establish formations to avoid collisions with moving obstacles, and efficiently coordinate in pursuit-evasion tasks. We analyze, in simulation, how different model architectures and parameters of the training regime influence the final performance of neural swarms. We demonstrate the successful deployment of the model learned in simulation to highly resource-constrained physical quadrotors performing station keeping and goal swapping behaviors.