I am a PhD candidate at the University of Southern California, advised by Gaurav Sukhatme in the Robotics Embedded Systems Laboratory (RESL). I’m interested in creating intelligent systems guided by neuroscientific perspectives on intelligence: agents that learn predictive models of the world and that actively seek out, and learn from, surprising or novel experiences. In the past, I’ve worked on Reinforcement Learning (RL), improving exploration in RL via Quality Diversity / Novelty Search methods, and generative models for robot planning and control. I’m currently working on biologically plausible learning algorithms to create more robust, generalizable models deployable on real robots.
Over the past two summers, I’ve had the immense privilege of working with the Autonomous Vehicles team at NVIDIA as a research scientist intern, focusing on Reinforcement Learning and diffusion generative models for automatic scenario generation. Before that, I was an intern at the National Institute of Standards and Technology (NIST), where I worked on Generative Adversarial Networks for generating realistic 4G LTE signals.
PhD Candidate, in progress
University of Southern California
BSc in Computer Science, Minor in Applied Mathematics, 2020
University of Colorado Boulder
Training generally capable agents that thoroughly explore their environment and learn new and diverse skills is a long-term goal of robot learning. Quality Diversity Reinforcement Learning (QD-RL) is an emerging research area that blends the best aspects of both fields – Quality Diversity (QD) provides a principled form of exploration and produces collections of behaviorally diverse agents, while Reinforcement Learning (RL) provides a powerful performance improvement operator enabling generalization across tasks and dynamic environments. Existing QD-RL approaches have been constrained to sample-efficient, deterministic off-policy RL algorithms and/or evolution strategies, and struggle with highly stochastic environments. In this work, we adapt on-policy RL, specifically Proximal Policy Optimization (PPO), to the Differentiable Quality Diversity (DQD) framework for the first time, and propose additional improvements over prior work that enable efficient optimization and discovery of novel skills on challenging locomotion tasks. Our new algorithm, Proximal Policy Gradient Arborescence (PPGA), achieves state-of-the-art results, including a 4x improvement in best reward over baselines on the challenging humanoid domain.
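To give a flavor of the DQD-style search loop behind this line of work, here is a minimal, self-contained sketch. It replaces PPO-estimated objective and measure gradients with analytic stand-ins on a toy sphere problem, branches candidates along random non-negative combinations of those gradients, inserts them into a discretized archive, and steps the search point toward the candidate with the greatest archive improvement. Every function, dimension, and constant below is an illustrative assumption, not the PPGA implementation.

```python
import numpy as np

# Toy stand-ins for the objective and measure functions. In PPGA these
# quantities and their gradients are estimated from PPO rollouts; here
# they are analytic so the sketch runs standalone.
def objective(theta):                  # maximize: negative sphere
    return -np.sum(theta ** 2)

def measures(theta):                   # 2-D behavior descriptor in (-1, 1)^2
    return np.tanh(theta[:2])

def objective_grad(theta):
    return -2.0 * theta

def measure_grads(theta):              # Jacobian of the measures, shape (2, dim)
    g = np.zeros((2, theta.size))
    g[0, 0] = 1.0 - np.tanh(theta[0]) ** 2
    g[1, 1] = 1.0 - np.tanh(theta[1]) ** 2
    return g

def cell(m, bins=20):                  # discretize measures into an archive cell
    idx = np.floor((m + 1.0) / 2.0 * bins).astype(int)
    return tuple(np.clip(idx, 0, bins - 1))

rng = np.random.default_rng(0)
dim, batch, sigma = 8, 16, 0.05
theta = rng.normal(size=dim)           # current search point (policy params)
archive = {}                           # cell -> (fitness, params)

for itr in range(200):
    grads = np.vstack([objective_grad(theta), measure_grads(theta)])
    grads /= np.linalg.norm(grads, axis=1, keepdims=True) + 1e-8

    best_improve, best_candidate = -np.inf, None
    for _ in range(batch):
        # Branch: random non-negative combination of objective/measure gradients
        coeffs = np.abs(rng.normal(scale=sigma, size=3))
        candidate = theta + coeffs @ grads
        f, c = objective(candidate), cell(measures(candidate))
        old_f = archive.get(c, (-np.inf,))[0]
        if f > old_f:                  # candidate improves (or fills) its cell
            archive[c] = (f, candidate)
            improve = f if old_f == -np.inf else f - old_f
            if improve > best_improve:
                best_improve, best_candidate = improve, candidate

    if best_candidate is not None:
        theta = best_candidate         # walk toward greatest archive improvement
    else:
        theta = rng.normal(size=dim)   # restart if the branch stalled

print(f"archive cells filled: {len(archive)}")
print(f"best fitness: {max(f for f, _ in archive.values()):.3f}")
```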
Recent progress in Quality Diversity Reinforcement Learning (QD-RL) has enabled learning collections of behaviorally diverse, high-performing policies. However, these methods typically involve storing thousands of policies, which results in high space complexity and poor scaling to additional behaviors. Condensing the archive into a single model while retaining the performance and coverage of the original collection of policies has proved challenging. In this work, we propose using diffusion models to distill the archive into a single generative model over policy parameters. We show that our method achieves a compression ratio of 13x while recovering 98% of the original rewards and 89% of the original coverage. Further, the conditioning mechanism of diffusion models allows for flexibly selecting and sequencing behaviors, including using language.
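As a rough sketch of the distillation idea, the code below trains a small DDPM-style denoiser over flattened policy parameter vectors, conditioned on the diffusion timestep and each policy's behavior measure, then draws new parameters for a target measure via ancestral sampling. The archive contents, network, and noise schedule are hypothetical stand-ins, not the architecture from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical shapes: 512 archive policies, each flattened to a 256-dim
# parameter vector, with a 2-D behavior measure per policy.
N_POLICIES, PARAM_DIM, MEASURE_DIM, T = 512, 256, 2, 100

policies = torch.randn(N_POLICIES, PARAM_DIM)   # stand-in for a real archive
measures = torch.rand(N_POLICIES, MEASURE_DIM)  # stand-in behavior measures

betas = torch.linspace(1e-4, 0.02, T)           # linear DDPM noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class Denoiser(nn.Module):
    """Predicts the noise added to a parameter vector, conditioned on the
    diffusion timestep and the policy's behavior measure."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PARAM_DIM + 1 + MEASURE_DIM, 512), nn.SiLU(),
            nn.Linear(512, 512), nn.SiLU(),
            nn.Linear(512, PARAM_DIM),
        )

    def forward(self, x_t, t, m):
        t_embed = t.float().unsqueeze(-1) / T   # crude timestep embedding
        return self.net(torch.cat([x_t, t_embed, m], dim=-1))

model = Denoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):                        # epsilon-prediction training
    idx = torch.randint(0, N_POLICIES, (64,))
    x0, m = policies[idx], measures[idx]
    t = torch.randint(0, T, (64,))
    eps = torch.randn_like(x0)
    a_bar = alphas_bar[t].unsqueeze(-1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps   # forward noising
    loss = ((model(x_t, t, m) - eps) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

@torch.no_grad()
def sample(target_measure, n=4):
    """Ancestral sampling: draw policy parameters for a target measure."""
    x = torch.randn(n, PARAM_DIM)
    m = target_measure.expand(n, -1)
    for t in reversed(range(T)):
        eps_hat = model(x, torch.full((n,), t), m)
        alpha, a_bar = 1.0 - betas[t], alphas_bar[t]
        x = (x - (1 - alpha) / (1 - a_bar).sqrt() * eps_hat) / alpha.sqrt()
        if t > 0:
            x += betas[t].sqrt() * torch.randn_like(x)
    return x

new_params = sample(torch.tensor([0.2, 0.8]))   # condition on a desired behavior
print(new_params.shape)                          # torch.Size([4, 256])
```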
We demonstrate the possibility of learning drone swarm controllers that are zero-shot transferable to real quadrotors via large-scale multi-agent end-to-end reinforcement learning. We train policies parameterized by neural networks that are capable of controlling individual drones in a swarm in a fully decentralized manner. Our policies, trained in simulated environments with realistic quadrotor physics, demonstrate advanced flocking behaviors, perform aggressive maneuvers in tight formations while avoiding collisions with each other, break and re-establish formations to avoid collisions with moving obstacles, and efficiently coordinate in pursuit-evasion tasks. We analyze, in simulation, how different model architectures and parameters of the training regime influence the final performance of neural swarms. We demonstrate the successful deployment of the model learned in simulation to highly resource-constrained physical quadrotors performing station keeping and goal swapping behaviors.
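A minimal sketch of what decentralized execution means here: every drone evaluates one shared policy network (untrained in this toy example) on a purely local observation assembled from its own state and the relative states of its K nearest neighbors, so no central controller is needed at runtime. All dimensions and the network itself are illustrative assumptions.

```python
import numpy as np

# Hypothetical dimensions: each drone observes its own state (position,
# velocity) plus relative positions/velocities of its K nearest neighbors.
N_DRONES, K, SELF_DIM, NBR_DIM, ACT_DIM = 8, 2, 6, 6, 4

rng = np.random.default_rng(0)

# One shared policy (an untrained 2-layer MLP here) is copied onto every
# drone; each drone evaluates it on local observations only, with no
# central controller in the loop at runtime.
W1 = rng.normal(scale=0.1, size=(SELF_DIM + K * NBR_DIM, 64))
W2 = rng.normal(scale=0.1, size=(64, ACT_DIM))

def policy(obs):
    return np.tanh(np.tanh(obs @ W1) @ W2)      # per-rotor thrust commands

def local_observation(i, pos, vel):
    """Drone i's observation: own state + K nearest neighbors' relative states."""
    dists = np.linalg.norm(pos - pos[i], axis=1)
    nbrs = np.argsort(dists)[1:K + 1]           # skip self at distance 0
    rel = np.concatenate([(pos[nbrs] - pos[i]).ravel(),
                          (vel[nbrs] - vel[i]).ravel()])
    return np.concatenate([pos[i], vel[i], rel])

pos = rng.normal(size=(N_DRONES, 3))            # drone positions
vel = np.zeros((N_DRONES, 3))                   # drone velocities
actions = np.stack([policy(local_observation(i, pos, vel))
                    for i in range(N_DRONES)])
print(actions.shape)                            # (8, 4): one command per drone
```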