We can visualize the effects of a disentangled latent representation by interpolating one latent dimension while keeping the others fixed,
and then passing the resulting latent codes through a decoder.
ALDA automatically learns to factorize background or "distractor" variables from task-relevant variables.
In the latent traversals on the "color hard" environment, latent dimensions that vary aspects of the agent's body (legs, torso, feet) do not affect the color of the agent
(or of the sky and floor), and vice versa.
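The traversal procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not ALDA's code: `latent_traversal` and `toy_decode` are hypothetical names, and a fixed random linear map stands in for a trained image decoder.

```python
import numpy as np

def latent_traversal(z, dim, values, decode):
    """Sweep latent dimension `dim` over `values` while keeping all other
    dimensions of `z` fixed, and decode each modified latent code.
    `decode` is a hypothetical decoder callable (assumption)."""
    z = np.asarray(z, dtype=float)
    codes = np.repeat(z[None, :], len(values), axis=0)  # one copy of z per step
    codes[:, dim] = values                               # only `dim` changes
    frames = np.stack([decode(c) for c in codes])        # decoded "images"
    return codes, frames

# Toy stand-in for a trained decoder (assumption): a fixed random linear map
# from an 8-dim latent code to a 16x16 "image".
rng = np.random.default_rng(0)
W = rng.normal(size=(16 * 16, 8))

def toy_decode(code):
    return (W @ code).reshape(16, 16)

z0 = rng.normal(size=8)
codes, frames = latent_traversal(z0, dim=3, values=np.linspace(-2.0, 2.0, 7), decode=toy_decode)
print(frames.shape)  # (7, 16, 16)
```

Laying the decoded frames side by side gives one row of a traversal plot; repeating this for every latent dimension produces the full grid.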
An interesting property of ALDA is that its latent variable trajectories oscillate over time with patterns similar to those of some of the agent's proprioceptive state variables.
We visualize a few of the latent variable trajectories and compare them with some of the agent's proprioceptive state trajectories for a single rollout.
Because ALDA learns a disentangled latent representation, it is likely that some of the latent variables correspond to specific proprioceptive state variables, such as
joint angles over time.
Although the mapping from high-dimensional image observations to the latent space is arbitrary and need not correspond one-to-one with proprioceptive state variables, this remains an
interesting observation that merits further investigation.
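The comparison above is visual, but one simple way to quantify it (our assumption, not a method from this work) is the Pearson correlation between every latent trajectory and every proprioceptive state trajectory from the same rollout. The function name `latent_state_correlation` and the synthetic sine/cosine trajectories below are hypothetical stand-ins for real rollout data.

```python
import numpy as np

def latent_state_correlation(latents, states):
    """Pearson correlation between each latent trajectory and each
    proprioceptive state trajectory of a single rollout.
    latents: (T, d_z), states: (T, d_s) -> returns (d_z, d_s)."""
    # Standardize each trajectory (column) to zero mean and unit variance.
    zl = (latents - latents.mean(axis=0)) / latents.std(axis=0)
    zs = (states - states.mean(axis=0)) / states.std(axis=0)
    # The mean product of z-scores equals the Pearson correlation.
    return zl.T @ zs / latents.shape[0]

# Synthetic single rollout (assumption, not ALDA data): two "joint angle"
# trajectories and three latent trajectories.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0 * np.pi, 200)
states = np.stack([np.sin(t), np.cos(2.0 * t)], axis=1)  # (T, 2)
latents = np.stack([
    0.5 * np.sin(t) + 0.05 * rng.normal(size=t.size),    # tracks joint 0
    rng.normal(size=t.size),                              # unrelated noise
    -np.cos(2.0 * t),                                     # tracks joint 1, sign flipped
], axis=1)                                                # (T, 3)

corr = latent_state_correlation(latents, states)
best = np.abs(corr).argmax(axis=1)  # best-matching state variable per latent
print(np.round(corr, 2))
print(best)
```

Taking the absolute value before `argmax` reflects that a latent may track a state variable with either sign; as noted above, a high correlation suggests, but does not establish, a one-to-one correspondence.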
We compare against a set of baselines that together cover the range of approaches to zero-shot generalization in vision-based RL: learning task-centric representations (RePo), disentangled representation learning without association (DARLA), and data augmentation (SVEA). We train on four tasks from the DMControl suite and test on two distribution-shift environments: "color hard", which randomizes the colors of the scene, and "DistractingCS", which introduces camera perturbations and plays a video in the background. ALDA outperforms all baselines on all tasks except SVEA, which uses additional data for augmentation; this likely places the training distribution inside the support of the test distributions induced by the evaluation environments.
[Figure: results on the Original Task, Color Hard, and Distracting CS environments]