Action and Perception as Divergence Minimization
NeurIPS Deep RL Workshop 2020
We introduce a unified objective for action and perception of intelligent agents. Extending representation learning and control, we minimize the joint divergence between the world and a target distribution. Intuitively, such agents use perception to align their beliefs with the world, and use actions to align the world with their beliefs. Minimizing the joint divergence to an expressive target maximizes the mutual information between the agent’s representations and inputs, thus inferring representations that are informative of past inputs and exploring future inputs that are informative of the representations. This lets us derive intrinsic objectives, such as representation learning, information gain, empowerment, and skill discovery from minimal assumptions. Moreover, interpreting the target distribution as a latent variable model suggests expressive world models as a path toward highly adaptive agents that seek large niches in their environments, while rendering task rewards optional. The presented framework provides a common language for comparing a wide range of objectives, facilitates understanding of latent variables for decision making, and offers a recipe for designing novel objectives. We recommend deriving future agent objectives from the joint divergence to facilitate comparison, to point out the agent’s target distribution, and to identify the intrinsic objective terms needed to reach that distribution.