Learning to model the world could in principle enable agents to generalize in environments with many different tasks. However, learning latent dynamics models suitable for planning has been a long-standing challenge. We present the deterministic belief state model (DBSM), a probabilistic dynamics model for latent planning in high-dimensional environments. The model propagates deterministic beliefs, represented as activation vectors, forward in time, providing context for long-term predictions. We further introduce variational overshooting, a generalization of the variational free energy bound for sequence models that encourages consistency between closed-loop and open-loop predictions. Experiments on pixel-based locomotion tasks show that our model recovers information about the true simulator state from purely unsupervised experience, without access to rewards. Using the learned latent space, we then learn reward functions from a few example episodes and obtain locomotion gaits through planning alone, without training a separate policy. With online data acquisition, our model reaches reasonable scores in 100 to 1000 times fewer episodes than model-free algorithms.
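To make the belief propagation concrete, the sketch below shows one plausible way such a deterministic belief state recurrence could be implemented: a GRU cell carries the belief as an activation vector conditioned on actions (and, in closed loop, on encoded observations), with small decoders predicting observation features and rewards. This is an illustrative sketch under stated assumptions, not the authors' implementation; all module names, layer sizes, and the embedding-based observation interface are assumptions.

```python
# Illustrative sketch (not the authors' code) of a deterministic belief-state
# dynamics model: a GRU propagates a deterministic belief vector forward in
# time from actions, while decoders reconstruct observation embeddings and
# predict rewards. All names and dimensions are hypothetical.
import torch
import torch.nn as nn

class DeterministicBeliefModel(nn.Module):
    def __init__(self, embed_dim=256, action_dim=6, belief_dim=200, hidden_dim=200):
        super().__init__()
        # Belief transition: new belief from (observation embedding, action, old belief).
        self.cell = nn.GRUCell(embed_dim + action_dim, belief_dim)
        # Decoder heads from the belief vector.
        self.decode_obs = nn.Sequential(
            nn.Linear(belief_dim, hidden_dim), nn.ELU(), nn.Linear(hidden_dim, embed_dim))
        self.decode_reward = nn.Sequential(
            nn.Linear(belief_dim, hidden_dim), nn.ELU(), nn.Linear(hidden_dim, 1))

    def forward(self, embeds, actions, belief):
        # embeds:  (T, B, embed_dim) encoded observations; for open-loop
        #          prediction these can be zeros so the belief is rolled
        #          forward from actions alone.
        # actions: (T, B, action_dim); belief: (B, belief_dim) initial belief.
        beliefs = []
        for embed_t, action_t in zip(embeds, actions):
            belief = self.cell(torch.cat([embed_t, action_t], dim=-1), belief)
            beliefs.append(belief)
        beliefs = torch.stack(beliefs)  # (T, B, belief_dim)
        return beliefs, self.decode_obs(beliefs), self.decode_reward(beliefs)
```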