Bayesian partially observable reinforcement learning