Is RL just trial-and-error learning, or does it include planning?
Modern reinforcement learning covers both: trial-and-error learning without a model of the environment, and deliberative planning with one. By “a model” we mean a model of the environment’s dynamics. In the simplest case, this is just an estimate of the environment’s state-transition probabilities and expected immediate rewards. More generally, it is any prediction about the environment’s future behavior conditional on the agent’s behavior.
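To make the "simplest case" concrete, here is a minimal sketch of a tabular model estimated from logged experience and then used for planning. Everything in it is an illustrative assumption rather than a prescribed method: the function names `estimate_model` and `plan`, the two-state toy environment, the discount factor of 0.9, and the choice of value iteration as the planner.

```python
from collections import defaultdict


def estimate_model(transitions):
    """Estimate transition probabilities and expected immediate rewards
    from (state, action, reward, next_state) samples by simple counting."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sums = defaultdict(float)
    for s, a, r, s_next in transitions:
        counts[(s, a)][s_next] += 1
        reward_sums[(s, a)] += r

    P, R = {}, {}
    for sa, next_counts in counts.items():
        n = sum(next_counts.values())
        P[sa] = {s_next: c / n for s_next, c in next_counts.items()}
        R[sa] = reward_sums[sa] / n
    return P, R


def plan(P, R, states, actions, gamma=0.9, sweeps=100):
    """Plan with the estimated model only (value iteration);
    no further interaction with the environment is needed."""
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            backups = [
                R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
                for a in actions
                if (s, a) in P  # skip state-action pairs never observed
            ]
            if backups:
                V[s] = max(backups)
    return V


# Hypothetical experience: state "B" pays reward 1 per step, "A" pays nothing.
experience = [
    ("A", "go", 0.0, "B"),
    ("A", "go", 0.0, "B"),
    ("A", "go", 0.0, "A"),
    ("A", "stay", 0.0, "A"),
    ("B", "go", 1.0, "B"),
    ("B", "stay", 1.0, "B"),
]

P, R = estimate_model(experience)
V = plan(P, R, states=["A", "B"], actions=["go", "stay"])
```

The first function is the trial-and-error side: it only summarizes what was observed. The second is the planning side: it reasons about consequences using the model alone. A model-free method would instead update value estimates directly from each sample without ever building `P` and `R`.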