What is the best algorithm to use?
The true answer, of course, is that we don’t know, and that it probably hasn’t been invented yet. Each algorithm has strengths and weaknesses, and the current favorite changes every few years. In the 1980s actor-critic methods were very popular, but in the 1990s they were largely superceded by value-function methods such as Q-learning and Sarsa. Q-learning is probably still the most widely used, but its instability with function approximation, discovered in 1995, probably rules it out for the long run. Recently policy-based methods such as actor-critic and value-function-less methods, including some of those from the 1980s, have become popular again. So, it seems we must keep our minds and options open as RL moves forward.