What if the AI misinterprets its goals?
A5). It is true that language and symbol systems are open to infinitely many interpretations. An AI that has been given its goals purely as written text may therefore understand them differently from how its designers intended, as in the various misinterpretations of Asimov’s Three Laws. The key insight is that what we want to transfer to the AI is not the output of our moral reasoning but the reasoning itself. That means the processes that led us to look at something like slavery and conclude that it was wrong, even though we had not previously thought so.