Are all the results of Dorothea correctly reported on the web page?
Dorothea is a strongly imbalanced dataset: only about 10% of its examples are positive. Classifiers that minimize the error rate rather than the balanced error rate (BER) tend to systematically predict the negative class. This yields an error rate of about 10% but a BER of about 50%. However, the AUC may still be very good if the classifier orders the scores in a meaningful way.
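The effect can be reproduced on a small synthetic example. The sketch below is only an illustration: it assumes labels coded as +1/-1, a decision threshold at 0, and made-up scores for 10 positive and 90 negative examples (not Dorothea data). The BER is taken as the average of the error rates on the two classes, and the AUC is computed as the fraction of (positive, negative) pairs that are ranked correctly. The function names are hypothetical helpers, not the challenge's scoring code.

```python
def error_rate(y_true, y_pred):
    """Fraction of examples that are misclassified."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_error_rate(y_true, y_pred):
    """Average of the error rate on positives and the error rate on negatives."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == -1]
    fn_rate = sum(p != 1 for p in pos) / len(pos)   # positives predicted negative
    fp_rate = sum(p != -1 for p in neg) / len(neg)  # negatives predicted positive
    return 0.5 * (fn_rate + fp_rate)


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == -1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic data (assumption): 10% positives, 90% negatives.
y_true = [1] * 10 + [-1] * 90

# Made-up scores: every score is below the threshold 0, so every example is
# predicted negative, yet all positives score higher than all negatives.
scores = [-0.1 - 0.01 * i for i in range(10)] + [-1.0 - 0.01 * i for i in range(90)]
y_pred = [1 if s > 0 else -1 for s in scores]

print(error_rate(y_true, y_pred))           # 0.1
print(balanced_error_rate(y_true, y_pred))  # 0.5
print(auc(y_true, scores))                  # 1.0
```

In this toy case the ranking is perfect, so moving the threshold to any value between -1.0 and -0.19 would bring the BER down to 0 without changing the scores.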