Why isn't HttpUnit handling my non-English pages?
When the Content-Type header does not specify a character set, HTTP 1.1 says that the character set is to be taken as iso-8859-1. Unfortunately, some HTTP servers do not send this parameter correctly, so many browsers provide a workaround that lets the user choose the character set in some other fashion. To imitate this behavior, HttpUnit lets you set the character set expected for subsequent pages by calling HttpUnitOptions.setDefaultCharacterSet(). This setting does not apply to pages for which the server specifies the character set.
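
The fallback rule described above can be sketched in plain Java. This is only an illustration of the idea, not HttpUnit's actual implementation: the class CharsetFallbackSketch and the method resolveCharset are hypothetical names, and the header parsing is deliberately simplified.

```java
import java.util.Locale;

// Hypothetical sketch: illustrates the fallback rule only,
// it is NOT taken from the HttpUnit source.
public class CharsetFallbackSketch {

    // Returns the charset named in the Content-Type header if there is one,
    // otherwise the caller-supplied default.
    static String resolveCharset(String contentType, String defaultCharset) {
        if (contentType != null) {
            // Header parameters are separated by ';', e.g. "text/html; charset=EUC-JP"
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String value = p.substring("charset=".length()).trim();
                    // Drop optional surrounding double quotes
                    if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                        value = value.substring(1, value.length() - 1);
                    }
                    if (!value.isEmpty()) {
                        return value; // the server's choice wins
                    }
                }
            }
        }
        return defaultCharset; // no charset from the server: use the default
    }

    public static void main(String[] args) {
        // Server sent no charset, so the configured default applies
        System.out.println(resolveCharset("text/html", "EUC-JP"));
        // Server sent a charset, so the default is ignored
        System.out.println(resolveCharset("text/html; charset=Shift_JIS", "EUC-JP"));
    }
}
```

In an HttpUnit test, the equivalent step would be a call such as HttpUnitOptions.setDefaultCharacterSet("EUC-JP") before requesting the pages; the charset name here is only an example, so substitute the encoding your pages actually use.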