What audio format is used?
The speech output is a simple WAV file (WAV is a RIFF-based format). The audio is 16 kHz, 16-bit linear PCM: 16,000 samples per second, each sample a 16-bit integer, mono (not stereo). The website uses these wideband voices for best quality. We also ship 8 kHz versions of the voice (8,000 samples per second, one 8-bit mu-law value per sample), which cuts the voice database size by a factor of four. The 8 kHz voices are useful for telephony applications (where the phone line limits quality anyway) and for platforms with limited storage. The page has no option for MP3 or similar encodings, because adding audio compression would likely overload the server. If you need a different sample rate or audio format, you can probably find free software to convert what we deliver. Before you use the audio for anything more than private listening, please check the website usage policy.
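To check that a downloaded file matches the format described above, something like the following Python sketch may help. It uses only the standard `wave` module, which reads uncompressed PCM, so it is meant for the 16 kHz output rather than the mu-law files. The function names `describe_wav` and `bytes_per_second` are illustrative, not part of any tool we ship, and the one-second silent clip is generated in memory as a stand-in for a real download.

```python
import io
import wave


def describe_wav(source):
    """Return (channels, bytes_per_sample, sample_rate_hz, duration_s) of a PCM WAV.

    `source` may be a file name or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        rate = wav.getframerate()
        frames = wav.getnframes()
    return channels, width, rate, frames / rate


def bytes_per_second(rate_hz, bits_per_sample, channels=1):
    """Raw audio data rate, ignoring the small WAV header."""
    return rate_hz * bits_per_sample // 8 * channels


# Stand-in for a downloaded file: one second of silence, 16 kHz 16-bit mono.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)       # mono
    out.setsampwidth(2)       # 2 bytes = 16 bits per sample
    out.setframerate(16000)   # 16,000 samples per second
    out.writeframes(b"\x00\x00" * 16000)
buf.seek(0)

print(describe_wav(buf))              # (1, 2, 16000, 1.0)

wideband = bytes_per_second(16000, 16)    # 32000 bytes per second
telephony = bytes_per_second(8000, 8)     # 8000 bytes per second
print(wideband // telephony)          # 4
```

The last lines reproduce the arithmetic behind the fourfold size difference: halving the sample rate and halving the bits per sample each cut the data rate in two.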