Why use arithmetic encoding?
Predictive arithmetic encoding is within 1 bit of optimal, assuming the probabilities are accurate. Shannon proved in 1949 that the best we can do to compress a string x is log2 1/P(x) bits. In our example, we have P(the) = P(t)P(h|t)P(e|th) = 0.1 x 0.2 x 0.3 = 0.006, and log2 1/0.006 = 7.38 bits. We can always find an 8 bit code within any range of width 0.006 because these numbers occur every 1/28, or about every 0.004.