Surely software emulation of decimal arithmetic is fast enough?
No, it is not. The performance of existing software libraries for decimal arithmetic is very poor compared to hardware speeds. In some applications, the cost of decimal calculations can exceed even the costs of input and output and can form as much as 90% of the workload. See “The ‘telco’ benchmark” for an example and measurements on several implementations. Binary floating-point emulation in software was unacceptable for many applications until hardware implementations became available; the same is true for decimal floating-point (or even fixed-point) emulation today. Even using the decimal integer instructions on an IBM System z machine improves fixed-point performance only by about a factor of 10, because rounding and scaling in software add significant overhead. Complaints about the performance of decimal arithmetic are extremely common. Software emulation is 100 to 1000 times slower than a hardware implementation could be. For example, a JIT-compiled 9-digit BigDecimal division in Java™