Why are the Microsoft Office file formats so complicated?
Last week, Microsoft published the binary file formats for Office. These formats appear to be almost completely insane. The Excel 97-2003 file format is a 349 page PDF file. But wait, that’s not all there is to it! This document includes the following interesting comment: Each Excel workbook is stored in a compound file. You see, Excel 97-2003 files are OLE compound documents, which are, essentially, file systems inside a single file. These are sufficiently complicated that you have to read another 9 page spec to figure that out. And these “specs” look more like C data structures than what we traditionally think of as a spec. It’s a whole hierarchical file system. If you started reading these documents with the hope of spending a weekend writing some spiffy code that imports Word documents into your blog system, or creates Excel-formatted spreadsheets with your personal finance data, the complexity and length of the spec probably cured you of that desire pretty darn quickly.