WHY MULTIMODAL INTERFACES?
There are several strong reasons for creating interfaces that allow combined voice and gestural input. The first is purely practical: ease of expression. As Martin points out [2], typical computer interaction modalities are characterized by an ease versus expressiveness trade-off. Ease corresponds to the efficiency with which commands can be remembered, and expressiveness to the size of the command vocabulary. Common interaction devices range from the mouse, which maximizes ease, to the keyboard, which maximizes expressiveness. Multimodal input overcomes this trade-off: speech and gestural commands are easy to execute whilst retaining a large command vocabulary. Voice and gesture complement each other, and when used together they create an interface more powerful than either modality alone. Cohen [3] summarizes this complementary relationship and shows that natural language interaction is suited to descriptive techniques, while gestural interaction is ideal for direct manipulation of objects. The