What is the goal of the CONTRA project?
The aim of the CONTRA project is to better understand discriminatively trained probabilistic models of sequences (known as conditional random fields, or CRFs) and their application to a variety of problems in computational biology. Most current applications of probabilistic sequence models in computational biology use generative probabilistic models, such as hidden Markov models (HMMs) and stochastic/probabilistic context-free grammars (SCFGs/PCFGs). Generative models treat sequences as the result of simulating a stochastic process: in HMMs, the process transitions from one state of a finite-state automaton to the next; in SCFGs/PCFGs, the process randomly picks the next production rule to apply to the current partial parse tree. While generative models are intuitive and allow convenient parameter training via maximum joint likelihood techniques, they also make many strong assumptions regarding the stochastic nature of the dat