What is a MONK text?
The following is a set of reflections on data-herding, data-curating, and data-preprocessing in MONK. These are all issues that we want to get away from or push on to someone else. Once a solid instance of MONK exists somewhere, it will be possible to think of this bundle of issues (or some of them) as tasks that are the responsibility of users who want to expose their collections to the functionalities created by MONK. But the first time round the users are us. We must build a collection of sufficient size and complexity to solve some of the scale problems that are inherent in the functionalities we want to deliver. This initial collection may be seen alternately as an instance and as a model. If we do a good job of preprocessing and otherwise ‘herding’ the source texts, we will generate complex and sufficiently accurate data sets to populate the various object models required for a variety of use cases, some of which have been articulated in Nora or WordHoard and some of which r