How Is Data Mining Done?
CRISP-DM (Cross-Industry Standard Process for Data Mining) is a widely accepted methodology for data mining projects. For details, see http://www.crisp-dm.org. The steps in the process are:
• Business Understanding: Understand the project objectives and requirements from a business perspective, then convert this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives.
• Data Understanding: Start by collecting the data, then become familiar with it in order to identify data quality problems, discover first insights, or detect interesting subsets that suggest hypotheses about hidden information.
• Data Preparation: Includes all activities required to construct the final data set (the data that will be fed into the modeling tool) from the initial raw data. Tasks include table, case, and attribute selection, as well as transformation and cleaning of the data for modeling tools.
• Modeling: Select and apply a variety of modeling techniques, and calibrate tool parameters to optimal values.
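To make the Data Understanding, Data Preparation, and Modeling steps more concrete, here is a toy sketch in Python. It assumes a small, hypothetical customer-churn table (`RAW`), and it uses a single-threshold classifier as a stand-in for a real modeling tool. The function names (`missing_counts`, `prepare`, `accuracy`, `calibrate`) are illustrative only; they are not part of CRISP-DM or any library. "Calibrating parameters to optimal values" is shown as a simple grid search over candidate thresholds.

```python
from typing import Dict, List, Optional

# Hypothetical raw data: each case has an ID, two attributes, and a 0/1 label.
Record = Dict[str, Optional[float]]

RAW: List[Record] = [
    {"id": 1, "income": 42.0, "age": 23.0, "churned": 0},
    {"id": 2, "income": None, "age": 35.0, "churned": 1},
    {"id": 3, "income": 18.0, "age": 51.0, "churned": 1},
    {"id": 4, "income": 75.0, "age": 40.0, "churned": 0},
    {"id": 5, "income": 22.0, "age": 29.0, "churned": 1},
    {"id": 6, "income": 60.0, "age": 33.0, "churned": 0},
]


def missing_counts(records: List[Record]) -> Dict[str, int]:
    """Data Understanding: count missing values per attribute (a basic quality check)."""
    counts: Dict[str, int] = {}
    for rec in records:
        for key, value in rec.items():
            counts[key] = counts.get(key, 0) + int(value is None)
    return counts


def prepare(records: List[Record], attributes: List[str], label: str) -> List[Record]:
    """Data Preparation: select attributes, drop incomplete cases, min-max scale."""
    # Attribute selection: keep only the chosen attributes plus the label.
    selected = [{a: r[a] for a in attributes + [label]} for r in records]
    # Cleaning / case selection: drop records that have any missing value.
    complete = [r for r in selected if all(v is not None for v in r.values())]
    # Transformation: rescale each selected attribute to the range [0, 1].
    for a in attributes:
        lo = min(r[a] for r in complete)
        hi = max(r[a] for r in complete)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        for r in complete:
            r[a] = (r[a] - lo) / span
    return complete


def accuracy(rows: List[Record], attribute: str, label: str, threshold: float) -> float:
    """Fraction of rows where the rule 'attribute < threshold' predicts the label."""
    hits = sum((r[attribute] < threshold) == bool(r[label]) for r in rows)
    return hits / len(rows)


def calibrate(rows: List[Record], attribute: str, label: str, candidates: List[float]) -> float:
    """Modeling: pick the candidate threshold with the highest accuracy (grid search)."""
    # max() returns the first candidate among ties, so smaller thresholds win ties.
    return max(candidates, key=lambda t: accuracy(rows, attribute, label, t))


if __name__ == "__main__":
    print(missing_counts(RAW))
    rows = prepare(RAW, ["income", "age"], "churned")
    candidates = [i / 10 for i in range(11)]
    best = calibrate(rows, "income", "churned", candidates)
    print(best, accuracy(rows, "income", "churned", best))
```

In this sketch, the accuracy is measured on the same rows used to choose the threshold. That keeps the example short, but a real project would judge the calibrated model on data held out from calibration.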