Data Mining 1999


Examination 16.5.2000


You may answer in either English or Finnish.


Describe the outline of a 'typical' data mining project. Explain in particular what kinds of
problems you are likely to encounter and how those problems can be dealt with.


Attributes are usually categorized as nominal or numeric. Some machine learning methods,
however, can handle only nominal data. How can numeric data be transformed (intelligently)
into nominal form? Compare different techniques.
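As an illustration of the comparison the question asks for, here is a minimal Python sketch of the two simplest unsupervised discretization techniques, equal-width and equal-frequency binning. The function names and the choice of returning bin indices 0..k-1 are assumptions made for this sketch, not part of any particular library.

```python
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], k: int) -> List[int]:
    """Map each value to one of k intervals of equal width (bin indices 0..k-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant attribute: a single bin
    width = (hi - lo) / k
    # min(..., k - 1) puts the maximum value into the last bin instead of bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values: Sequence[float], k: int) -> List[int]:
    """Map values to k bins holding roughly the same number of instances each."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    prev_value, prev_bin = None, 0
    for rank, i in enumerate(order):
        if values[i] == prev_value:
            bins[i] = prev_bin  # tied values always share a bin
        else:
            bins[i] = rank * k // n  # bin chosen by position in sorted order
        prev_value, prev_bin = values[i], bins[i]
    return bins
```

With the values `[1, 2, 3, 4, 100, 5, 6, 7]` and `k = 2`, the single outlier 100 makes equal-width binning put seven of the eight values into one bin, whereas equal-frequency binning splits them four and four. Neither method looks at the class labels; supervised methods such as entropy-based discretization instead place the interval boundaries where the class distribution changes.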

Explain in sufficient detail how the C4.5 algorithm works.
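C4.5 grows a decision tree top-down, choosing the test at each node by its gain ratio (information gain divided by split information). The following minimal Python sketch covers only that split-selection step, for a nominal attribute (one branch per value) and for a numeric attribute (a binary test `x <= t`). All function names are hypothetical, and the sketch deliberately omits parts of the real C4.5: restricting candidates to tests with at least average gain, handling missing values, pruning, and its exact rules for placing and scoring numeric thresholds (here a threshold is simply the midpoint between adjacent distinct values, scored by gain ratio).

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(partitions: List[List[Hashable]]) -> float:
    """Information gain of a split divided by its split information."""
    parts = [p for p in partitions if p]
    all_labels = [y for p in parts for y in p]
    n = len(all_labels)
    # Expected entropy after the split, weighted by subset size.
    remainder = sum(len(p) / n * entropy(p) for p in parts)
    gain = entropy(all_labels) - remainder
    # Split information penalizes splits into many small subsets.
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts)
    return gain / split_info if split_info > 0 else 0.0


def split_nominal(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> List[List[Hashable]]:
    """Group class labels by the value of a nominal attribute (one branch per value)."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return list(groups.values())


def best_threshold(xs: Sequence[float], ys: Sequence[Hashable]) -> Tuple[float, float]:
    """Best binary test `x <= t` for a numeric attribute; returns (t, gain ratio).

    Returns (nan, -1.0) if all values are equal, i.e. no split exists.
    """
    pairs = sorted(zip(xs, ys), key=lambda p: p[0])
    best = (float("nan"), -1.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # only cut between distinct attribute values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = gain_ratio([left, right])
        if score > best[1]:
            best = (t, score)
    return best
```

The full algorithm would evaluate every attribute this way at a node, split on the best test, and recurse on each subset until the subsets are pure or too small, then prune the resulting tree.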

Why are validation techniques important in data mining? How do they try to make the most of
limited test data? What other kinds of problems may appear in the validation process?
Give an implementation of one (good) validation method.
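One commonly used method is stratified k-fold cross-validation: the data are split into k folds with roughly the same class proportions, and each fold serves once as the test set while the other k-1 folds are used for training, so every instance is used for testing exactly once. The Python sketch below is one possible implementation under stated assumptions: a "learner" is any function taking training instances and labels and returning a classifier function, and `majority_learner` is a hypothetical baseline included only so the example runs.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Hashable, List, Sequence

# A trained model maps one instance to a predicted class label.
Classifier = Callable[[object], Hashable]
# A learner builds a Classifier from training instances and their labels.
Learner = Callable[[List[object], List[Hashable]], Classifier]


def stratified_folds(labels: Sequence[Hashable], k: int, seed: int = 0) -> List[List[int]]:
    """Split instance indices into k folds whose class proportions follow the whole data set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    pos = 0
    # Deal each class's shuffled instances round-robin over the folds; continuing
    # the counter across classes keeps fold sizes within one of each other.
    for y in sorted(by_class, key=str):
        members = by_class[y]
        rng.shuffle(members)
        for i in members:
            folds[pos % k].append(i)
            pos += 1
    return folds


def cross_validate(learn: Learner, xs: Sequence[object], ys: Sequence[Hashable],
                   k: int = 10, seed: int = 0) -> float:
    """Stratified k-fold cross-validation; returns the overall accuracy estimate."""
    if not 2 <= k <= len(ys):
        raise ValueError("need 2 <= k <= number of instances")
    folds = stratified_folds(ys, k, seed)
    correct = 0
    for f, test_idx in enumerate(folds):
        # Train on every fold except f, then test on fold f.
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        model = learn([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct += sum(model(xs[i]) == ys[i] for i in test_idx)
    # Every instance is tested exactly once, so this is the pooled accuracy.
    return correct / len(ys)


def majority_learner(xs: List[object], ys: List[Hashable]) -> Classifier:
    """Hypothetical baseline learner: always predicts the most frequent training class."""
    label = Counter(ys).most_common(1)[0][0]
    return lambda x: label
```

For example, `cross_validate(majority_learner, list(range(10)), ["a"] * 8 + ["b"] * 2, k=5)` evaluates the baseline with five folds. Because the estimate still depends on how the data happen to be split, a common refinement is to repeat the whole procedure with different seeds and average the results.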