Examination 16.5.2000
You may answer in either English or Finnish.
Describe the outline of a 'typical' data mining project. Explain in particular what kinds of problems you are likely to encounter and how those problems can be dealt with.
Attributes are usually categorized as nominal or numeric. Some machine learning methods, however, can handle only nominal data. How can numeric data be transformed (intelligently) into nominal form? Compare different techniques.
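This minimal Python sketch makes the comparison concrete with the two simplest unsupervised techniques, equal-width and equal-frequency binning. The function names are illustrative. Supervised methods such as entropy-based discretization, which also use the class labels when placing cut points, are not shown.

```python
from bisect import bisect_left


def equal_width_cuts(values, k):
    """Cut points splitting [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """Cut points giving k intervals with roughly len(values) / k values each."""
    s = sorted(values)
    n = len(s)
    cuts = []
    for i in range(1, k):
        idx = (i * n) // k
        if idx == 0:
            continue  # more bins than values: nothing to split off yet
        # Place the cut halfway between two neighbouring sorted values.
        cut = (s[idx - 1] + s[idx]) / 2
        if not cuts or cut > cuts[-1]:  # ties can produce duplicate cuts; skip them
            cuts.append(cut)
    return cuts


def discretize(value, cuts):
    """Map a numeric value to a nominal label: the index of its interval."""
    return f"bin{bisect_left(cuts, value)}"


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 100]
    print(equal_width_cuts(data, 2))      # [50.5]
    print(equal_frequency_cuts(data, 2))  # [4.5]
```

The example exposes the main weakness of equal-width binning. A single outlier (100) stretches the range so far that seven of the eight values land in one bin, whereas equal-frequency binning splits them evenly.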
Explain in sufficient detail how the C4.5 algorithm works.
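C4.5 builds a decision tree top-down. At each node it scores the candidate tests by gain ratio (information gain divided by split information), splits the data on the best one, and recurses. Numeric attributes are handled with binary tests of the form x <= t. The following minimal sketch covers only those two scoring steps and assumes labels are given as a plain list. It leaves out pruning, missing values, and the restriction that Quinlan applies before comparing gain ratios (only tests with at least average information gain are considered). The midpoint threshold is also a simplification.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Class entropy in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` on the nominal attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: entropy before the split minus weighted entropy after.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def best_threshold(values, labels):
    """Best binary test x <= t on a numeric attribute, by information gain."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_t, best_gain = None, -1.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


if __name__ == "__main__":
    print(gain_ratio(["a", "a", "b", "b"], ["y", "y", "n", "n"]))  # 1.0
    print(best_threshold([1, 2, 3, 4], ["n", "n", "y", "y"]))       # (2.5, 1.0)
```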
Why are validation techniques important in data mining? How do they try to make the most of limited test data? What other kinds of problems may appear in the validation process?
Give an implementation of one (good) validation method.
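One common answer is stratified k-fold cross-validation, and stratified tenfold cross-validation is often recommended as a standard choice. The method stretches limited data by using every example for testing exactly once and for training k - 1 times. Stratification keeps each fold's class distribution close to that of the whole dataset. The following is a minimal Python sketch. The function names, the `train` callback interface and the `majority_class` baseline are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Split example indices into k folds with roughly equal class proportions."""
    if not 1 < k <= len(labels):
        raise ValueError("need 2 <= k <= number of examples")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    j = 0  # shared round-robin counter keeps fold sizes balanced across classes
    for indices in by_class.values():
        rng.shuffle(indices)
        for i in indices:
            folds[j % k].append(i)
            j += 1
    return folds


def cross_validate(X, y, train, k=10, seed=0):
    """Mean accuracy over stratified k-fold cross-validation.

    `train(X_train, y_train)` must return a function that predicts the label
    of a single example (a hypothetical interface chosen for this sketch).
    """
    accuracies = []
    for test_idx in stratified_folds(y, k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        predict = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


def majority_class(X_train, y_train):
    """Baseline learner: always predict the most frequent training label."""
    top = Counter(y_train).most_common(1)[0][0]
    return lambda x: top
```

For example, `cross_validate(X, y, majority_class, k=10)` estimates the accuracy of the trivial baseline, which any real learner should beat. Repeating the whole procedure with several seeds and averaging the results reduces the variance that comes from one particular random partition.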