Semester 2: 6 Jan 2003 - 17 May 2003
|
H6677 -- Information Mining & Analysis |
Lecturer |
Associate Professor Khoo Soo Guan, Christopher
|
Course Description |
This course covers the main data mining techniques used to analyse
numerical and textual data in order to discover hidden patterns and
develop prediction models. Techniques covered include statistical
data analysis, clustering, nearest neighbour categorisation,
decision-tree induction and neural networks. Industry applications of
these techniques are examined. Students will gain hands-on experience
with statistical analysis and data mining software, and with the
process of data mining and knowledge discovery. The course also
provides an introduction to data warehousing and On-Line Analytical
Processing (OLAP).
The approach taken this semester is semi-technical, rather than
management-oriented or mathematical. The focus is on understanding the
concepts and principles underlying the data mining techniques, and on
hands-on practical experience. The course also treats data mining
techniques as general-purpose data analysis techniques.
|
Course Objectives |
At the end of the course, students are expected to:
- Understand the principles and concepts underlying the main data
  mining techniques, and their strengths and limitations;
- Apply data mining techniques and the knowledge discovery process
  to discover hidden information in numerical and textual data;
- Understand the different kinds of patterns and models that can be
  extracted from a data set, and select and use an appropriate
  technique for each type of pattern and model;
- Interpret and evaluate the results of data mining;
- Describe how data mining can be used in real-life applications;
- Know the main features and functions that a good data mining tool
  should have.
|
Course Schedule |
- Introduction, Basic statistics, Point estimation and hypothesis testing
- Review of basic statistics, Analyzing the relation between two variables
- Multiple regression analysis
- Logistic regression, Clustering
- K-Nearest Neighbour
- Data preparation/preprocessing
- Decision tree induction
- Neural networks
- Association analysis, Recall, precision, gain and lift
- Time series
- On-line Analytical Processing (OLAP)
- Text data mining
- Review
|
Reading List |
- [Han] Han, J., & Kamber, M. (2001). Data mining: Concepts and
  techniques. San Francisco: Morgan Kaufmann.
- [SPSS] Howitt, D., & Cramer, D. (2001). A guide to computing
  statistics with SPSS. Harlow, England: Prentice-Hall.
- [Berson] Berson, A., & Smith, S.J. (1997). Data warehousing, data
  mining, and OLAP. New York: McGraw-Hill.
- [Weiss] Weiss, S.M., & Indurkhya, N. (1998). Predictive data mining:
  A practical guide. San Francisco: Morgan Kaufmann.
- [Berry] Berry, M.J.A., & Linoff, G. (2000). Mastering data mining:
  The art and science of customer relationship management. New York:
  Wiley.
- [Patterson] Patterson, D.W. (1996). Artificial neural networks:
  Theory and applications. Singapore: Prentice Hall.
- [Tabachnick] Tabachnick, B.G., & Fidell, L.S. (2001). Using
  multivariate statistics (4th ed.). Boston: Allyn and Bacon.
|