Wang Lina: MSc (Information Studies): Full-Time 2002/2003

 

Semester 2: 6 Jan 2003 to 17 May 2003

H6677 -- Information Mining & Analysis
Lecturer

Associate Professor Khoo Soo Guan, Christopher

Course Description

This course covers the main data mining techniques used to analyse numerical and textual data in order to discover hidden patterns and develop prediction models. Techniques covered include statistical data analysis, clustering, nearest neighbour categorisation, decision-tree induction and neural networks. Industry applications of data mining techniques are examined. Students will gain hands-on experience with statistical analysis and data mining software, and with the process of data mining and knowledge discovery. An introduction to data warehousing and On-Line Analytical Processing (OLAP) will also be provided.

This semester the course takes a semi-technical approach rather than a management-oriented or mathematical one. The focus is on understanding the concepts and principles underlying data mining techniques and on gaining hands-on practical experience. The course also treats data mining techniques as general-purpose data analysis techniques.

Course Objectives

At the end of the course, students are expected to:

  • Understand the principles and concepts underlying the main data mining techniques, and their strengths and limitations;
  • Apply data mining techniques and the knowledge discovery process to discover hidden information in numerical and textual data;
  • Understand the different kinds of patterns and models that can be extracted from a data set, and be able to select and use an appropriate technique for each type of pattern and model;
  • Interpret and evaluate the results of data mining;
  • Describe how data mining can be used in real-life applications;
  • Know the main features and functions that a good data mining tool should have.

Course Schedule
  • Introduction; basic statistics; point estimation and hypothesis testing
  • Review of basic statistics; analysing the relation between two variables
  • Multiple regression analysis
  • Logistic regression; clustering
  • K-nearest neighbour
  • Data preparation/preprocessing
  • Decision tree induction
  • Neural networks
  • Association analysis; recall, precision, gain and lift
  • Time series
  • On-Line Analytical Processing (OLAP)
  • Text data mining
  • Review

Reading List
  • [Han] Han, J., & Kamber, M. (2001). Data mining: Concepts and techniques. San Francisco: Morgan Kaufmann.
  • [SPSS] Howitt, D., & Cramer, D. (2001). A guide to computing statistics with SPSS. Harlow, England: Prentice Hall.
  • [Berson] Berson, A., & Smith, S.J. (1997). Data warehousing, data mining, and OLAP. New York: McGraw-Hill.
  • [Weiss] Weiss, S.M., & Indurkhya, N. (1998). Predictive data mining: A practical guide. San Francisco: Morgan Kaufmann.
  • [Berry] Berry, M.J.A., & Linoff, G. (2000). Mastering data mining: The art and science of customer relationship management. New York: Wiley.
  • [Patterson] Patterson, D.W. (1996). Artificial neural networks: Theory and applications. Singapore: Prentice Hall.
  • [Tabachnick] Tabachnick, B.G., & Fidell, L.S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn and Bacon.