Data Warehousing and Knowledge Discovery (2): First Look

This post is still part of the introduction to the course.

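Are all discovered patterns interesting?

A data mining system may generate thousands of patterns, but not all of them are interesting. The suggested criteria for an interesting pattern: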

1 easily understood by humans

2 valid on new or test data with some degree of certainty

3 potentially useful, novel, or validates some hypothesis that the user seeks to confirm

PS: objective vs. subjective interestingness measures:

Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.

Subjective: based on user's belief in the data, e.g., unexpectedness, novelty, actionability, etc.

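In other words, a pattern counts as interesting when it catches a person's attention, holds up statistically, and survives both the objective measures and the user's own subjective filtering.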
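To make the objective measures concrete, here is a minimal sketch of support and confidence for an association rule. The transactions and the function names `support` and `confidence` are made up for illustration and are not from the course material.

```python
# Toy transaction data (hypothetical, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never fires, so report zero confidence
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / base

# Rule under test: bread => milk
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

On this toy data, bread and milk appear together in 3 of 5 transactions and bread appears in 4, so the rule has support 0.60 and confidence 0.75. A subjective measure such as unexpectedness cannot be computed this way, because it depends on what the user already believes.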

Can we find all and only interesting patterns?

1 find all the interesting patterns: completeness

2 search for only interesting patterns: optimization
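One familiar way to see both goals at once is a level-wise (Apriori-style) search for frequent itemsets: it returns every itemset that meets the support threshold (completeness) while skipping candidates that cannot possibly meet it (optimization). The sketch below is an illustration under assumptions. The data, the threshold, and the name `frequent_itemsets` are hypothetical, and "interesting" is reduced here to "support at or above a threshold".

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = sorted({item for t in transactions for item in t})
    current = {}
    for item in items:
        candidate = frozenset([item])
        s = supp(candidate)
        if s >= min_support:
            current[candidate] = s

    result = dict(current)
    k = 2
    while current:
        # Join step: build size-k candidates from frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: a candidate can only be frequent if all of its
        # (k-1)-subsets are frequent, so drop the rest without counting them.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = supp(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result

# Hypothetical toy data.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
found = frequent_itemsets(transactions, min_support=0.6)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

The pruning is safe because support can only shrink as an itemset grows, so no frequent itemset is lost. Whether a search can be both complete and restricted to interesting patterns for richer interestingness measures is exactly the open question the notes raise.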

Data mining is a confluence of multiple disciplines.


Classification schemes

1 descriptive data mining: describes the characteristics of a data set

2 predictive data mining: prediction-oriented tasks such as association, classification, clustering…

3 different views: classified by the kinds of databases mined, knowledge discovered, techniques used, applications targeted…
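The descriptive/predictive split can be illustrated on a tiny data set. The sketch below first summarizes the data (descriptive) and then fits a least-squares line to estimate an unseen value (predictive). The data and the names `summary` and `predict` are hypothetical, and a straight-line fit is just one simple stand-in for a predictive model.

```python
from statistics import mean, stdev

# Hypothetical data: (hours studied, exam score).
data = [(1, 52), (2, 58), (3, 65), (4, 70), (5, 78)]
hours = [h for h, _ in data]
scores = [s for _, s in data]

# Descriptive: characterize the data we already have.
summary = {
    "mean_score": mean(scores),
    "stdev_score": stdev(scores),
    "min_score": min(scores),
    "max_score": max(scores),
}
print(summary)

# Predictive: fit score = intercept + slope * hours by least squares,
# then use the fitted line on an input that is not in the data.
mean_h, mean_s = mean(hours), mean(scores)
slope = (
    sum((h - mean_h) * (s - mean_s) for h, s in data)
    / sum((h - mean_h) ** 2 for h in hours)
)
intercept = mean_s - slope * mean_h

def predict(h):
    """Estimated score for h hours of study, according to the fitted line."""
    return intercept + slope * h

print(round(predict(6), 1))
```

The summary only restates what is in the data set, while `predict` makes a claim about a case that has not been observed, which is why its validity has to be checked on new or test data.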

OLAP: On-Line Analytical Processing

OLTP: On-Line Transaction Processing
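The difference between the two workloads shows up in the shape of the queries. A rough sketch using Python's built-in `sqlite3` module follows. The `sales` table and its contents are invented for illustration, and a real OLAP system would normally run on a separate warehouse rather than on the transactional database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL)"
)
rows = [
    ("north", "laptop", 1200.0),
    ("north", "phone", 800.0),
    ("south", "laptop", 1100.0),
    ("south", "phone", 750.0),
]
conn.executemany(
    "INSERT INTO sales (region, product, amount) VALUES (?, ?, ?)", rows
)
conn.commit()

# OLTP-style work: short transactions that touch individual rows.
with conn:  # commits on success, rolls back on an exception
    conn.execute(
        "INSERT INTO sales (region, product, amount) VALUES (?, ?, ?)",
        ("north", "phone", 820.0),
    )
    conn.execute("UPDATE sales SET amount = ? WHERE id = ?", (780.0, 4))

# OLAP-style work: a read-only query that aggregates over many rows.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

The transactional part cares about getting each individual write right, while the analytical part only reads and summarizes along a dimension (here, region).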

OLAP mining tools: essentially human-driven analysis

OLAM (On-Line Analytical Mining) tools

Major issues in data mining

1 mining methodology and user interaction

2 performance and scalability

3 issues relating to the diversity of data types

4 issues related to applications and social impacts