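This post is still part of the introductory material for this course.
Are all discovered patterns interesting?
A data mining system may generate thousands of patterns, but not all of them are interesting. The suggested criteria are that a pattern is interesting if it is: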
1 easily understood by humans
2 valid on new or test data with some degree of certainty
3 potentially useful, novel, or validates some hypothesis that the user is trying to confirm
PS: objective vs. subjective interestingness measures:
Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
Subjective: based on the user's belief in the data, e.g., unexpectedness, novelty, actionability, etc.
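To make the two objective measures concrete, here is a minimal sketch of how support and confidence could be computed for an association rule A => B. The toy transactions and the function names `support` and `confidence` are assumptions made up for illustration; they do not come from the course material.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy data: each transaction is the set of items bought together.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diaper", "beer"}),
    frozenset({"milk", "diaper", "beer"}),
    frozenset({"bread", "milk", "diaper"}),
    frozenset({"bread", "milk", "beer"}),
]


def count(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t)


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of all transactions that contain `itemset`."""
    if not transactions:
        raise ValueError("no transactions given")
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent (rule: antecedent => consequent)."""
    a = frozenset(antecedent)
    base = count(a, transactions)
    if base == 0:
        raise ValueError("antecedent never occurs")
    return count(a | frozenset(consequent), transactions) / base


if __name__ == "__main__":
    print(support({"bread", "milk"}, TRANSACTIONS))       # 3 of 5 transactions
    print(confidence({"bread"}, {"milk"}, TRANSACTIONS))  # 3 of the 4 with bread
```

A rule would typically be kept only if both values exceed user-chosen thresholds; the subjective measures (unexpectedness, novelty, actionability) are then applied on top of that by the user.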
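In other words, a pattern counts as interesting when it catches a human's attention, holds up statistically, and passes both the objective screening and the user's own subjective filtering.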
Can we find all and only interesting patterns?
1 find all the interesting patterns: completeness
2 search for only interesting patterns: optimization
Data mining as a confluence of multiple disciplines:
Classification schemes
1 descriptive data mining: describes the characteristics of a given data set
2 predictive data mining: association, classification, clustering…
3 kinds of databases, knowledge, techniques, applications…
OLAP: On-Line Analytical Processing
OLTP: On-Line Transaction Processing
OLAP Mining tools: in essence, the analysis is still driven by a human
OLAM (On-Line Analytical Mining) tools
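The difference between OLTP and OLAP is easier to see side by side. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `sales` table; the table layout and the helper names `open_store`, `record_sale`, and `revenue_by_region` are assumptions for illustration only. Real OLAP systems work on data warehouses and data cubes, not on a single in-memory table.

```python
import sqlite3
from typing import List, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "id INTEGER PRIMARY KEY, region TEXT, product TEXT, cents INTEGER)"
    )
    return conn


def record_sale(conn: sqlite3.Connection, region: str, product: str,
                cents: int) -> None:
    """OLTP-style work: one short transaction touching a single row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO sales (region, product, cents) VALUES (?, ?, ?)",
            (region, product, cents),
        )


def revenue_by_region(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """OLAP-style work: a read-only aggregate over many rows (a roll-up)."""
    cur = conn.execute(
        "SELECT region, SUM(cents) FROM sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = open_store()
    record_sale(db, "north", "pen", 200)
    record_sale(db, "north", "ink", 500)
    record_sale(db, "south", "pen", 300)
    print(revenue_by_region(db))
```

In this picture a person still has to decide which aggregate to look at, which matches the note above that OLAP analysis is human-driven.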
Major issues in data mining
1 mining methodology and user interaction
2 performance and scalability
3 issues relating to the diversity of data types
4 issues related to applications and social impacts