WIKI

In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression.[1] In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

  • In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.

  • In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.
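Both behaviors can be sketched in a few lines of Python. This is a hypothetical brute-force implementation (the function name and signature are illustrative) using Euclidean distance:

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k, task="classify"):
    """Brute-force k-NN: rank all training points by distance to the query."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    neighbors = [train_y[i] for i in order[:k]]
    if task == "classify":
        # Majority vote among the k nearest neighbors.
        return Counter(neighbors).most_common(1)[0][0]
    # Regression: average the values of the k nearest neighbors.
    return sum(neighbors) / k
```

With k = 1 the classifier simply returns the label of the single nearest training point, matching the special case described above.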

K-Nearest Neighbor (k-NN, K近邻法)

Algorithm

Model

  • Basic elements of the model: the distance metric, the choice of k, and the classification decision rule.

Distance metric

Lp distance: when p = 2, it is called the Euclidean distance.
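As a quick sketch (assuming points are given as equal-length coordinate tuples; the function name is illustrative), the Lp distance can be written directly from its definition:

```python
def lp_distance(x, y, p):
    """Lp (Minkowski) distance: (sum over l of |x_l - y_l|^p) ** (1/p)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)
```

Setting p = 2 gives the Euclidean distance, while p = 1 gives the Manhattan distance.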

Choosing the value of k

  • If k is small: the approximation error decreases but the estimation error increases; the overall model becomes more complex, predictions are very sensitive to the nearby instance points, and overfitting is likely.

  • If k is large: the approximation error increases while the estimation error decreases, and the overall model becomes simpler.

  • In practice: k is usually taken to be a relatively small value, and cross-validation is used to select the optimal k.
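One way to carry out that selection is leave-one-out cross-validation. The sketch below (helper names are illustrative; a brute-force majority-vote classifier is inlined) scores each candidate k and keeps the best:

```python
from collections import Counter
import math

def _predict(X, y, query, k):
    # Classify by majority vote among the k nearest training points.
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))
    return Counter(y[i] for i in order[:k]).most_common(1)[0][0]

def choose_k(X, y, candidate_ks):
    """Pick k by leave-one-out cross-validation accuracy."""
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        # Hold out each example in turn and predict it from the rest.
        hits = sum(
            _predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k) == y[i]
            for i in range(len(X))
        )
        acc = hits / len(X)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k
```

Leave-one-out is only one of several cross-validation schemes; k-fold variants trade some accuracy of the estimate for speed on larger training sets.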

Classification decision rule

Majority voting rule

Implementation

  • The main concern is how to perform fast k-nearest-neighbor search over the training data.

  • The simplest implementation: a linear scan.

  • For better efficiency: the kd-tree.

PS: a kd-tree is a tree data structure that stores instance points in k-dimensional space so that they can be retrieved quickly.

Constructing the kd-tree
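A minimal construction sketch, assuming plain coordinate tuples (the dict-based node layout here is just one possible representation): split on the coordinate axes cyclically, placing the median point at each node.

```python
def build_kdtree(points, depth=0):
    """Recursively build a kd-tree over a list of k-dimensional tuples."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the axes by depth
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # the median point becomes this node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }
```

Splitting at the median keeps the tree balanced, so its depth is O(log n) for n training points.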

Searching the kd-tree
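A nearest-neighbor search sketch, assuming nodes are dicts with `point`, `axis`, `left`, and `right` keys (a hypothetical median-split layout): descend toward the query, then backtrack into the far subtree only when the splitting plane is closer than the best point found so far.

```python
import math

def nearest(node, query, best=None):
    """Return the stored point closest to `query` in the kd-tree."""
    if node is None:
        return best
    if best is None or math.dist(node["point"], query) < math.dist(best, query):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    # Search the side of the split that contains the query first.
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if a closer point could lie beyond the split.
    if abs(diff) < math.dist(best, query):
        best = nearest(far, query, best)
    return best
```

The pruning test is what makes the kd-tree faster than a linear scan on average, though in the worst case the search can still visit every node.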
