Getting the decision boundary for a KNN classifier using R

【Question title】: Getting the decision boundary for KNN classifier using R
【Posted】: 2019-07-31 18:16:04
【Question description】:

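I am trying to fit a KNN model to the Auto dataset from the ISLR package in R and obtain the decision boundary.

I am having trouble determining the decision boundary for a 3-class problem. Here is my code so far. It does not give me the decision boundary.

I have seen answers elsewhere on this site that solve this kind of problem with ggplot, but I would like to get the result the classic way, using the base plot function.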

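library("ISLR")

# Predictors: columns 1 and 3 of Auto (mpg, displacement)
trainxx=Auto[,c(1,3)]
# Class label: column 8 of Auto (origin, coded 1, 2, 3)
trainyy=(Auto[,8])

n.grid1 <- 50

# Evenly spaced grid covering the range of each predictor
x1.grid1 <- seq(from = min(trainxx[, 1]), to = max(trainxx[, 1]), length.out = n.grid1)
x2.grid1 <- seq(from = min(trainxx[, 2]), to = max(trainxx[, 2]), length.out = n.grid1)
grid <- expand.grid(x1.grid1, x2.grid1)

library("class")
# Classify every grid point; prob = TRUE keeps the winning class's vote share
mod.opt <- knn(trainxx, grid, trainyy, k = 10, prob = TRUE)

prob_knn <- attr(mod.opt, "prob")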

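My problem is mainly with what comes after this code. I am fairly sure I have to modify the following part, but I don't know how. Do I need a "nested if" here?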

prob_knn <- ifelse(mod.opt == "3", prob_knn, 1 - prob_knn) 



prob_knn <- matrix(prob_knn, n.grid1, n.grid1)


plot(trainxx, col = ifelse(trainyy == "3", "green",ifelse(trainyy=="2", "red","blue")))
title(main = "plot of training data with Decision boundary K=10")
contour(x1.grid1, x2.grid1, prob_knn, levels = 0.5, labels = "", xlab = "", ylab = "", 
        main = "", add = T , pch=20)

If anyone can suggest a way to solve this, it would be a great help.

Basically, I need something like this, but for a 3-class problem: https://stats.stackexchange.com/questions/21572/how-to-plot-decision-boundary-of-a-k-nearest-neighbor-classifier-from-elements-o

【Question discussion】:

This may be a duplicate; see here: stackoverflow.com/questions/31234621/…
You can use almost the same code; if you get an error, wrap the label variable in as.factor() inside ggplot.

Tags: r machine-learning classification knn

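【Solution 1】:

I think that rather than trying to draw the decision boundary as a line, it may be easier to take the predicted class at each grid point and plot it as filled regions: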

# Use the predicted class at each point
classes = matrix(as.numeric(mod.opt), n.grid1, n.grid1)

class_colors = c("#4E79A7", "#F28E2B", "#E15759")
# Add some transparency to make the fill colours less bright
fill_colors = paste0(class_colors, "88")
# Use image to plot the predicted class at each point
image(x1.grid1, x2.grid1, classes, col = fill_colors, 
      main = "plot of training data with decision boundary",
      xlab = colnames(trainxx)[1], ylab = colnames(trainxx)[2])
points(trainxx, col = class_colors[trainyy], pch = 16)

Note that I increased n.grid1 from your code to 200 to get more detailed region boundaries.

Output:
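
The reshaping step above works because of how expand.grid() orders its rows. The following is a minimal sketch on a made-up toy grid (the names x_vals, y_vals, toy_grid, toy_pred and toy_mat are illustrative and not from the answer, and the labelling rule is a stand-in for a classifier) showing why matrix(..., n.grid1, n.grid1) lines up with what image() expects:

```r
# Toy grid: 3 values on the x-axis, 2 on the y-axis (made-up numbers)
x_vals <- c(10, 20, 30)
y_vals <- c(1, 2)
toy_grid <- expand.grid(x = x_vals, y = y_vals)

# Stand-in for a classifier: label each grid point with a simple rule
toy_pred <- ifelse(toy_grid$x + 10 * toy_grid$y > 35, 2, 1)

# expand.grid() varies its first argument fastest and matrix() fills
# column by column, so rows index x and columns index y
toy_mat <- matrix(toy_pred, nrow = length(x_vals), ncol = length(y_vals))

# The cell for x = 30 (row 3), y = 2 (column 2) matches the prediction
# stored for that grid point
toy_mat[3, 2] == toy_pred[toy_grid$x == 30 & toy_grid$y == 2]
```

image(x, y, z) and contour(x, y, z) both read z[i, j] as the value at (x[i], y[j]), which is the layout this produces. If the grid were built with the arguments to expand.grid() swapped, the matrix would need to be transposed.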

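【Discussion】:

Thank you for posting an answer, but this does not work for my problem. I need something like this for a 3-class problem: stats.stackexchange.com/questions/21572/…
If you really need to draw the decision boundaries as lines (I think in a 3-class problem you actually have 2 decision boundaries, 1 vs 2 and 1 vs 3), you probably need a KNN implementation that gives you the predicted probability of every class at each point, not just the probability of the winning class.
Hi, that is exactly what I want. The only problem is that I don't know how to do it.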
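
One way to get a probability for every class, as the comment above suggests, is to count the neighbour votes by hand. The following is only a rough sketch in base R: knn_class_probs is a hypothetical helper, not a function from the class package, and it assumes plain Euclidean distance on unscaled features with ties broken by row order, so its results may differ slightly from class::knn() when distances or votes are tied. The toy data is made up.

```r
# Hypothetical helper (not part of the class package): for each query
# point, return the fraction of its k nearest training points in each class
knn_class_probs <- function(train, test, cl, k) {
  train <- as.matrix(train)
  test <- as.matrix(test)
  cl <- as.factor(cl)
  probs <- t(apply(test, 1, function(pt) {
    # Squared Euclidean distance from this query point to every training row
    d2 <- rowSums(sweep(train, 2, pt)^2)
    nearest <- order(d2)[seq_len(k)]
    # Count votes per class level, then normalise to proportions
    tabulate(as.integer(cl[nearest]), nbins = nlevels(cl)) / k
  }))
  colnames(probs) <- levels(cl)
  probs
}

# Made-up toy data: three classes, two query points
toy_train <- rbind(c(0, 0), c(0, 1), c(1, 0), c(5, 5), c(5, 6), c(9, 9))
toy_cl <- c(1, 1, 1, 2, 2, 3)
toy_test <- rbind(c(0.2, 0.2), c(5, 5.4))
toy_probs <- knn_class_probs(toy_train, toy_test, toy_cl, k = 3)
toy_probs
```

Each column of the result could then be reshaped with matrix(..., n.grid1, n.grid1) in the same way as the predicted classes and passed to contour(). The loop over query points is slow for large grids; this is meant to show the idea rather than to be efficient.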

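【Solution 2】:

Here is an adjusted approach that draws the decision boundaries as lines. I thought this would require the predicted probability of every class, but after reading this answer it turns out you can simply set each class's predicted probability to 1 wherever that class is predicted and to 0 everywhere else.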

# Create matrices for each class where p = 1 for any point
#   where that class was predicted, 0 otherwise
n_classes = 3
class_regions = lapply(1:n_classes, function(class_num) {
    indicator = ifelse(mod.opt == class_num, 1, 0)
    # Reshape to the grid layout: rows follow x1.grid1, columns follow x2.grid1
    matrix(indicator, n.grid1, n.grid1)
})

# Set up colours
class_colors = c("#4E79A7", "#F28E2B", "#E15759")
# Add some transparency to make the fill colours less bright
fill_colors = paste0(class_colors, "60")

# Use image to plot the predicted class at each point
classes = matrix(as.numeric(mod.opt), n.grid1, n.grid1)
image(x1.grid1, x2.grid1, classes, col = fill_colors, 
      main = "plot of training data with decision boundary",
      xlab = colnames(trainxx)[1], ylab = colnames(trainxx)[2])
# Draw contours separately for each class; each matrix holds only 0s and 1s,
#   so the 0.5 level traces the edge of that class's region
for (class_num in 1:n_classes) {
    contour(x1.grid1, x2.grid1, class_regions[[class_num]], 
            col = class_colors[class_num],
            levels = 0.5, add = TRUE, lwd = 2, drawlabels = FALSE)
}
# Using pch = 21 for bordered points that stand out a bit better
points(trainxx, bg = class_colors[trainyy], 
       col = "black",
       pch = 21)

Resulting plot:
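
To see why contouring a 0/1 indicator matrix yields the class boundary, here is a small sketch on a made-up 5 x 5 grid (gx, gy, toy_classes and ind2 are illustrative names, and the labels are invented rather than produced by knn()). It uses contourLines() from grDevices, which computes contour coordinates without opening a plot, and takes the 0.5 level as the midpoint between "not this class" and "this class":

```r
# Toy 5 x 5 grid of predicted classes (made-up labels, not from knn())
gx <- 1:5
gy <- 1:5
toy_classes <- matrix(1, nrow = 5, ncol = 5)
toy_classes[4:5, ] <- 2   # class 2 occupies the region gx >= 4

# Indicator matrix for class 2: 1 where it was predicted, 0 elsewhere
ind2 <- ifelse(toy_classes == 2, 1, 0)

# contourLines() computes the coordinates of the requested contour level
boundary <- contourLines(gx, gy, ind2, levels = 0.5)

# x-coordinates of the boundary; they fall between gx = 3 and gx = 4,
# where the predicted class switches from 1 to 2
boundary[[1]]$x
```

Because the boundary is interpolated between neighbouring grid points, a finer grid (a larger n.grid1) places the line closer to where the classifier's prediction actually changes.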
