Plot a Region in R

【Question title】: Plot a Region in R
【Posted】: 2013-12-29 14:02:23
【Question】:

I generated a matrix of 100 random x-y coordinates in the square [-1,1]^2:

n <- 100
datam <- matrix(c(rep(1,n), 2*runif(n)-1, 2*runif(n)-1), n) 
# leading 1 column needed for computation
# second column has x coordinates, third column has y coordinates

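and classified them into two classes, -1 and 1, using a given target function f (a vector). I computed a hypothesis function g and now want to visualize it together with the target function f.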

f <- c(1.0, 0.5320523, 0.6918301)   # the given target function
ylist <- sign(datam %*% f)    # classify into -1 and 1

# perceptron algorithm to find g:
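perceptron <- function(datam, ylist) {
  w <- c(1, 0, 0)                  # starting vector
  made.mistake <- TRUE
  while (made.mistake) {           # sweep until a full pass makes no mistake
    made.mistake <- FALSE
    for (i in seq_len(nrow(datam))) {   # use the data's own row count, not the global n
      if (ylist[i] != sign(t(w) %*% datam[i, ])) {
        w <- w + ylist[i] * datam[i, ]  # move w towards the misclassified point's class
        made.mistake <- TRUE
      }
    }
  }
  w
}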

g <- perceptron(datam, ylist)

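I now want to compare f and g in a plot.

I can do this easily in Mathematica. Shown here is the data set with the target function f, which separates the data into a +1 and a -1 region:

This Mathematica plot shows the comparison of f and g (different data set and f):

Here is the corresponding Mathematica code: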

ContourPlot[g.{1, x1, x2} == 0, {x1, -1, 1}, {x2, -1, 1}]

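How can I do something similar in R (ggplot would be nice)?

【Question comments】:

Yes, this is possible. But your example is not reproducible, so I cannot answer you with code.
Sorry, I have added code for a working example.
Possible duplicate of "Highlighting regions of interest in ggplot2"
@James I'm not sure this is a duplicate. As I read it, the OP is asking for a way to obtain the discriminating boundary from the data, not how to shade an area on a plot.

Tags: r plot ggplot2

【Solution 1】:

Here is a way to do it with ggplot. This example follows your code exactly and then adds the following at the end: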

# OP's code...
# ...

glist <- sign(datam %*% g)

library(reshape2)  # for melt(...)
library(ggplot2)
df <- data.frame(datam,f=ylist,g=glist) # df has columns: X1, X2, X3, f, g
gg <- melt(df,id.vars=c("X1","X2","X3"),variable.name="model")

# One decision boundary per facet. Current ggplot2 releases do not accept
# geom_abline(subset=...), so the lines come from their own data frame.
bounds <- data.frame(model=c("f","g"),
                     intercept=c(-f[1]/f[3], -g[1]/g[3]),
                     slope=c(-f[2]/f[3], -g[2]/g[3]))

ggp <- ggplot(gg, aes(x=X2, y=X3, color=factor(value)))
ggp <- ggp + geom_point()
ggp <- ggp + geom_abline(data=bounds, aes(intercept=intercept, slope=slope))
ggp <- ggp + facet_wrap(~model)
ggp <- ggp + scale_color_discrete(name="Mistake")
ggp <- ggp + labs(title=paste0("Comparison of Target (f) and Hypothesis (g) [n=",n,"]"))
ggp <- ggp + theme(plot.title=element_text(face="bold"))
ggp

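Below are the results for n=200, 500, and 1000. With n=100, g=c(1,0,0). You can see that f and g converge at n~500.

If you are new to ggplot: first we create a data frame (df) containing the coordinates (X2 and X3) and two columns with the classifications based on f and g. Then we use melt(...) to convert it into a new data frame, gg, in "long" format. gg has the columns X1, X2, X3, model, and value. The column gg$model identifies the model (f or g). The corresponding classification is in gg$value. The ggplot calls then do the following:

Establish the default data set gg, the x and y coordinates, and the coloring [ggplot(...)]
Add the points layer [geom_point(...)]
Add the lines that separate the classes [geom_abline(...)]
Tell ggplot to draw the two models in separate "facets" [facet_wrap(...)]
Set the legend name.
Set the plot title.
Make the plot title bold.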
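
To make the wide-to-long step concrete, here is a small base-R sketch of the shape that melt(...) produces for this kind of data frame. The three-row toy data and the name `long` are made up for illustration; the sketch mimics the layout of the result, not how reshape2 computes it.

```r
# Toy "wide" data frame: one row per point, one column per model
# (all values are illustrative, not taken from the question)
df <- data.frame(X2 = c(0.1, -0.4, -0.9),
                 X3 = c(0.5,  0.2, -0.8),
                 f  = c(1, 1, -1),
                 g  = c(1, 1,  1))

# "Long" format: every point appears once per model, with the model name
# in one column and that model's classification in another
long <- data.frame(X2    = rep(df$X2, times = 2),
                   X3    = rep(df$X3, times = 2),
                   model = factor(rep(c("f", "g"), each = nrow(df))),
                   value = c(df$f, df$g))
print(long)
```

With the data in this shape, a single `facet_wrap(~model)` can split the points into one panel per model, which is why the answer reshapes before plotting.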

【Comments】:

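【Solution 2】:

Your example is still not reproducible. If you look at my code, you will see that f and g are identical. Also, you seem to be extrapolating the lines to data points you do not have (the second part of the question). Do you have any evidence that the discrimination should be linear?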

#Data generation
n <- 10000
datam <- matrix(c(rep(1,n), 2*runif(n)-1, 2*runif(n)-1), n) 
# leading 1 column needed for computation
# second column has x coordinates, third column has y coordinates
datam.df<-data.frame(datam)
datam.df$X1<-NULL
f <- c(1.0, 0.5320523, 0.6918301)   # the given target function
f.col <- ifelse(sign(datam %*% f)==1,"darkred", "darkblue")    
f.fun<-sign(datam %*% f)

# perceptron algorithm to find g:
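perceptron <- function(datam, ylist) {
  w <- c(1, 0, 0)                  # starting vector
  made.mistake <- TRUE
  while (made.mistake) {           # sweep until a full pass makes no mistake
    made.mistake <- FALSE
    for (i in seq_len(nrow(datam))) {   # use the data's own row count, not the global n
      if (ylist[i] != sign(t(w) %*% datam[i, ])) {
        w <- w + ylist[i] * datam[i, ]  # move w towards the misclassified point's class
        made.mistake <- TRUE
      }
    }
  }
  w
}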


g <- perceptron(datam, f.fun)
g.fun<-sign(datam %*% g)

Plotting the whole data set:

plot(datam.df$X2, datam.df$X3, col=f.col, pch=".", cex=2)

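I will produce separate plots for the g and f functions, because something in your example does not work and f and g are identical. Once you have fixed that, you can put everything in one plot. You can also decide whether you want shading. If you have no evidence that the classification is linear, it may be wiser to mark the data you actually have using chull().

For the f function: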

plot(datam.df$X2, datam.df$X3, col=f.col, pch=".", xlim=c(-1,-0.5), ylim=c(-1,-.5), cex=3, main="f function")
datam.df.f<-datam.df[f.fun==1,]
ch.f<-chull(datam.df.f$X2, datam.df.f$X3 )
ch.f <- rbind(x = datam.df.f[ch.f, ], datam.df.f[ch.f[1], ])
polygon(ch.f, lwd=3, col=rgb(0,0,180,alpha=50, maxColorValue=255))

For the g function:

g.col <- ifelse(sign(datam %*% g)==1,"darkred", "darkblue")    
plot(datam.df$X2, datam.df$X3, col=g.col, pch=".", xlim=c(-1,-0.5), ylim=c(-1,-.5), cex=3, main="g function")
datam.df.g<-datam.df[g.fun==1,]
ch.g<-chull(datam.df.g$X2, datam.df.g$X3 )
ch.g <- rbind(x = datam.df.g[ch.g, ], datam.df.g[ch.g[1], ])
polygon(ch.g, col=rgb(0,0,180,alpha=50, maxColorValue=255), lty=3, lwd=3)

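The ch.f and ch.g objects are the coordinates of the "bag" (convex hull) around your points. You can extract points from them to describe your lines.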

ch.f
lm.f<-lm(c(ch.f$X3[ ch.f$X2> -0.99 & ch.f$X2< -0.65 & ch.f$X3<0 ])~c(ch.f$X2[ ch.f$X2>-0.99 & ch.f$X2< -0.65 & ch.f$X3<0]))
curve(lm.f$coefficients[1]+x*lm.f$coefficients[2], from=-1., to=-0.59, lwd=5, add=T)
lm.g<-lm(c(ch.g$X3[ ch.g$X2> -0.99 & ch.g$X2< -0.65 & ch.g$X3<0 ])~c(ch.g$X2[ ch.g$X2>-0.99 & ch.g$X2< -0.65 & ch.g$X3<0]))
curve(lm.g$coefficients[1]+x*lm.g$coefficients[2], from=-1., to=-0.59, lwd=5, add=T, lty=3)

You get:

Unfortunately, because the f and g functions are identical in your example, you cannot see different lines in the plot above.
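
One way to check how close g is to f without relying on the plot is to compare the two weight vectors directly. Only the direction of a weight vector matters for the line it defines, so the sketch below compares them by cosine similarity and by the intercept and slope of the implied lines. The seed, n = 200, and the helper name `as_line` are assumptions chosen for illustration; the perceptron is the sweep from the question, written with `sum(w * x)` for the dot product.

```r
set.seed(1)                                   # assumed seed, for repeatability only
n <- 200
datam <- cbind(1, runif(n, -1, 1), runif(n, -1, 1))
f <- c(1.0, 0.5320523, 0.6918301)             # target function from the question
ylist <- as.vector(sign(datam %*% f))

# Perceptron as in the question: sweep until a full pass makes no mistake
perceptron <- function(datam, ylist) {
  w <- c(1, 0, 0)
  made.mistake <- TRUE
  while (made.mistake) {
    made.mistake <- FALSE
    for (i in seq_len(nrow(datam))) {
      if (ylist[i] != sign(sum(w * datam[i, ]))) {
        w <- w + ylist[i] * datam[i, ]
        made.mistake <- TRUE
      }
    }
  }
  w
}
g <- perceptron(datam, ylist)

# Hypothetical helper: intercept and slope of the line w[1] + w[2]*x + w[3]*y = 0
as_line <- function(w) c(intercept = -w[1] / w[3], slope = -w[2] / w[3])

agree  <- mean(as.vector(sign(datam %*% g)) == ylist)  # share of training points where g matches f
cosine <- sum(f * g) / sqrt(sum(f^2) * sum(g^2))       # 1 means exactly the same direction
print(agree)
print(cosine)
print(rbind(f = as_line(f), g = as_line(g)))
```

On the training points the agreement is complete by construction, since the loop only stops after a pass with no mistakes. The cosine and the line parameters show how far g still is from f away from those points.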

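【Comments】:

Thanks. You chose n = 10000; with that many training samples f and g should be identical, at least in the plot. The training data are linearly separable because I built them that way: first generate 100 data points, second pick a linear function that separates them, third classify them into two classes, above and below the linear function. With the perceptron algorithm I now want to recover the (now unknown) linear function from the classified training set.
The graphics look good, but can I make the lines more linear, more like abline()?
Sure. The ch.f and ch.g objects are the coordinates of the "bag". Extract the points you need, model them, and project. See my updated answer.

【Solution 3】:

You can use the col argument of plot() to indicate the classification by your f() function. You can use polygon() to shade the classification region of your g() function. If you give us a reproducible example, we can answer with specific code. It would produce a graphic similar to the one you made in Mathematica.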
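
As a rough sketch of that idea in base R: color the points by f with `col`, then shade the part of the square [-1,1]^2 that g classifies as +1 with `polygon()`. The helper `clip_square`, the seed, and the vector `g` below are hypothetical; in particular `g` is a made-up stand-in for a perceptron result, not a value from the question.

```r
# Hypothetical helper: clip the square [-1,1]^2 against the half-plane
# w[1] + w[2]*x + w[3]*y >= 0 and return the polygon vertices in order
# (NULL if no part of the square is classified as +1).
clip_square <- function(w) {
  sq  <- cbind(x = c(-1, 1, 1, -1), y = c(-1, -1, 1, 1))
  val <- as.vector(cbind(1, sq) %*% w)        # signed value at each corner
  out <- NULL
  for (i in 1:4) {
    j <- i %% 4 + 1                           # next corner, wrapping around
    if (val[i] >= 0) out <- rbind(out, sq[i, ])
    if ((val[i] >= 0) != (val[j] >= 0)) {     # this edge crosses the boundary
      s <- val[i] / (val[i] - val[j])         # position of the crossing along the edge
      out <- rbind(out, sq[i, ] + s * (sq[j, ] - sq[i, ]))
    }
  }
  out
}

set.seed(1)                                   # assumed seed, for repeatability only
n <- 100
datam <- cbind(1, runif(n, -1, 1), runif(n, -1, 1))
f <- c(1.0, 0.5320523, 0.6918301)             # target function from the question
g <- c(1.0, 0.6, 0.6)                         # hypothetical hypothesis vector
ylist <- sign(datam %*% f)

# Points colored by f's classification
plot(datam[, 2], datam[, 3], pch = 19, xlab = "x", ylab = "y",
     col = ifelse(ylist == 1, "darkred", "darkblue"))

# Shade the region that g classifies as +1, then draw f's boundary line
region <- clip_square(g)
if (!is.null(region)) polygon(region, col = rgb(0, 0, 1, alpha = 0.2), border = NA)
abline(a = -f[1] / f[3], b = -f[2] / f[3])
```

If only the boundary is needed, evaluating `g` on a grid and calling `contour()` with `levels = 0` would probably be the closer analogue of the Mathematica `ContourPlot` call.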

【Comments】: