Is it possible to filter data based on its plot/predicted curve? Answer

【Question title】: Is it possible to filter data based on its plot/predicted curve?
【Posted】: 2020-12-01 18:40:34
【Question】:

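I have a question about excluding/filtering data points. I've written a logistic regression that produces a decision boundary (a midpoint), and I've wrapped it in a function that I can run over subsets of my data frame.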
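For reference, the midpoint that get_midpoint() returns is the stimulus value at which the fitted probability of coderesponse = 1 equals 0.5. Writing β0 for the intercept (glm.1$coefficients[1]) and β1 for the stimulus slope (glm.1$coefficients[2]), the logit link gives the formula used in the code. It assumes β1 ≠ 0:

```latex
\log\frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x,
\qquad
p(x^{*}) = \tfrac{1}{2}
\;\Longleftrightarrow\;
\beta_0 + \beta_1 x^{*} = 0
\;\Longleftrightarrow\;
x^{*} = -\frac{\beta_0}{\beta_1}
```

So each curve is summarised by its location (the midpoint x*) and its steepness and direction (β1). These are two natural quantities on which to decide whether a curve "qualifies".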

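I'm wondering whether, if I plot all of the predicted curves from these results, I can further filter these decision boundaries based on the plots/curves they produce. Alternatively, can I set requirements that a curve must meet to "qualify", and then keep track of the corresponding data in the data frame?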

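## packages used below: %>% and mutate() (dplyr), nest()/unnest() (tidyr), map_dbl() (purrr)
library(dplyr)
library(tidyr)
library(purrr)

## glm that generates a midpoint/decision boundary, wrapped into a function

get_midpoint = function(data){
  glm.1 = glm(coderesponse~stimulus, family = binomial(link="logit"), data=data, na.action = na.exclude)
  rtn = -glm.1$coefficients[1]/glm.1$coefficients[2]  # stimulus value where p = 0.5
  rtn
}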

## a mini dummy dataframe 

subject <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
stimulus = c(1, 5, 50, 35, 23, 2, 4, 22, 15, 6, 20, 40, 45, 10, 37, 43, 48, 7, 19, 21, 29, 49, 26, 11, 36, 30, 39, 41, 16, 37, 1, 5, 50, 35, 23, 2, 4, 22, 15, 6, 20, 40, 45, 10, 37, 43, 48, 7, 19, 21, 29, 49, 26, 11, 36, 30, 39, 41, 16, 37)
stim <- c('bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm')
block <- c('mouth', 'mouth', 'mouth', 'nose', 'nose', 'nose', 'mouth', 'mouth', 'mouth', 'nose', 'nose', 'nose', 'mouth', 'mouth', 'mouth', 'nose', 'nose', 'nose', 'mouth', 'mouth', 'mouth', 'nose', 'nose', 'nose', 'mouth', 'mouth', 'mouth', 'nose', 'nose', 'nose', 'mouth', 'mouth', 'mouth', 'nose', 'nose', 'nose', 'mouth', 'mouth', 'mouth', 'nose', 'nose', 'nose', 'mouth', 'mouth', 'mouth', 'nose', 'nose', 'nose', 'mouth', 'mouth', 'mouth', 'nose', 'nose', 'nose', 'mouth', 'mouth', 'mouth', 'nose', 'nose', 'nose')
coderesponse <- c(1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0)

df = data.frame(subject, stimulus, stim, block, coderesponse)

## run the function over each subject x stim x block subgroup
## (~80 rows each in the real data, but only 5 rows each in this dummy data frame)

df = df %>% 
  nest(data = -c(subject, stim, block)) %>%
  mutate(midpoint = map_dbl(data, get_midpoint)) %>%
  unnest(cols = data)

## basic code that plots and creates a curve based on a single glm result
## QUESTION: want to be able to run this over the same subgroups as above to create curves for every midpoint generated and then possibly filter based on the curve?
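## glm.1 only exists inside get_midpoint(), so fit a single model on the full data here
glm.1 = glm(coderesponse~stimulus, family = binomial(link="logit"), data=df, na.action = na.exclude)
plot(df$stimulus,df$coderesponse,xlab="stimulus",ylab="Probability of d responses")
curve(predict(glm.1,data.frame(stimulus=x),type="response"),add=TRUE)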

I'm quite new to this and confused by this part of R, so any help or insight is appreciated!

【Question discussion】:

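It would be easier to help you if you included a simple reproducible example with sample input and desired output that can be used to test and verify possible solutions. It's somewhat unclear what you're describing.
Edited! Hopefully that clarifies things.
When I run this, I get an object 'glm.1' not found error on the last line. I see the model embedded in the get_midpoint() function used in mutate, but you don't return the glm.1 model anywhere.
@Steven Ah, sorry: the last chunk of code (the plot and curve prediction lines) only works for a single glm.1 model. I'm trying to figure out how to modify it to output plots and curves corresponding to the values generated by get_midpoint(), but I'm still having trouble. Edit: is it possible to plot all of the glm.1 fits for the data subgroups with ggplot, or do I need some kind of function?
@LizJu I'm still not sure I fully understand what you're looking for. It seems to me you want to model coderesponse~stimulus as a glm, grouped by subject, and then plot the data and each model on the same plot. If that's the case, it's easy: ggplot() can plot the models for you. If it's something else, I'm missing a key piece of understanding.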
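Extending that idea from subject alone to the question's subject x stim x block subgroups, one possible sketch facets by stim and block and colours by subject, so that geom_smooth() fits one logistic curve per subgroup. The helper name plot_group_curves() is hypothetical; the ggplot2 functions it calls are standard ones.

```r
library(ggplot2)

# Hypothetical helper: one logistic curve per subject inside each stim x block
# panel, i.e. the same subgroups that get_midpoint() is mapped over.
plot_group_curves <- function(data) {
  ggplot(data, aes(x = stimulus, y = coderesponse,
                   colour = factor(subject))) +
    geom_point() +
    # geom_smooth() fits a separate glm for each colour group within each panel
    geom_smooth(method = "glm", formula = y ~ x,
                method.args = list(family = binomial(link = "logit")),
                se = FALSE) +
    facet_grid(stim ~ block) +
    labs(colour = "Subject", y = "Probability of d responses")
}
```

Calling plot_group_curves(df) on the question's data frame should draw six panels. With only 5 rows per subgroup in the dummy data, expect glm warnings about fitted probabilities of 0 or 1 when the plot is rendered.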

Tags: r ggplot2 dplyr glm loess

【Solution 1】:

I think you're trying to do the following:

library(ggplot2)
library(dplyr)

df %>%
  ggplot() +
  aes(x = stimulus, y = coderesponse, colour = subject %>% as.factor()) +
  geom_point() +
  geom_smooth(method = 'glm', method.args = list(family = binomial(link='logit')), se = FALSE) +
  scale_colour_discrete(name = "Subject") +
  theme(legend.position = "bottom")

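This takes your original df, plots the data coloured by subject, and then fits a logistic glm to each of the two subject groups in your data. If you need the models for prediction, you can fit each glm outside the geom_smooth() call. There is also a way to reuse what ggplot computes without refitting: ggplot2::layer_data() on the saved plot returns the fitted curve points that geom_smooth() produced.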
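Following the suggestion to fit each glm outside geom_smooth(), here is a minimal base-R sketch (an illustration, not the answerer's code). It fits and keeps one model per subject x stim x block subgroup, summarises each curve by its slope and midpoint, and flags the curves that meet a qualification rule. The helper names fit_group_curves(), qualifies() and plot_qualifying_curves(), and the rule itself (positive slope, midpoint within the observed stimulus range), are hypothetical placeholders to be replaced with your own criteria.

```r
# Hypothetical helper: fit one logistic curve per subgroup and keep the models.
fit_group_curves <- function(data, groups = c("subject", "stim", "block")) {
  # one data frame per grouping combination present in the data
  pieces <- split(data, data[groups], drop = TRUE)
  # glm warnings (e.g. fitted probabilities of 0 or 1 in tiny subgroups) are
  # silenced only to keep output short; inspect them in real use
  models <- lapply(pieces, function(d) {
    suppressWarnings(glm(coderesponse ~ stimulus,
                         family = binomial(link = "logit"), data = d))
  })
  # one row per subgroup: grouping values, list key, slope and midpoint
  curves <- do.call(rbind, lapply(names(pieces), function(key) {
    b <- coef(models[[key]])
    data.frame(pieces[[key]][1, groups, drop = FALSE],
               key = key,
               slope = unname(b[2]),
               midpoint = unname(-b[1] / b[2]),  # stimulus where p = 0.5
               stringsAsFactors = FALSE)
  }))
  rownames(curves) <- NULL
  list(summary = curves, models = models)
}

# Hypothetical qualification rule: positive slope and a midpoint that lies
# inside the observed stimulus range. NA slopes never qualify.
qualifies <- function(curves, stim_range, min_slope = 0) {
  !is.na(curves$slope) & curves$slope > min_slope &
    curves$midpoint >= stim_range[1] & curves$midpoint <= stim_range[2]
}

# Base-graphics version of the question's plot that draws only the curves
# of the subgroups listed in keep (rows of fits$summary).
plot_qualifying_curves <- function(data, fits, keep) {
  plot(data$stimulus, data$coderesponse,
       xlab = "stimulus", ylab = "Probability of d responses")
  for (key in keep$key) {
    curve(predict(fits$models[[key]], data.frame(stimulus = x),
                  type = "response"), add = TRUE)
  }
}
```

A possible usage with the question's df: fits <- fit_group_curves(df), then keep <- fits$summary[qualifies(fits$summary, range(df$stimulus)), ]. After that, merge(df, keep[c("subject", "stim", "block")]) recovers the rows of the qualifying subgroups, and plot_qualifying_curves(df, fits, keep) draws only their curves. Filtering on the summary table rather than on the plot keeps the link between each curve and its rows explicit. In the dummy data some subgroups are degenerate (for subject 1, stim bd, block mouth every response is 1), which is the kind of curve such a rule is meant to screen out.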

【Discussion】: