Why am I getting 'Error in weights * y : non-numeric argument to binary operator' in my logistic regression? Answers

【Question title】: Why am I getting 'Error in weights * y : non-numeric argument to binary operator' in my logistic regression?
【Posted】: 2015-10-05 16:48:27
【Question description】:

I want to perform a logistic regression on my data set. I use:

glm.fit=glm(direccion~Profit, data=datos, family=binomial)

    Minute  ecopet  TASA10  direccion   Minute  cl1     Day         Profit  
1   571     2160     5       1          571    51.85    2015-02-20  -0.03   
2   572     2160     5       1          572    51.92    2015-02-20   0.04   
3   573     2160     5       1          573    51.84    2015-02-20  -0.04   
4   574     2160     5       1          574    51.77    2015-02-20  -0.11   
5   575     2160     10      1          575    51.69    2015-02-20  -0.19   
6   576     2165     5       1          576    51.69    2015-02-20  -0.16   
7   577     2165    -5       0          577    51.64    2015-02-20  -0.28   
8   578     2165    -10      0          578    51.47    2015-02-20  -0.37   
9   579     2165    -10      0          579    51.41    2015-02-20  -0.36   
10  580     2170    -15      0          580    51.44    2015-02-20  -0.25   
11  581     2170    -30      0          581    51.48    2015-02-20  -0.21   
12  582     2160    -20      0          582    51.52    2015-02-20  -0.12   
13  583     2155    -5       0          583    51.56    2015-02-20   0.09   
14  584     2155    -5       0          584    51.51    2015-02-20   0.10   
15  585     2155    -5       0          585    51.44    2015-02-20   0.00   
16  586     2140     10      1          586    51.30    2015-02-20  -0.18   
17  587     2140     10      1          587    51.31    2015-02-20  -0.21   
18  588     2150     0       0          588    51.31    2015-02-20  -0.25

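As you can see, the variable 'direccion' is a binary variable and is the dependent variable in my logistic regression. It is 1 when the variable 'TASA10' is positive and 0 otherwise. The problem is that after running the code I get:

'Error in weights * y : non-numeric argument to binary operator'

Do you know why this happens?

Thanks!!

【Question comments】:

Could you add the result of calling str(datos) so we can see the column types? This is most likely caused by somehow ending up with character values instead of numeric ones.
Can't seem to reproduce this. What class/type do you get for direccion?
You're right! It is a character type.

Tags: r logistic-regression

【Solution 1】:

The direccion column appears to be a character column rather than a numeric one. You can verify this by running str(datos); you will see something like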

'data.frame':   18 obs. of  8 variables:
 $ Minute   : int  571 572 573 574 575 576 577 578 579 580 ...
 $ ecopet   : int  2160 2160 2160 2160 2160 2165 2165 2165 2165 2170 ...
 $ TASA10   : int  5 5 5 5 10 5 -5 -10 -10 -15 ...
 $ direccion: chr  "1" "1" "1" "1" ...
 $ Minute.1 : int  571 572 573 574 575 576 577 578 579 580 ...
 $ cl1      : num  51.9 51.9 51.8 51.8 51.7 ...
 $ Day      : Factor w/ 1 level "2015-02-20": 1 1 1 1 1 1 1 1 1 1 ...
 $ Profit   : num  -0.03 0.04 -0.04 -0.11 -0.19 -0.16 -0.28 -0.37 -0.36 -0.25 ...
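
If the full str() output is hard to scan, the column classes can also be queried directly. This is a minimal sketch, assuming a small stand-in data frame (a few values in the style of the question's data, with direccion deliberately stored as character to mimic the problem):

```r
# Stand-in for the question's data; direccion is stored as character on purpose
datos <- data.frame(
  TASA10    = c(5, 5, -5, -10, 10, 0),
  direccion = c("1", "1", "0", "0", "1", "0"),
  Profit    = c(-0.03, 0.04, -0.28, -0.37, -0.18, -0.25),
  stringsAsFactors = FALSE
)

# Class of every column at a glance
col_classes <- sapply(datos, class)
print(col_classes)

# Direct check on the response column
print(is.character(datos$direccion))
```

A "character" entry for the response column is the situation that leads to the error in the question.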

Pay particular attention to the type of the direccion column. This can be fixed by running

datos$direccion <- as.numeric(datos$direccion)

If the column were a factor instead, you would need to make sure you don't lose the encoding, by using

datos$direccion <- as.numeric(as.character(datos$direccion))
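
The reason for the extra as.character() step is that as.numeric() applied to a factor returns the internal level codes rather than the labels. A small sketch with a made-up factor (the vector f is only for illustration):

```r
# A factor whose labels are "0" and "1"
f <- factor(c("1", "1", "0", "0", "1"))

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(f)

# Going through as.character() first recovers the original values
values <- as.numeric(as.character(f))

print(codes)   # level codes: 2 2 1 1 2
print(values)  # original values: 1 1 0 0 1
```

With the codes, a 0/1 response would silently become 1/2, which the binomial family rejects.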

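Better still, go back through the code in your pipeline that generates this data frame and fix it so that the column is encoded as a number rather than a string.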
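
One way this might look, assuming direccion is derived from TASA10 as the question describes (1 when TASA10 is positive, 0 otherwise), is to build it from a logical comparison so it is numeric from the start. The data frame below is a hypothetical stand-in:

```r
# Hypothetical stand-in containing only the column direccion is derived from
datos <- data.frame(TASA10 = c(5, 10, -5, -30, 0))

# The comparison yields TRUE/FALSE; as.integer() turns that into 1/0
datos$direccion <- as.integer(datos$TASA10 > 0)

print(datos)
```

Because the result is an integer column, no later conversion from character is needed before calling glm().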

【Comments】:

【Solution 2】:

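glm() expects the response to be numeric or a factor; it does not know how to handle a character response such as direccion here.

You can write a simple factorize function that converts all character (chr) columns to factors while leaving numeric columns unchanged: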

factorize = function(column, df){
  # Return the named column as a factor if it is character, otherwise unchanged

  if (is.character(df[[column]])){
    out = as.factor(df[[column]])
  } else { # numeric (or anything else): leave as is
    out = df[[column]]
  }
  return(out)
}

# Apply to every column of datos, then restore the original column names
store.colnames = colnames(datos)
datos = lapply(store.colnames, function(column) factorize(column, datos))
datos = as.data.frame(datos)
colnames(datos) = store.colnames
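
The same idea can be written more compactly. This is a sketch on a made-up data frame df (not the question's data), relying on the fact that assigning into df[] keeps the data frame structure and column names:

```r
# Made-up example with one character column and one numeric column
df <- data.frame(
  grp = c("a", "b", "a"),
  x   = c(1.5, 2.5, 3.5),
  stringsAsFactors = FALSE
)

# Convert character columns to factors; leave every other column untouched
df[] <- lapply(df, function(col) if (is.character(col)) as.factor(col) else col)

str(df)
```

This avoids having to save and restore the column names by hand.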

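The code could be prettier, but it gets the job done; I just wanted to illustrate the point.

Alternatively, you can convert a single column to a factor: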

datos$direccion = as.factor(datos$direccion)
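
Putting this together with the model from the question, here is a sketch that retypes the direccion and Profit values from the question's table (direccion stored as character on purpose) and then fits the model. With a factor response, the binomial family treats the first level as failure and all other levels as success, so it is worth checking the level order:

```r
# direccion and Profit retyped from the question's 18 rows
datos <- data.frame(
  direccion = c("1", "1", "1", "1", "1", "1", "0", "0", "0",
                "0", "0", "0", "0", "0", "0", "1", "1", "0"),
  Profit    = c(-0.03, 0.04, -0.04, -0.11, -0.19, -0.16, -0.28, -0.37, -0.36,
                -0.25, -0.21, -0.12, 0.09, 0.10, 0.00, -0.18, -0.21, -0.25),
  stringsAsFactors = FALSE
)

# Convert the character response to a factor
datos$direccion <- as.factor(datos$direccion)

# The first level ("0") is treated as failure, "1" as success
print(levels(datos$direccion))

# Same call as in the question, now with a usable response type
fit <- glm(direccion ~ Profit, data = datos, family = binomial)
print(coef(fit))
```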

Hope that helps!

【Comments】: