【Question title】: How to pass a variable name into an R function
【Posted】: 2014-12-18 02:19:57
【Question description】:

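I know this has been explained several times in different contexts, but I am having a hard time digesting it.

The following code works fine as long as it is not inside a function (see below).

df$season <- as.character(df$season)
temp <- model.matrix( ~ season - 1, data=df)
df <- cbind(df,temp)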

Before:

> head(df[c(1,2)])
    datetime season
1 2011-01-01      1
2 2011-01-01      1
3 2011-01-01      1
4 2011-01-01      1
5 2011-01-01      1
6 2011-01-01      1

After:

> head(df[c(1,2,13:16)])
    datetime season season1 season2 season3 season4
1 2011-01-01      1       1       0       0       0
2 2011-01-01      1       1       0       0       0
3 2011-01-01      1       1       0       0       0
4 2011-01-01      1       1       0       0       0
5 2011-01-01      1       1       0       0       0
6 2011-01-01      1       1       0       0       0

However, when I try to wrap it in a general-purpose function:

binarize <- function(data, myvar) { 
  data$myvar <- as.character(data$myvar) 
  temp <- model.matrix( ~ myvar - 1, data=data) 
  data <- cbind(data,temp) 
} 

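it throws an error, no doubt because it cannot evaluate myvar or data (or both?):

Error in `$<-.data.frame`(`*tmp*`, "myvar", value = character(0)) : replacement has 0 rows, data has 10886

I have experimented with eval(substitute()) but still could not get it to work. My ideal end state is to start with a data frame and a variable, have the function map every value of the chosen variable to its own binary column, and append those columns to the original data frame. Again, it works perfectly when it is not inside a function.

In case it helps with reproduction, here is the dput data.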

> dput(head(df,50))
structure(list(datetime = structure(c(14975, 14975, 14975, 14975, 
14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 
14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 
14975, 14975, 14976, 14976, 14976, 14976, 14976, 14976, 14976, 
14976, 14976, 14976, 14976, 14976, 14976, 14976, 14976, 14976, 
14976, 14976, 14976, 14976, 14976, 14976, 14976, 14977, 14977, 
14977), class = "Date"), season = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), holiday = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L), workingday = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 1L, 1L, 1L), weather = c(1L, 1L, 1L, 1L, 1L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), temp = c(9.84, 
9.02, 9.02, 9.84, 9.84, 9.84, 9.02, 8.2, 9.84, 13.12, 15.58, 
14.76, 17.22, 18.86, 18.86, 18.04, 17.22, 18.04, 17.22, 17.22, 
16.4, 16.4, 16.4, 18.86, 18.86, 18.04, 17.22, 18.86, 18.86, 17.22, 
16.4, 16.4, 15.58, 14.76, 14.76, 14.76, 14.76, 14.76, 13.94, 
13.94, 13.94, 14.76, 13.12, 12.3, 10.66, 9.84, 9.02, 9.02, 8.2, 
6.56), atemp = c(14.395, 13.635, 13.635, 14.395, 14.395, 12.88, 
13.635, 12.88, 14.395, 17.425, 19.695, 16.665, 21.21, 22.725, 
22.725, 21.97, 21.21, 21.97, 21.21, 21.21, 20.455, 20.455, 20.455, 
22.725, 22.725, 21.97, 21.21, 22.725, 22.725, 21.21, 20.455, 
20.455, 19.695, 17.425, 16.665, 16.665, 17.425, 17.425, 16.665, 
16.665, 16.665, 16.665, 14.395, 13.635, 11.365, 10.605, 11.365, 
9.85, 8.335, 6.82), humidity = c(81L, 80L, 80L, 75L, 75L, 75L, 
80L, 86L, 75L, 76L, 76L, 81L, 77L, 72L, 72L, 77L, 82L, 82L, 88L, 
88L, 87L, 87L, 94L, 88L, 88L, 94L, 100L, 94L, 94L, 77L, 76L, 
71L, 76L, 81L, 71L, 66L, 66L, 76L, 81L, 71L, 57L, 46L, 42L, 39L, 
44L, 44L, 47L, 44L, 44L, 47L), windspeed = c(0, 0, 0, 0, 0, 6.0032, 
0, 0, 0, 0, 16.9979, 19.0012, 19.0012, 19.9995, 19.0012, 19.9995, 
19.9995, 19.0012, 16.9979, 16.9979, 16.9979, 12.998, 15.0013, 
19.9995, 19.9995, 16.9979, 19.0012, 12.998, 12.998, 19.9995, 
12.998, 15.0013, 15.0013, 15.0013, 16.9979, 19.9995, 8.9981, 
12.998, 11.0014, 11.0014, 12.998, 22.0028, 30.0026, 23.9994, 
22.0028, 19.9995, 11.0014, 23.9994, 27.9993, 26.0027), casual = c(3L, 
8L, 5L, 3L, 0L, 0L, 2L, 1L, 1L, 8L, 12L, 26L, 29L, 47L, 35L, 
40L, 41L, 15L, 9L, 6L, 11L, 3L, 11L, 15L, 4L, 1L, 1L, 2L, 2L, 
0L, 0L, 0L, 1L, 7L, 16L, 20L, 11L, 4L, 19L, 9L, 7L, 10L, 1L, 
5L, 11L, 0L, 0L, 0L, 0L, 0L), registered = c(13L, 32L, 27L, 10L, 
1L, 1L, 0L, 2L, 7L, 6L, 24L, 30L, 55L, 47L, 71L, 70L, 52L, 52L, 
26L, 31L, 25L, 31L, 17L, 24L, 13L, 16L, 8L, 4L, 1L, 2L, 1L, 8L, 
19L, 46L, 54L, 73L, 64L, 55L, 55L, 67L, 58L, 43L, 29L, 17L, 20L, 
9L, 8L, 5L, 2L, 1L), count = c(16L, 40L, 32L, 13L, 1L, 1L, 2L, 
3L, 8L, 14L, 36L, 56L, 84L, 94L, 106L, 110L, 93L, 67L, 35L, 37L, 
36L, 34L, 28L, 39L, 17L, 17L, 9L, 6L, 3L, 2L, 1L, 8L, 20L, 53L, 
70L, 93L, 75L, 59L, 74L, 76L, 65L, 53L, 30L, 22L, 31L, 9L, 8L, 
5L, 2L, 1L)), .Names = c("datetime", "season", "holiday", "workingday", 
"weather", "temp", "atemp", "humidity", "windspeed", "casual", 
"registered", "count"), row.names = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", 
"17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", 
"28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", 
"39", "40", "41", "42", "43", "44", "45", "46", "47", "48", "49", 
"50"), class = "data.frame")

Thanks in advance for your help!

【Discussion】:

  • You should use data[["myvar"]] rather than data$myvar inside the function
  • I tried this: binarize <- function(data, myvar) { data$[["myvar"]] <- as.character(data$[["myvar"]]) temp <- model.matrix( ~ myvar - 1, data=data) data <- cbind(data,temp) } but I get the error: Error: unexpected '}' in "}"
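
The difference the first comment points at can be seen in a small sketch. `$` takes the name after it literally, while `[[` evaluates its argument, so a column name held in a variable has to go through `[[`. Combining the two as `data$[[...]]` is not valid R syntax. The data frame `d` and the variable `col` below are made up for illustration.

```r
# Toy data frame (hypothetical, for illustration only)
d <- data.frame(season = c(1L, 2L, 3L))
col <- "season"

d$col      # NULL: `$` looks for a column literally named "col"
d[[col]]   # 1 2 3: `[[` evaluates `col` and looks up "season"

# Assignment follows the same rule
d[[col]] <- as.character(d[[col]])
class(d$season)  # "character"
```
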

Tags: r function variables


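【Solution 1】:

You really shouldn't use data$column inside a function; data[[column]] is safer. But I think you can change the function to something like this. I am not sure whether it gives you the right result, but it does finish evaluating.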

binarize <- function(data, myvar) { 
    form <- substitute( ~ x - 1, list(x = as.name(myvar))) 
    temp <- model.matrix(eval(form), data = data) 
    cbind(data, temp)
}

Even simpler might be

binarize <- function(data, myvar) {
    form <- as.formula(paste("~", myvar, "- 1"))
    temp <- model.matrix(form, data = data) 
    cbind(data, temp)
}

Either way, the function has to be called with a character string for the myvar argument, i.e.

binarize(df, "season")
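
Neither version above converts the column to character first, as the code in the question does. A minimal sketch that adds that step with `[[` is shown below. The input check and the toy data frame `toy` are assumptions added for illustration, and building the formula with `paste` assumes the column name is a syntactic R name and the column has no missing values.

```r
binarize <- function(data, myvar) {
  # myvar is a character string naming one column of data
  stopifnot(is.character(myvar), length(myvar) == 1, myvar %in% names(data))
  # Convert to character so model.matrix treats the column as categorical
  data[[myvar]] <- as.character(data[[myvar]])
  # Build the formula ~ <myvar> - 1 from the column name
  form <- as.formula(paste("~", myvar, "- 1"))
  temp <- model.matrix(form, data = data)
  # Return the original columns plus one indicator column per level
  cbind(data, temp)
}

# Toy data (hypothetical)
toy <- data.frame(id = 1:4, season = c(1L, 2L, 1L, 3L))
out <- binarize(toy, "season")
names(out)  # "id" "season" "season1" "season2" "season3"
```
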

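【Discussion】:

  • Thanks, Richard. The following lets it run to completion: binarize <- function(data, myvar) { form <- as.formula(paste("~", myvar, "- 1")) temp <- model.matrix(form, data = data) data <- cbind(data, temp) } but df remains unchanged. Note that I also added data <- cbind(data, temp)
  • That is because you are not returning the result that way. If the last line is an assignment, the function returns its value only invisibly, so nothing is printed. That is why I dropped the data <- part
  • Right, but when I leave out data <- as you suggested, df is not updated. In fact, df$season is still an integer and the other columns are not appended to df.
  • Sorry, I don't think I understand what you are doing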
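
The behaviour described in this thread is consistent with R's usual copy semantics: a function works on its own copy of the argument, so the caller's data frame changes only when the returned value is assigned back. A minimal sketch, using a hypothetical `add_flag` function and toy data rather than the original df:

```r
add_flag <- function(data) {
  data$flag <- 1L   # modifies the function's local copy only
  data              # return the modified copy
}

toy <- data.frame(x = 1:3)
add_flag(toy)            # prints the result, but toy itself is unchanged
"flag" %in% names(toy)   # FALSE

toy <- add_flag(toy)     # assign the result back to update toy
"flag" %in% names(toy)   # TRUE
```

Applied to the answer above, that would presumably be df <- binarize(df, "season").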