【Question title】: R data.table package - adding values in columns using the := operator
【Posted】: 2015-07-06 11:16:01
【Question description】:

Problem

I have a data.frame and want to fill one column based on the data in other columns.

Here is an example of my data.frame (shortened version):

Fertilization=c("N0","N0","N0","N0","N2","N2","N2","N2")
Sowing=c("S1","S1","S2","S2","S1","S1","S2","S2")
FoliarRank=c("F2","F3","F2","F3","F2","F3","F2","F3")
New_FoliarRank=rep(0,length(Fertilization))
DT=data.frame(Fertilization,Sowing,FoliarRank,New_FoliarRank)

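I want to assign values to New_FoliarRank based on conditions on the Fertilization, Sowing and FoliarRank columns. For example:

  • if Fertilization=="N0", Sowing=="S1" and FoliarRank=="F2", then New_FoliarRank should be "F3*"
  • if Fertilization=="N0", Sowing=="S1" and FoliarRank=="F3", then New_FoliarRank should be "F2*"

As for solutions:

  • I could make it work with a bunch of for/if statements, but that would be slow and not very "R-ish", perhaps even if I "apply" it.
  • As far as I understand, I could use the := operator from the {data.table} package, which would probably be much better. It has in fact already been discussed elsewhere on Stack Overflow ("Replace a numerical value by NA based on conditions from other columns"), but I could not get the solution from that post to work, and I don't understand why, even after reading ?":=". I am missing something, maybe something obvious, so I thought I would ask. Sorry for the duplicate.

Some solutions I tried: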

library(data.table)
DT[Fertilization=="N0" & Sowing=="S1" & FoliarRank=="F2", New_FoliarRank:="F3*"] # seems to be same script as other post
DT[ , New_FoliarRank:= {Fertilization=="N0" & Sowing=="S1" & FoliarRank=="F2"; "F3*"}] # adapted from another post; doesn't work either

It returns:

Error in `:=`(New_FoliarRank, "F3*") : 
Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").
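
The error message points at the class of DT: it was built with data.frame(), so := is not available inside [ ]. A minimal sketch of the check and the fix, using a small illustrative table (the name DF and its three rows are assumptions made for this example, not the poster's data):

```r
library(data.table)

# Small illustrative data.frame (hypothetical subset of the question's columns)
DF <- data.frame(
  Fertilization = c("N0", "N0", "N2"),
  Sowing        = c("S1", "S1", "S1"),
  FoliarRank    = c("F2", "F3", "F2"),
  stringsAsFactors = FALSE
)

is.data.table(DF)   # FALSE: := cannot be used on this object yet

setDT(DF)           # converts to data.table by reference
is.data.table(DF)   # TRUE

# Filter rows in i, assign in j; the new column is created as character
DF[Fertilization == "N0" & Sowing == "S1" & FoliarRank == "F2",
   New_FoliarRank := "F3*"]

DF$New_FoliarRank   # "F3*" NA NA
```

Note that the second attempt in the question would not do the intended thing even on a data.table: a braces block returns its last expression, so the condition is evaluated and discarded, and "F3*" would be assigned to every row.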

Suggested solution (another solution is posted below)

# Initial vectors (no need for New_FoliarRank)
Fertilization=c("N0","N0","N0","N0","N2","N2","N2","N2")
Sowing=c("S1","S1","S2","S2","S1","S1","S2","S2")
FoliarRank=c("F2","F3","F2","F3","F2","F3","F2","F3")

library(data.table)

# Actually I was missing the class of DT (data.table instead of data.frame)
DT=data.table(Fertilization,Sowing,FoliarRank)

# And I shouldn't have created New_FoliarRank (esp. with numerical values), as it is created "on the spot"
setDT(DT)[Fertilization=="N0" & Sowing=="S1" & FoliarRank=="F2", New_FoliarRank := "F3*"]
setDT(DT)[Fertilization=="N0" & Sowing=="S1" & FoliarRank=="F3", New_FoliarRank := "F2*"]
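
The snippet above lets := create the column on first use. An alternative is to create the column up front with the right type, so that later character assignments match it. The following is a sketch of that variant under the assumption that a typed missing value (NA_character_) is an acceptable placeholder; it is not taken from the original post:

```r
library(data.table)

DT <- data.table(
  Fertilization = c("N0","N0","N0","N0","N2","N2","N2","N2"),
  Sowing        = c("S1","S1","S2","S2","S1","S1","S2","S2"),
  FoliarRank    = c("F2","F3","F2","F3","F2","F3","F2","F3")
)

# Initialise as character (not numeric zeros) so the types agree later
DT[, New_FoliarRank := NA_character_]

# One line per condition, as in the suggested solution
DT[Fertilization == "N0" & Sowing == "S1" & FoliarRank == "F2", New_FoliarRank := "F3*"]
DT[Fertilization == "N0" & Sowing == "S1" & FoliarRank == "F3", New_FoliarRank := "F2*"]

class(DT$New_FoliarRank)   # "character"
```

Rows that match no condition stay NA, which makes unhandled combinations easy to spot with DT[is.na(New_FoliarRank)].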

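【Comments】:

  • @akrun: No, there are more than two conditions (actually 28 in the full data.frame, which I can't post online). By the end of the script there will be no zeros left in New_FoliarRank (they will be replaced by new values: F1*, F2*, F3*, etc.).
  • Don't create New_FoliarRank before running the code, because you make it a numeric column and then try to add character values to it. Also don't forget to convert DT to a data.table object. If you don't create the column, you can run several lines like the following (each with a different condition), setDT(DT)[Fertilization=="N0" & Sowing=="S1" & FoliarRank=="F2", New_FoliarRank := "w"], and everything will work.
  • @DavidArenburg Well, they should initialize the column. Just to the correct class.
  • @Roland They don't have to, but they can. I think it's mostly a matter of preference. For example, I never do.
  • @everyone: Thanks for your time. I changed the class of DT using as.data.table() and initialized New_FoliarRank, which used to contain zeros, with "a", i.e. New_FoliarRank=rep("a",length(Fertilization)). It worked!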
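
With many conditions (the poster mentions 28), writing one filtered assignment per condition gets repetitive. One possible alternative, not proposed in the thread, is to store the conditions in a small lookup table and apply them in a single update join. The table name rules is hypothetical, and only the two rules stated in the question are filled in:

```r
library(data.table)

DT <- data.table(
  Fertilization = c("N0","N0","N0","N0","N2","N2","N2","N2"),
  Sowing        = c("S1","S1","S2","S2","S1","S1","S2","S2"),
  FoliarRank    = c("F2","F3","F2","F3","F2","F3","F2","F3")
)

# Hypothetical lookup table: one row per condition
rules <- data.table(
  Fertilization  = c("N0", "N0"),
  Sowing         = c("S1", "S1"),
  FoliarRank     = c("F2", "F3"),
  New_FoliarRank = c("F3*", "F2*")
)

# Update join: for rows of DT matching a rule, copy the rule's value.
# The i. prefix refers to the column coming from rules.
DT[rules, on = c("Fertilization", "Sowing", "FoliarRank"),
   New_FoliarRank := i.New_FoliarRank]

DT
```

Adding a condition then means adding a row to rules rather than another line of code; combinations without a rule are left as NA.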

Tags: r dataframe data.table


【Solution 1】:

You can use factors:

library(data.table)
setDT(DT)
DT[, New_FoliarRank := interaction(Fertilization, Sowing, FoliarRank)]
#check levels
levels(DT[, New_FoliarRank])

#assign new labels
DT[, New_FoliarRank := factor(New_FoliarRank, 
                              levels = levels(New_FoliarRank),
                              labels = c("012", "212", "022", "222", "013", "213", "023", "223"))]

#   Fertilization Sowing FoliarRank New_FoliarRank
#1:            N0     S1         F2            012
#2:            N0     S1         F3            013
#3:            N0     S2         F2            022
#4:            N0     S2         F3            023
#5:            N2     S1         F2            212
#6:            N2     S1         F3            213
#7:            N2     S2         F2            222
#8:            N2     S2         F3            223
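
The labels in the answer are matched to levels by position, so they depend on the order in which interaction() generates its levels (the first factor varies fastest). The sketch below prints that order for the example data and shows a name-based mapping as a possible way to avoid relying on position; label_map and new_rank are hypothetical names introduced for this illustration:

```r
Fertilization <- c("N0","N0","N0","N0","N2","N2","N2","N2")
Sowing        <- c("S1","S1","S2","S2","S1","S1","S2","S2")
FoliarRank    <- c("F2","F3","F2","F3","F2","F3","F2","F3")

key <- interaction(Fertilization, Sowing, FoliarRank)

# Level order: Fertilization varies fastest, FoliarRank slowest
levels(key)

# Hypothetical name-based mapping: each label is tied to its combination
label_map <- c(
  N0.S1.F2 = "012", N0.S1.F3 = "013",
  N0.S2.F2 = "022", N0.S2.F3 = "023",
  N2.S1.F2 = "212", N2.S1.F3 = "213",
  N2.S2.F2 = "222", N2.S2.F3 = "223"
)
new_rank <- unname(label_map[as.character(key)])
new_rank
```

Looking labels up by name gives the same values as the answer's output while staying correct if the level order changes, for instance when a new fertilization level is added.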

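【Comments】:

  • @Roland: Thanks, that is also a good answer. I will post the solution proposed above in my original post and mark your post as the answer that solves the question.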