【Question title】: give each id the same column value R
【Posted】: 2018-08-24 05:04:12
【Question】:

I want to give every row of each unique ID the same first.date value, taken from that ID's rows where fruit == 'apple'.

This is what I have:

   names      dates  fruit first.date
1   john 2010-07-01   kiwi       <NA>
2   john 2010-09-01  apple 2010-09-01
3   john 2010-11-01 banana       <NA>
4   john 2010-12-01 orange       <NA>
5   john 2011-01-01  apple 2010-09-01
6   mary 2010-05-01 orange       <NA>
7   mary 2010-07-01  apple 2010-07-01
8   mary 2010-07-01 orange       <NA>
9   mary 2010-09-01  apple 2010-07-01
10  mary 2010-11-01  apple 2010-07-01

This is what I want:

   names      dates  fruit first.date
1   john 2010-07-01   kiwi 2010-09-01
2   john 2010-09-01  apple 2010-09-01
3   john 2010-11-01 banana 2010-09-01
4   john 2010-12-01 orange 2010-09-01
5   john 2011-01-01  apple 2010-09-01
6   mary 2010-05-01 orange 2010-07-01
7   mary 2010-07-01  apple 2010-07-01
8   mary 2010-07-01 orange 2010-07-01
9   mary 2010-09-01  apple 2010-07-01
10  mary 2010-11-01  apple 2010-07-01

Here is my disastrous attempt:

getdates$first.date[is.na]<-getdates[getdates$first.date & getdates$fruit=='apple',]

Thanks in advance.

Reproducible data frame:

names<-as.character(c("john", "john", "john", "john", "john", "mary", "mary","mary","mary","mary"))
dates<-as.Date(c("2010-07-01",  "2010-09-01", "2010-11-01", "2010-12-01", "2011-01-01", "2010-05-01", "2010-07-01", "2010-07-01",  "2010-09-01",  "2010-11-01"))
fruit<-as.character(c("kiwi","apple","banana","orange","apple","orange","apple","orange", "apple", "apple")) 
first.date<-as.Date(c(NA, "2010-09-01",NA,NA, "2010-09-01", NA, "2010-07-01", NA, "2010-07-01","2010-07-01"))
getdates<-data.frame(names,dates,fruit, first.date)

【Comments】:

  • Please format your question properly. Nobody can make sense of it as it is!

Tags: r


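【Solution 1】:

It's not clear what you want to do when there are duplicate first.date entries for apple (for a given name); this will just take the first one: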

library(data.table)
dt = data.table(getdates)

dt[, first.date := first.date[fruit == 'apple'][1], by = names]
dt
#    names      dates  fruit first.date
# 1:  john 2010-07-01   kiwi 2010-09-01
# 2:  john 2010-09-01  apple 2010-09-01
# 3:  john 2010-11-01 banana 2010-09-01
# 4:  john 2010-12-01 orange 2010-09-01
# 5:  john 2011-01-01  apple 2010-09-01
# 6:  mary 2010-05-01 orange 2010-07-01
# 7:  mary 2010-07-01  apple 2010-07-01
# 8:  mary 2010-07-01 orange 2010-07-01
# 9:  mary 2010-09-01  apple 2010-07-01
#10:  mary 2010-11-01  apple 2010-07-01
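For comparison, here is a minimal base R sketch of the same idea that does not depend on row order: it takes the earliest apple date per name from the dates column instead of the first non-missing first.date. The helper names apples and first_apple are made up for this illustration, and the sketch assumes that "first" means the earliest date. A name with no apple rows ends up with NA.

```r
# Same data as the question's reproducible example (first.date is rebuilt below)
getdates <- data.frame(
  names = rep(c("john", "mary"), each = 5),
  dates = as.Date(c("2010-07-01", "2010-09-01", "2010-11-01", "2010-12-01",
                    "2011-01-01", "2010-05-01", "2010-07-01", "2010-07-01",
                    "2010-09-01", "2010-11-01")),
  fruit = c("kiwi", "apple", "banana", "orange", "apple",
            "orange", "apple", "orange", "apple", "apple"),
  stringsAsFactors = FALSE
)

# Keep only the apple rows, sorted so the earliest date comes first
apples <- getdates[getdates$fruit == "apple", ]
apples <- apples[order(apples$dates), ]

# One row per name: the earliest apple date (hypothetical helper table)
first_apple <- apples[!duplicated(apples$names), c("names", "dates")]

# Look up each row's name; names without any apple rows get NA
getdates$first.date <- first_apple$dates[match(getdates$names, first_apple$names)]
getdates
```

On the example data this should give the same first.date column as the data.table call above, since the first apple row for each name is also the earliest one.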

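【Comments】:

  • Hi eddi - the first.date value is the first time a person got an apple; the duplicates make sure the first date is not overwritten by the date they next got an apple. Your code seems to work very well, because it does what I want: fill a column with the first apple date for each id. Thanks!
  • If there are many groups and each group has many entries, this might perform poorly because of the vector scan in each group. Perhaps this is better: DT <- data.table(getdates); setkey(DT, names, fruit); dd <- DT[J(unique(names), "apple"), mult="first"]$dates; DT[, first.date := dd[.GRP], by=names]. That is, if the OP doesn't mind the rows being reordered.
  • Hi Arun, I will have about 6,000 individuals, each with at least 18 rows... so your approach may have benefits. Will try it and report back. Thanks
  • @user2363642, I don't think this (the one I wrote) will be faster for the data dimensions you mention... in any case it's best to benchmark :).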
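The keyed one-liner from the comment above is easier to follow when unrolled. The following is only a sketch of that suggestion on the question's example data. One deliberate deviation: it writes unique(DT$names) where the comment has unique(names), so that the block does not depend on how names is resolved. Note that setkey reorders the rows of DT.

```r
library(data.table)

# Same data as the question's reproducible example
getdates <- data.frame(
  names = rep(c("john", "mary"), each = 5),
  dates = as.Date(c("2010-07-01", "2010-09-01", "2010-11-01", "2010-12-01",
                    "2011-01-01", "2010-05-01", "2010-07-01", "2010-07-01",
                    "2010-09-01", "2010-11-01")),
  fruit = c("kiwi", "apple", "banana", "orange", "apple",
            "orange", "apple", "orange", "apple", "apple"),
  first.date = as.Date(c(NA, "2010-09-01", NA, NA, "2010-09-01",
                         NA, "2010-07-01", NA, "2010-07-01", "2010-07-01")),
  stringsAsFactors = FALSE
)

DT <- data.table(getdates)

# Sort by names, then fruit; this reorders the rows of DT
setkey(DT, names, fruit)

# One keyed lookup per name: the first "apple" row in key order
dd <- DT[J(unique(DT$names), "apple"), mult = "first"]$dates

# Groups come out in key order, so .GRP indexes dd in the same order
DT[, first.date := dd[.GRP], by = names]
DT
```

Whether this beats the grouped subset in the answer depends on the data, so it is worth timing both on a realistic sample before choosing.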