[Posted]: 2017-07-31 19:36:26
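[Question]:
I have two data frames, df.1 and df.2, and I want to remove rows from df.2 depending on whether something is true about df.1. Specifically, I want to remove every row of df.2 whose matching feistiness value in df.1 (the df.1 row with the same date and dog) is NA. How can I do this? (I've looked at other questions but still can't figure it out.)
Reproducible code for the first data frame: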
# create first data frame
dates <- rep(as.Date(5001:5010, origin = "1970-01-01"), times = 4)
dogs <- c(rep("Fido", times = 10), rep("Snoopy", times = 10), rep("Speckles", times = 10), rep("Pit", times = 10))
set.seed(200)
feistiness <- c(round(runif(35, min = 0, max = 100), digits = 0), rep(NA, times = 5))
df.1 <- data.frame(dates, dogs, feistiness)
names(df.1) <- c("date", "dog", "feistiness")
Which yields:
date dog feistiness
1 1983-09-11 Fido 56
2 1983-09-12 Fido 18
3 1983-09-13 Fido 97
4 1983-09-14 Fido 49
5 1983-09-15 Fido 49
6 1983-09-16 Fido 59
7 1983-09-17 Fido 72
8 1983-09-18 Fido 69
9 1983-09-19 Fido 18
10 1983-09-20 Fido 95
11 1983-09-11 Snoopy 69
12 1983-09-12 Snoopy 16
13 1983-09-13 Snoopy 58
14 1983-09-14 Snoopy 65
15 1983-09-15 Snoopy 83
16 1983-09-16 Snoopy 7
17 1983-09-17 Snoopy 12
18 1983-09-18 Snoopy 89
19 1983-09-19 Snoopy 56
20 1983-09-20 Snoopy 52
21 1983-09-11 Speckles 13
22 1983-09-12 Speckles 15
23 1983-09-13 Speckles 16
24 1983-09-14 Speckles 56
25 1983-09-15 Speckles 67
26 1983-09-16 Speckles 15
27 1983-09-17 Speckles 57
28 1983-09-18 Speckles 76
29 1983-09-19 Speckles 57
30 1983-09-20 Speckles 78
31 1983-09-11 Pit 68
32 1983-09-12 Pit 22
33 1983-09-13 Pit 28
34 1983-09-14 Pit 9
35 1983-09-15 Pit 59
36 1983-09-16 Pit NA
37 1983-09-17 Pit NA
38 1983-09-18 Pit NA
39 1983-09-19 Pit NA
40 1983-09-20 Pit NA
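As a quick sanity check on how the dates above are built (a small sketch, not part of the original post): `as.Date()` on numbers counts days from `origin`, so day 5001 should be the first date shown and day 5010 the last. Older R versions, such as the 3.4.1 mentioned in the comments, require `origin` to be given explicitly for numeric input.

```r
# Dates in df.1 are day offsets from an explicit origin.
# Older R versions (e.g. 3.4.1) raise an error if origin is omitted here.
first.day <- as.Date(5001, origin = "1970-01-01")
last.day  <- as.Date(5010, origin = "1970-01-01")
print(c(first.day, last.day))  # [1] "1983-09-11" "1983-09-20"
```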
And the second data frame:
# create second data frame
dates.2 <- as.Date(c(5002, 5005, 5004, 5009), origin = "1970-01-01")
dogs.2 <- c("Fido", "Snoopy", "Speckles", "Pit")
df.2 <- data.frame(dates.2, dogs.2)
names(df.2) <- c("date", "dog")
Which yields:
date dog
1 1983-09-12 Fido
2 1983-09-15 Snoopy
3 1983-09-14 Speckles
4 1983-09-19 Pit
The final output data frame should look like this, with the last row removed because Pit's feistiness value on 1983-09-19 is NA:
date dog
1 1983-09-12 Fido
2 1983-09-15 Snoopy
3 1983-09-14 Speckles
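One way to get this result in base R is a lookup with `match()` on a combined date-and-dog key. This is a sketch under the question's setup, not the accepted answer; the key variables and the `"|"` separator are hypothetical choices made for illustration.

```r
# Rebuild the example data (equivalent to the question's construction)
dates <- rep(as.Date(5001:5010, origin = "1970-01-01"), times = 4)
dogs <- rep(c("Fido", "Snoopy", "Speckles", "Pit"), each = 10)
set.seed(200)
feistiness <- c(round(runif(35, min = 0, max = 100), digits = 0), rep(NA, times = 5))
df.1 <- data.frame(date = dates, dog = dogs, feistiness = feistiness)

df.2 <- data.frame(
  date = as.Date(c(5002, 5005, 5004, 5009), origin = "1970-01-01"),
  dog = c("Fido", "Snoopy", "Speckles", "Pit")
)

# Hypothetical helper keys: combine date and dog so BOTH must match
key.1 <- paste(df.1$date, df.1$dog, sep = "|")
key.2 <- paste(df.2$date, df.2$dog, sep = "|")

# feistiness of the matching df.1 row for each df.2 row
# (also NA when df.2 has no matching df.1 row at all)
looked.up <- df.1$feistiness[match(key.2, key.1)]

# Keep only df.2 rows whose matched value is not NA
result <- df.2[!is.na(looked.up), , drop = FALSE]
print(result)
```

Because it subsets df.2 directly, this keeps df.2's original row order and adds no extra columns. The `"|"` separator assumes it never appears in a dog's name.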
[Comments]:
-
na.omit(merge(df.2, df.1, by = c('date', 'dog'))) Also note that there's a typo in your df.1 construction: "Speckles" is misspelled. -
In my version of R (3.4.1), I need to supply origin = "1970-01-01" in the as.Date calls to make this reproducible. -
Thanks @bouncyball and jayelm, I've now fixed both issues in the post. bouncyball's solution works great.
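The one-liner from the comments can be unpacked step by step as follows. This sketch adds one thing the comment doesn't do: it selects only the date and dog columns at the end, because the merged result also carries df.1's feistiness column. Note that `merge()` sorts by the key columns by default, so the rows may come back in a different order than in df.2.

```r
# Rebuild the example data (equivalent to the question's construction)
dates <- rep(as.Date(5001:5010, origin = "1970-01-01"), times = 4)
dogs <- rep(c("Fido", "Snoopy", "Speckles", "Pit"), each = 10)
set.seed(200)
feistiness <- c(round(runif(35, min = 0, max = 100), digits = 0), rep(NA, times = 5))
df.1 <- data.frame(date = dates, dog = dogs, feistiness = feistiness)

df.2 <- data.frame(
  date = as.Date(c(5002, 5005, 5004, 5009), origin = "1970-01-01"),
  dog = c("Fido", "Snoopy", "Speckles", "Pit")
)

# Inner join: each df.2 row picks up the feistiness of the df.1 row
# with the same date AND dog
merged <- merge(df.2, df.1, by = c("date", "dog"))

# na.omit() drops rows with an NA in any column; here only feistiness can be NA
kept <- na.omit(merged)

# Return to df.2's columns (feistiness came from df.1)
result <- kept[, c("date", "dog")]
print(result)
```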