[Question title]: R: Removing rows from data frame based on external criteria
[Posted]: 2017-07-31 19:36:26
[Question]:

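I have two data frames, df.1 and df.2, and I would like to remove rows from df.2 based on whether certain things are true about df.1. Specifically, I want to remove every row from df.2 for which the feistiness value in df.1 corresponding to that row's date (and dog) is NA. How can this be done? (I have looked at other questions but still cannot figure it out.)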

Reproducible code for the first data frame:

# create first data frame
dates <- rep(as.Date(5001:5010, origin = "1970-01-01"), times = 4)
dogs <- c(rep("Fido", times = 10), rep("Snoopy", times = 10), rep("Speckles", times = 10), rep("Pit", times = 10))
set.seed(200)
feistiness <- c(round(runif(35, min = 0, max = 100), digits = 0), rep(NA, times = 5))
df.1 <- data.frame(dates, dogs, feistiness)
names(df.1) <- c("date", "dog", "feistiness")

Which yields:

         date     dog feistiness
1  1983-09-11    Fido         56
2  1983-09-12    Fido         18
3  1983-09-13    Fido         97
4  1983-09-14    Fido         49
5  1983-09-15    Fido         49
6  1983-09-16    Fido         59
7  1983-09-17    Fido         72
8  1983-09-18    Fido         69
9  1983-09-19    Fido         18
10 1983-09-20    Fido         95
11 1983-09-11  Snoopy         69
12 1983-09-12  Snoopy         16
13 1983-09-13  Snoopy         58
14 1983-09-14  Snoopy         65
15 1983-09-15  Snoopy         83
16 1983-09-16  Snoopy          7
17 1983-09-17  Snoopy         12
18 1983-09-18  Snoopy         89
19 1983-09-19  Snoopy         56
20 1983-09-20  Snoopy         52
21 1983-09-11 Speckles         13
22 1983-09-12 Speckles         15
23 1983-09-13 Speckles         16
24 1983-09-14 Speckles         56
25 1983-09-15 Speckles         67
26 1983-09-16 Speckles         15
27 1983-09-17 Speckles         57
28 1983-09-18 Speckles         76
29 1983-09-19 Speckles         57
30 1983-09-20 Speckles         78
31 1983-09-11     Pit         68
32 1983-09-12     Pit         22
33 1983-09-13     Pit         28
34 1983-09-14     Pit          9
35 1983-09-15     Pit         59
36 1983-09-16     Pit         NA
37 1983-09-17     Pit         NA
38 1983-09-18     Pit         NA
39 1983-09-19     Pit         NA
40 1983-09-20     Pit         NA

And the second data frame:

# create second data frame
dates.2 <- as.Date(c(5002, 5005, 5004, 5009), origin = "1970-01-01")
dogs.2 <- c("Fido", "Snoopy", "Speckles", "Pit")
df.2 <- data.frame(dates.2, dogs.2)
names(df.2) <- c("date", "dog")

Which yields:

        date      dog
1 1983-09-12     Fido
2 1983-09-15   Snoopy
3 1983-09-14 Speckles
4 1983-09-19      Pit

The final output data frame should look like this, with the last row removed because Pit's feistiness value on 1983-09-19 is NA:

        date      dog
1 1983-09-12     Fido
2 1983-09-15   Snoopy
3 1983-09-14 Speckles

[Comments]:

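  • na.omit(merge(df.2, df.1, by = c('date', 'dog'))) Note that there is a typo in your df.1 construction: the dog name "Speckles" is misspelled.
  • In my version of R (3.4.1), I need to supply origin = "1970-01-01" in the as.Date calls to make this reproducible.
  • Thanks @bouncyball and jayelm - I have now fixed both issues in the post. bouncyball's solution works well.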
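
The merge-based suggestion from the first comment can be sketched in base R as follows. This is only an illustration under a few assumptions: the names `merged` and `result` are made up here, and the final column selection is an addition, since `merge()` also carries the feistiness column along while the desired output has only date and dog.

```r
# Rebuild the two data frames from the question
dates <- rep(as.Date(5001:5010, origin = "1970-01-01"), times = 4)
dogs <- rep(c("Fido", "Snoopy", "Speckles", "Pit"), each = 10)
set.seed(200)
feistiness <- c(round(runif(35, min = 0, max = 100), digits = 0), rep(NA, times = 5))
df.1 <- data.frame(date = dates, dog = dogs, feistiness = feistiness)

df.2 <- data.frame(
  date = as.Date(c(5002, 5005, 5004, 5009), origin = "1970-01-01"),
  dog = c("Fido", "Snoopy", "Speckles", "Pit")
)

# Inner join on both keys, then drop every row that contains an NA
merged <- na.omit(merge(df.2, df.1, by = c("date", "dog")))

# Keep only the columns that df.2 had originally (hypothetical extra step)
result <- merged[, names(df.2)]
print(result)
```

Two caveats seem worth keeping in mind. `merge()` sorts its result by the key columns by default, so the rows may not come back in df.2's original order. It is also an inner join, so a row of df.2 with no matching (date, dog) pair in df.1 would be dropped as well.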

Tags: r dataframe


[Solution 1]:

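We can use anti_join from dplyr: first filter df.1 down to the rows where feistiness is NA, then drop every row of df.2 that matches one of them on both date and dog. df_final is the final output.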

library(dplyr)

df_final <- df.2 %>%
  anti_join(df.1 %>% filter(is.na(feistiness)), by = c("date", "dog"))
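
For readers without dplyr, the same anti-join idea can be approximated in base R. The sketch below is not part of the original answer; `bad` and `make_key` are hypothetical names, and it assumes a dog name never contains the "|" separator used to build the composite key.

```r
# Rebuild the two data frames from the question
dates <- rep(as.Date(5001:5010, origin = "1970-01-01"), times = 4)
dogs <- rep(c("Fido", "Snoopy", "Speckles", "Pit"), each = 10)
set.seed(200)
feistiness <- c(round(runif(35, min = 0, max = 100), digits = 0), rep(NA, times = 5))
df.1 <- data.frame(date = dates, dog = dogs, feistiness = feistiness)

df.2 <- data.frame(
  date = as.Date(c(5002, 5005, 5004, 5009), origin = "1970-01-01"),
  dog = c("Fido", "Snoopy", "Speckles", "Pit")
)

# (date, dog) pairs in df.1 whose feistiness is missing
bad <- df.1[is.na(df.1$feistiness), c("date", "dog")]

# Combine both key columns into one string per row (hypothetical helper)
make_key <- function(d) paste(format(d$date), as.character(d$dog), sep = "|")

# Keep the rows of df.2 whose key does not appear among the NA pairs
df_final <- df.2[!(make_key(df.2) %in% make_key(bad)), ]
print(df_final)
```

Like anti_join, this keeps df.2's row order and retains any row of df.2 that has no counterpart in df.1 at all.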
