【Question title】: Conditional subset of data frame by special condition
【Posted】: 2018-08-21 20:31:37
【Question】:
df1 <-
  data.frame(Sector=c("auto","auto","auto","industry","industry","industry"),
             Topic=c("1","2","3","3","5","5"),
             Frequency=c(1,2,5,2,3,2))
df1

df2 <-
  data.frame(Sector=c("auto","auto","auto"),
             Topic=c("1","2","3"),
             Frequency=c(1,2,5))
df2

I have the data frame df1 above and want a conditional subset of it that looks like df2. The condition is:

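"If at least one observation of a sector has a Frequency greater than 3, all observations of that sector should be kept; otherwise, all observations of that sector should be dropped." In the example above, only the three observations of the auto sector remain, and industry is dropped.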
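The rule can be restated as a per-sector flag: a sector qualifies when any(Frequency > 3) holds within it. Below is a minimal base R sketch of that restatement, assuming the df1 from above; it uses tapply to compute one flag per sector and then looks the flag up for each row. The names keep_sector, row_flag and result are illustrative only.

```r
# Example data from the question
df1 <- data.frame(Sector = c("auto", "auto", "auto", "industry", "industry", "industry"),
                  Topic = c("1", "2", "3", "3", "5", "5"),
                  Frequency = c(1, 2, 5, 2, 3, 2))

# One flag per sector: TRUE if any row of that sector has Frequency > 3
keep_sector <- tapply(df1$Frequency > 3, df1$Sector, any)
print(keep_sector)

# Look up each row's sector flag by name; as.character() avoids indexing by factor codes
row_flag <- as.vector(keep_sector[as.character(df1$Sector)])
result <- df1[row_flag, ]
print(result)
```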

Does anyone know which condition I can use to get the target subset?

【Question comments】:

  • df1[df1$Sector %in% df1$Sector[df1$Frequency > 3], ]
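The one-liner in the comment works in two steps: first collect the sectors that contain at least one row with Frequency > 3, then keep every row whose sector is in that set. A sketch that spells the steps out in base R, assuming the question's df1; the intermediate name big_sectors is illustrative.

```r
# Example data from the question
df1 <- data.frame(Sector = c("auto", "auto", "auto", "industry", "industry", "industry"),
                  Topic = c("1", "2", "3", "3", "5", "5"),
                  Frequency = c(1, 2, 5, 2, 3, 2))

# Step 1: sectors with at least one observation where Frequency > 3
big_sectors <- unique(df1$Sector[df1$Frequency > 3])
print(big_sectors)

# Step 2: keep every row whose Sector is in that set
df2 <- df1[df1$Sector %in% big_sectors, ]
print(df2)
```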

Tags: r dataframe conditional subset


【Solution 1】:

We can use group_by and filter from dplyr to achieve this.

library(dplyr)

df2 <- df1 %>%
  group_by(Sector) %>%
  filter(any(Frequency > 3)) %>%
  ungroup()
df2
# # A tibble: 3 x 3
#   Sector Topic Frequency
#   <fct>  <fct>     <dbl>
# 1 auto   1            1.
# 2 auto   2            2.
# 3 auto   3            5.
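For readers without dplyr, the same group-then-filter idea can be sketched in base R: split the data frame by Sector (the analogue of group_by), keep only the groups where any(Frequency > 3) is true (the analogue of filter(any(...))), and bind them back together. This is a swapped-in base R technique rather than part of the answer above, and the names groups and kept are illustrative.

```r
# Example data from the question
df1 <- data.frame(Sector = c("auto", "auto", "auto", "industry", "industry", "industry"),
                  Topic = c("1", "2", "3", "3", "5", "5"),
                  Frequency = c(1, 2, 5, 2, 3, 2))

# One data frame per sector
groups <- split(df1, df1$Sector)

# Keep only the groups where at least one Frequency exceeds 3
kept <- Filter(function(g) any(g$Frequency > 3), groups)

# Recombine, then reset the row names that rbind prefixes with the group name
df2 <- do.call(rbind, kept)
rownames(df2) <- NULL
print(df2)
```

Unlike the dplyr version, this returns rows grouped in sector order rather than in the original row order (identical for this data), and if no sector qualifies, do.call(rbind, ...) returns NULL instead of an empty data frame.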

【Comments】:

【Solution 2】:

Here is a base R solution:

df1 <-
  data.frame(Sector=c("auto","auto","auto","industry","industry","industry"),
             Topic=c("1","2","3","3","5","5"),
             Frequency=c(1,2,5,2,3,2))
subset(df1, ave(Frequency, Sector, FUN = max) > 3)
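To see why the ave call works: ave(Frequency, Sector, FUN = max) returns, for every row, the maximum Frequency of that row's sector, so comparing it with 3 gives a row-level keep/drop flag. For data without NAs, "the sector maximum exceeds 3" is the same test as "any value in the sector exceeds 3". A small sketch printing the intermediate vectors; sector_max and keep are illustrative names.

```r
# Example data from the question
df1 <- data.frame(Sector = c("auto", "auto", "auto", "industry", "industry", "industry"),
                  Topic = c("1", "2", "3", "3", "5", "5"),
                  Frequency = c(1, 2, 5, 2, 3, 2))

# Each row receives its sector's maximum Frequency
sector_max <- ave(df1$Frequency, df1$Sector, FUN = max)
print(sector_max)

# Row-level flag: TRUE where the sector's maximum exceeds 3
keep <- sector_max > 3
print(keep)

# Same result as the one-line subset() call above
print(subset(df1, keep))
```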
    

A data.table solution:

library("data.table")
setDT(df1)[, if (max(Frequency) > 3) .SD, by = Sector]

【Comments】:
