【Question title】: Count number of rows that are not NA [duplicate]
【Posted】: 2021-07-14 21:17:20
【Question】:

I have a data frame that looks like this:

"date","id_station","id_parameter","valor","unit","year","day","month","hour","zona"
2019-01-01 00:00:00,"AJM","CO",NA,15,2019,1,1,0,"SO"
2019-01-01 00:00:00,"ATI","CO",NA,15,2019,1,1,0,"NO"
2019-01-01 00:00:00,"BJU","CO",NA,15,2019,1,1,0,"CE"
2019-01-01 00:00:00,"CAM","CO",NA,15,2019,1,1,0,"NO"
2019-01-01 00:00:00,"CCA","CO",NA,15,2019,1,1,0,"SO"
2019-01-01 00:00:00,"CHO","CO",NA,15,2019,1,1,0,"SE"
2019-01-01 00:00:00,"CUA","CO",NA,15,2019,1,1,0,"SO"
2019-01-01 00:00:00,"FAC","CO",NA,15,2019,1,1,0,"NO"
2019-01-01 00:00:00,"HGM","CO",NA,15,2019,1,1,0,"CE"
2019-01-01 00:00:00,"IZT","CO",NA,15,2019,1,1,0,"CE"
2019-01-01 00:00:00,"LLA","CO",NA,15,2019,1,1,0,"NE"
2019-01-01 00:00:00,"LPR","CO",NA,15,2019,1,1,0,"NE"
2019-01-01 00:00:00,"MER","CO",NA,15,2019,1,1,0,"CE"
2019-01-01 00:00:00,"MGH","CO",NA,15,2019,1,1,0,"SO"
2019-01-01 00:00:00,"NEZ","CO",NA,15,2019,1,1,0,"NE"
2019-01-01 00:00:00,"PED","CO",NA,15,2019,1,1,0,"SO"
2019-01-01 00:00:00,"SAG","CO",NA,15,2019,1,1,0,"NE"
2019-01-01 00:00:00,"SFE","CO",NA,15,2019,1,1,0,"SO"
2019-01-01 00:00:00,"SJA","CO",NA,15,2019,1,1,0,"NO"
2019-01-01 00:00:00,"TAH","CO",NA,15,2019,1,1,0,"SE"
2019-01-01 00:00:00,"TLA","CO",NA,15,2019,1,1,0,"NO"
2019-01-01 00:00:00,"TLI","CO",NA,15,2019,1,1,0,"NO"
2019-01-01 00:00:00,"UAX","CO",NA,15,2019,1,1,0,"SE"
2019-01-01 00:00:00,"UIZ","CO",NA,15,2019,1,1,0,"SE"
2019-01-01 00:00:00,"VIF","CO",NA,15,2019,1,1,0,"NE"
2019-01-01 00:00:00,"XAL","CO",NA,15,2019,1,1,0,"NE"
2019-01-01 01:00:00,"AJM","CO",NA,15,2019,1,1,1,"SO"
2019-01-01 01:00:00,"ATI","CO",NA,15,2019,1,1,1,"NO"
2019-01-01 01:00:00,"BJU","CO",NA,15,2019,1,1,1,"CE"
2019-01-01 01:00:00,"CAM","CO",NA,15,2019,1,1,1,"NO"
2019-01-01 01:00:00,"CCA","CO",NA,15,2019,1,1,1,"SO"
2019-01-01 01:00:00,"CHO","CO",NA,15,2019,1,1,1,"SE"
2019-01-01 01:00:00,"CUA","CO",NA,15,2019,1,1,1,"SO"
2019-01-01 01:00:00,"FAC","CO",NA,15,2019,1,1,1,"NO"
2019-01-01 01:00:00,"HGM","CO",NA,15,2019,1,1,1,"CE"
2019-01-01 01:00:00,"IZT","CO",NA,15,2019,1,1,1,"CE"
2019-01-01 01:00:00,"LLA","CO",NA,15,2019,1,1,1,"NE"
2019-01-01 01:00:00,"LPR","CO",NA,15,2019,1,1,1,"NE"
2019-01-01 01:00:00,"MER","CO",NA,15,2019,1,1,1,"CE"
2019-01-01 01:00:00,"MGH","CO",NA,15,2019,1,1,1,"SO"
2019-01-01 01:00:00,"NEZ","CO",NA,15,2019,1,1,1,"NE"
2019-01-01 01:00:00,"PED","CO",NA,15,2019,1,1,1,"SO"
2019-01-01 01:00:00,"SAG","CO",NA,15,2019,1,1,1,"NE"
2019-01-01 01:00:00,"SFE","CO",NA,15,2019,1,1,1,"SO"
2019-01-01 01:00:00,"SJA","CO",NA,15,2019,1,1,1,"NO"
2019-01-01 01:00:00,"TAH","CO",NA,15,2019,1,1,1,"SE"
2019-01-01 01:00:00,"TLA","CO",NA,15,2019,1,1,1,"NO"
2019-01-01 01:00:00,"TLI","CO",NA,15,2019,1,1,1,"NO"
2019-01-01 01:00:00,"UAX","CO",NA,15,2019,1,1,1,"SE"
2019-01-01 01:00:00,"UIZ","CO",NA,15,2019,1,1,1,"SE"
2019-01-01 01:00:00,"VIF","CO",NA,15,2019,1,1,1,"NE"
2019-01-01 01:00:00,"XAL","CO",NA,15,2019,1,1,1,"NE"

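What I want to do is group everything by id_station, id_parameter, year, day, and month. After that, I want to count, for each day, the number of rows where "valor" is not NA.

Finally, I want to determine, for each id_station, how many days have at least 18 non-NA values. If a station has fewer than 274 such days, I want to remove all values associated with that id_station.

How can I do this?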

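【Comments】:

  • If you have a new question, please ask it using the Ask Question button. Do not add new requirements to an existing question. See here for more information.

Tags: r dataframe count rows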


【Solution 1】:

Another possible option is

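# Base R: !is.na(valor) is TRUE for every non-NA reading, and sum() adds
# up those TRUEs (each counts as 1) within every group on the right-hand side.
aggregate(
    cbind(Count = !is.na(valor)) ~ id_station + id_parameter + year + day + month,
    df,
    sum
)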
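
One detail worth checking with this approach is what happens to a group whose readings are all NA, which is the situation in the sample rows of the question. The following is a minimal sketch on made-up data; the `toy` data frame and its values are assumptions for illustration only, and the grouping is reduced to two columns to keep it short. Because the left-hand side is the logical `!is.na(valor)`, which is itself never NA, such a group should stay in the result with a count of 0 instead of being dropped.

```r
# Toy data (made-up values): station "AJM" has only NA readings on day 1.
toy <- data.frame(
  id_station = c("AJM", "AJM", "ATI", "ATI", "ATI"),
  day        = c(1, 1, 1, 1, 1),
  valor      = c(NA, NA, 0.4, NA, 0.6)
)

# Same pattern as above, with fewer grouping columns.
counts <- aggregate(
  cbind(Count = !is.na(valor)) ~ id_station + day,
  toy,
  sum
)

# One row per station/day pair; the last column holds the non-NA count.
print(counts)
```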

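【Comments】:

    【Solution 2】:

    After grouping by the columns of interest, take the sum of a logical vector as the count. is.na(valor) returns a logical vector that is TRUE where the value is NA and FALSE where it is not. Negating it with ! flips those values, and sum then counts the TRUE entries, because each TRUE is coerced to 1 and stands for one non-NA element.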

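    library(dplyr)
    # One output row per station, parameter and calendar day;
    # Count is the number of non-NA valor readings in that group.
    df1 %>%
        group_by(id_station, id_parameter, year, day, month) %>%
        summarise(Count = sum(!is.na(valor)))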
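
The answers above cover the per-day count, but not the last step of the question (dropping stations with too few well-covered days). A possible sketch of that step with dplyr follows. The function name `keep_complete_stations`, its arguments `min_obs` and `min_days`, and the `toy` data are all hypothetical and not part of the original thread. It also assumes that qualifying days are counted per id_station across all parameters, as the question words it; with several values of id_parameter per station, that column would likely need to be added to the second grouping.

```r
library(dplyr)

# Hypothetical helper: keep only stations that have at least `min_days` days
# with at least `min_obs` non-NA `valor` readings.
keep_complete_stations <- function(data, min_obs = 18, min_days = 274) {
  # Step 1: non-NA readings per station, parameter and calendar day.
  daily <- data %>%
    group_by(id_station, id_parameter, year, month, day) %>%
    summarise(Count = sum(!is.na(valor)), .groups = "drop")

  # Step 2: per station, count the days that reach the threshold
  # and keep the stations with enough of them.
  good_stations <- daily %>%
    group_by(id_station) %>%
    summarise(n_days = sum(Count >= min_obs), .groups = "drop") %>%
    filter(n_days >= min_days)

  # Step 3: keep only the original rows that belong to a qualifying station.
  semi_join(data, good_stations, by = "id_station")
}

# Toy data (made-up values), with small thresholds so the effect is visible.
toy <- data.frame(
  id_station   = rep(c("AJM", "ATI"), each = 4),
  id_parameter = "CO",
  valor        = c(0.3, 0.4, 0.2, NA, NA, NA, 0.5, NA),
  year         = 2019,
  month        = 1,
  day          = rep(c(1, 1, 2, 2), times = 2)
)

# "AJM" has readings on both days; "ATI" has a reading on only one day.
result <- keep_complete_stations(toy, min_obs = 1, min_days = 2)
print(result)
```

For the real data, calling the function with its defaults (18 and 274) would apply the thresholds named in the question.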
    

    【Comments】:
