【Question title】:Count maximum consecutive repeated non-NA values grouped by another variable in dataframe R
【Posted】:2021-07-29 10:09:28
【Question】:

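I want to determine, for each ADM2_PCODE, the maximum count of consecutive repeated non-NA Valor values. The idea is to group by ADM2_PCODE, filter out the NA values, compute the longest run of consecutive occurrences for each Valor value, and take the largest of those.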

Sample data frame:

df <- structure(
  list(
    Year = c(1981, 1982, 1983, 1984, 1985, 1986,
             1981, 1982, 1983, 1984, 1985, 1986,
             1981, 1982, 1983, 1984, 1985, 1986),
    ADM2_PCODE = c(1100015, 1100015, 1100015, 1100015, 1100015, 1100015,
                   1100016, 1100016, 1100016, 1100016, 1100016, 1100016,
                   1100017, 1100017, 1100017, 1100017, 1100017, 1100017),
    Valor = c(NA, NA, 30, 30, NA, NA,
              90, 10, 90, 10, 10, 10,
              30, 20, 30, 40, 30, 60),
    geometry = c(
      "MULTIPOLYGON (((-62.0495 -1...", "MULTIPOLYGON (((-62.0495 -1...",
      "MULTIPOLYGON (((-62.0495 -1...", "MULTIPOLYGON (((-62.0495 -1...",
      "MULTIPOLYGON (((-62.0495 -1...", "MULTIPOLYGON (((-62.0495 -1...",
      "MULTIPOLYGON (((-63.0495 -1...", "MULTIPOLYGON (((-62.0495 -1...",
      "MULTIPOLYGON (((-62.0495 -1...", "MULTIPOLYGON (((-62.0495 -1...",
      "MULTIPOLYGON (((-62.0495 -1...", "MULTIPOLYGON (((-62.0495 -1...",
      "MULTIPOLYGON (((-63.0495 -1...", "MULTIPOLYGON (((-63.0495 -1...",
      "MULTIPOLYGON (((-63.0495 -1...", "MULTIPOLYGON (((-63.0495 -1...",
      "MULTIPOLYGON (((-63.0495 -1...", "MULTIPOLYGON (((-63.0495 -1..."
    )
  ),
  row.names = c(NA, -18L),
  class = c("tbl_df", "tbl", "data.frame")
)

Input:

 df
# A tibble: 18 x 4
    Year ADM2_PCODE Valor geometry                      
   <dbl>      <dbl> <dbl> <chr>                         
 1  1981    1100015    NA MULTIPOLYGON (((-62.0495 -1...
 2  1982    1100015    NA MULTIPOLYGON (((-62.0495 -1...
 3  1983    1100015    30 MULTIPOLYGON (((-62.0495 -1...
 4  1984    1100015    30 MULTIPOLYGON (((-62.0495 -1...
 5  1985    1100015    NA MULTIPOLYGON (((-62.0495 -1...
 6  1986    1100015    NA MULTIPOLYGON (((-62.0495 -1...
 7  1981    1100016    90 MULTIPOLYGON (((-63.0495 -1...
 8  1982    1100016    10 MULTIPOLYGON (((-62.0495 -1...
 9  1983    1100016    90 MULTIPOLYGON (((-62.0495 -1...
10  1984    1100016    10 MULTIPOLYGON (((-62.0495 -1...
11  1985    1100016    10 MULTIPOLYGON (((-62.0495 -1...
12  1986    1100016    10 MULTIPOLYGON (((-62.0495 -1...
13  1981    1100017    30 MULTIPOLYGON (((-63.0495 -1...
14  1982    1100017    20 MULTIPOLYGON (((-63.0495 -1...
15  1983    1100017    30 MULTIPOLYGON (((-63.0495 -1...
16  1984    1100017    40 MULTIPOLYGON (((-63.0495 -1...
17  1985    1100017    30 MULTIPOLYGON (((-63.0495 -1...
18  1986    1100017    60 MULTIPOLYGON (((-63.0495 -1...

Expected output:

  ADM2_PCODE max_consecutive_values
       <dbl>                  <int>
1    1100015                      2
2    1100016                      3
3    1100017                      1

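【Comments】:

  • Hi Ronak. There are four 10s, but only three of them are consecutive. The idea is to pick the value with the longest consecutive run, not the value that is repeated most often overall. Does that make sense?
  • The confusion was on my side, sorry about that. The input and the code that generates it were not the same; the 90s and 10s were mixed up. I have now edited it correctly, and 1100016 should read: 90, 10, 90, 10, 10, 10. So the most repeated consecutive value is 10, three times. Is it clear now?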
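
The distinction made in these comments (longest consecutive run versus most frequent value overall) can be seen on the 1100016 values with base R's rle(), which returns the length of each run of identical adjacent values. This is only an illustrative sketch; the variable names are made up for the example.

```r
# Valor values for ADM2_PCODE 1100016, as stated in the comment above
valor <- c(90, 10, 90, 10, 10, 10)

# Total occurrences of each value, regardless of position
total_counts <- table(valor)

# Run-length encoding: one entry per run of identical adjacent values
runs <- rle(valor)

total_counts[["10"]]  # 4: the value 10 appears four times overall
max(runs$lengths)     # 3: its longest unbroken run has length 3
```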

Tags: r dplyr count rle


【Solution 1】:

Using rleid from data.table to keep track of consecutive values, you can do:

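library(dplyr)
library(data.table)  # provides rleid()

df %>%
  filter(!is.na(Valor)) %>%                   # drop NA rows first
  group_by(ADM2_PCODE) %>%
  mutate(grp = rleid(Valor)) %>%              # run id: increments whenever Valor changes
  count(grp) %>%                              # length of each run within each ADM2_PCODE
  summarise(max_consecutive_values = max(n))  # longest run per ADM2_PCODE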

#  ADM2_PCODE max_consecutive_values
#       <dbl>                  <int>
#1    1100015                      2
#2    1100016                      3
#3    1100017                      1
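
If you would rather not load data.table, the same idea can probably be expressed with base R's rle() alone. The sketch below is not part of the original answer: max_run is a hypothetical helper, df_small is a stand-in for the two relevant columns of the question's data, and returning 0 for a group that contains only NAs is an assumption.

```r
# Hypothetical helper: length of the longest run of identical adjacent
# values after dropping NAs (same NA handling as the filter() step above).
max_run <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(0L)  # assumption: all-NA group reports 0
  max(rle(x)$lengths)
}

# Stand-in for the question's data, keeping only the columns that matter
df_small <- data.frame(
  ADM2_PCODE = rep(c(1100015, 1100016, 1100017), each = 6),
  Valor = c(NA, NA, 30, 30, NA, NA,
            90, 10, 90, 10, 10, 10,
            30, 20, 30, 40, 30, 60)
)

# Apply the helper to Valor within each ADM2_PCODE
res <- tapply(df_small$Valor, df_small$ADM2_PCODE, max_run)
res
```

Like the dplyr version, this drops NAs before looking for runs, so two equal values separated only by NAs count as adjacent. If an NA should break a run instead, one option is to call rle() on the unfiltered vector (it treats every NA as its own run) and take the maximum length over the runs whose value is not NA.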
