【Question title】: Creating a count column [duplicate]
【Posted】: 2020-02-01 19:55:32
【Question description】:

I have a data frame in R like this:

  ID   REGION  FACTOR  
  01    north    1
  02    north    1
  03    north    0
  04    south    1
  05    south    1
  06    south    1
  07    south    0
  08    south    0

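I want to create a column that holds, for each REGION, the number of rows that match a condition (FACTOR == 1).

I know how to compute these values, but I can't find a function that produces this output: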

  ID   REGION  FACTOR  COUNT
  01    north     1      2
  02    north     1      2
  03    north     0      2
  04    south     1      3
  05    south     1      3
  06    south     1      3
  07    south     0      3 
  08    south     0      3

Can anyone help me?

【Discussion】:

    Tags: r counter


    【Solution 1】:

    We can use add_count, which adds a column n holding the number of rows in each REGION:

    library(dplyr)
    df1 %>%
        add_count(REGION)
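
add_count also accepts a wt argument, which turns the per-group row count into a per-group sum of that column, and a name argument for the new column. A minimal sketch, assuming FACTOR is strictly coded 0/1 (otherwise the weighted sum is not a count of FACTOR == 1 rows); the object names all_rows and weighted are only for illustration:

```r
library(dplyr)

# Same data as in the "Data" block of this answer
df1 <- data.frame(
  ID = 1:8,
  REGION = c("north", "north", "north", "south", "south", "south", "south", "south"),
  FACTOR = c(1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L)
)

# Plain add_count() counts every row per REGION and stores it in column n
all_rows <- df1 %>% add_count(REGION)

# With wt = FACTOR the new column is sum(FACTOR) per REGION;
# this equals the number of FACTOR == 1 rows only because FACTOR is 0/1
weighted <- df1 %>% add_count(REGION, wt = FACTOR, name = "COUNT")
```

The unweighted version gives 3 for north and 5 for south, so it does not match the output asked for in the question; the weighted one gives 2 and 3.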
    

    If what you need is the sum of FACTOR within each REGION:

    df1 %>%
       group_by(REGION) %>%
       mutate(COUNT = sum(FACTOR))
       #or use
       # mutate(COUNT = sum(FACTOR != 0))
    # A tibble: 8 x 4
    # Groups:   REGION [2]
    #     ID REGION FACTOR COUNT
    #  <int> <chr>   <int> <int>
    #1     1 north       1     2
    #2     2 north       1     2
    #3     3 north       0     2
    #4     4 south       1     3
    #5     5 south       1     3
    #6     6 south       1     3
    #7     7 south       0     3
    #8     8 south       0     3
    

    Or use data.table:

    library(data.table)
    setDT(df1)[, COUNT := sum(FACTOR), by = REGION]
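
One thing to be aware of: setDT converts df1 to a data.table by reference, and := then adds the column in place, so the original object is modified. If you would rather leave df1 untouched, something like the following sketch should work; it assumes a copy via as.data.table is acceptable, and the name dt is only for illustration:

```r
library(data.table)

# Same data as in the "Data" block of this answer
df1 <- data.frame(
  ID = 1:8,
  REGION = c("north", "north", "north", "south", "south", "south", "south", "south"),
  FACTOR = c(1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L)
)

# as.data.table() returns a copy, so df1 stays a plain data.frame
dt <- as.data.table(df1)

# Count the rows with FACTOR == 1 within each REGION and add them by reference to dt
dt[, COUNT := sum(FACTOR == 1), by = REGION]
```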
    

    Data

    df1 <- structure(list(ID = 1:8, REGION = c("north", "north", "north", 
    "south", "south", "south", "south", "south"), FACTOR = c(1L, 
    1L, 0L, 1L, 1L, 1L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
    -8L))
    

    【Discussion】:

      【Solution 2】:

      A base R solution using ave:

      dfout <- within(df, COUNT <- ave(FACTOR,REGION, FUN = sum))
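
Summing FACTOR matches the filter in the question only while FACTOR is strictly 0/1. A small sketch of the difference, assuming you want to count the rows where FACTOR == 1 explicitly; the value 2 in the last row is a hypothetical change to the sample data, made only to show where the two approaches diverge:

```r
# Sample data; the final FACTOR value (2) is hypothetical, not from the question
df <- data.frame(
  ID = 1:8,
  REGION = c("north", "north", "north", "south", "south", "south", "south", "south"),
  FACTOR = c(1L, 1L, 0L, 1L, 1L, 1L, 0L, 2L)
)

# ave() applies FUN within each REGION and returns a vector as long as its input
# Sum of FACTOR per REGION: inflated by any value other than 0 or 1
df$SUM <- ave(df$FACTOR, df$REGION, FUN = sum)

# Number of rows with FACTOR == 1 per REGION
df$COUNT <- ave(as.integer(df$FACTOR == 1), df$REGION, FUN = sum)
```

Here SUM is 5 for south while COUNT stays at 3, so the explicit comparison is the safer choice when FACTOR may take other values.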
      

      which gives

      > dfout
        ID REGION FACTOR COUNT
      1  1  north      1     2
      2  2  north      1     2
      3  3  north      0     2
      4  4  south      1     3
      5  5  south      1     3
      6  6  south      1     3
      7  7  south      0     3
      8  8  south      0     3
      

      Data

      df <- structure(list(ID = 1:8, REGION = c("north", "north", "north", 
      "south", "south", "south", "south", "south"), FACTOR = c(1L, 
      1L, 0L, 1L, 1L, 1L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
      -8L))
      

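      【Discussion】:

        【Solution 3】:

        Group by REGION with group_by, then use mutate to create a new column called count that holds the number of observations in each group, n(). Note that n() counts every row in the group regardless of FACTOR, so it gives 3 and 5 here; use sum(FACTOR == 1) instead of n() for the filtered count.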

        library(tidyverse)
        
        group_by(df, REGION) %>%
          mutate(count = n()) %>%
          ungroup()
        

        You will want to ungroup() at the end so that later computations are not carried out at the group level.
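
To see what ungroup() changes, here is a small sketch using the sample data from the question; the names grouped, per_region and overall are only for illustration:

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID = 1:8,
  REGION = c("north", "north", "north", "south", "south", "south", "south", "south"),
  FACTOR = c(1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L)
)

# mutate() keeps the grouping, so the result is still grouped by REGION
grouped <- df %>%
  group_by(REGION) %>%
  mutate(count = n())

# Still grouped: the summary is computed once per REGION (two rows)
per_region <- grouped %>% summarise(total = sum(FACTOR))

# After ungroup(): the summary is computed over the whole data frame (one row)
overall <- grouped %>% ungroup() %>% summarise(total = sum(FACTOR))
```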

        【Discussion】:
