Creating a count column [duplicate]: answers

【Question title】: Creating a count column [duplicate]
【Posted】: 2020-02-01 19:55:32
【Question】:

I have a data frame like this in R:

  ID   REGION  FACTOR  
  01    north    1
  02    north    1
  03    north    0
  04    south    1
  05    south    1
  06    south    1
  07    south    0
  08    south    0

I want to create a column that holds, for each REGION, the number of rows where FACTOR == 1.

I know how to count these values, but I can't find a function that produces this output:

  ID   REGION  FACTOR  COUNT
  01    north     1      2
  02    north     1      2
  03    north     0      2
  04    south     1      3
  05    south     1      3
  06    south     1      3
  07    south     0      3 
  08    south     0      3

Can someone help me?

【Question comments】:

Tags: r counter

【Solution 1】:

To count all rows per `REGION`, we can use `add_count`:

library(dplyr)
df1 %>%
    add_count(REGION)
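To get the desired COUNT column directly with `add_count()`, a minimal sketch, assuming dplyr 1.0 or later: the `wt` argument makes it sum that column within each group instead of counting rows, and `name` sets the new column's name. This matches the desired output only because FACTOR is 0/1 here.

```r
library(dplyr)

# Same example data as in the question
df1 <- data.frame(
  ID = 1:8,
  REGION = c("north", "north", "north", "south", "south", "south", "south", "south"),
  FACTOR = c(1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L)
)

# wt = FACTOR sums FACTOR within each REGION; since FACTOR is 0/1,
# the sum equals the number of rows with FACTOR == 1.
out <- df1 %>%
  add_count(REGION, wt = FACTOR, name = "COUNT")

print(out)
```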

If you want the sum of `FACTOR` instead (which, since FACTOR is 0/1, is the number of rows with FACTOR == 1):

df1 %>%
   group_by(REGION) %>%
   mutate(COUNT = sum(FACTOR))
   # or, to count nonzero FACTOR values explicitly:
   # mutate(COUNT = sum(FACTOR != 0))
# A tibble: 8 x 4
# Groups:   REGION [2]
#     ID REGION FACTOR COUNT
#  <int> <chr>   <int> <int>
#1     1 north       1     2
#2     2 north       1     2
#3     3 north       0     2
#4     4 south       1     3
#5     5 south       1     3
#6     6 south       1     3
#7     7 south       0     3
#8     8 south       0     3

Or using `data.table`:

library(data.table)
setDT(df1)[, COUNT := sum(FACTOR), by = REGION]
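In data.table, `.N` is the number of rows in the current group, so the total count and the `FACTOR == 1` count can be computed side by side. A sketch, assuming data.table is installed; the column name `N_ALL` is hypothetical and used only for illustration.

```r
library(data.table)

dt <- data.table(
  ID = 1:8,
  REGION = c("north", "north", "north", "south", "south", "south", "south", "south"),
  FACTOR = c(1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L)
)

# := adds columns by reference, evaluated separately for each REGION
dt[, N_ALL := .N, by = REGION]                # all rows in the region
dt[, COUNT := sum(FACTOR == 1), by = REGION]  # only rows with FACTOR == 1

print(dt)
```

Using `FACTOR == 1` rather than `FACTOR` keeps the count correct even if FACTOR could take values other than 0 and 1.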

Data

df1 <- structure(list(ID = 1:8, REGION = c("north", "north", "north", 
"south", "south", "south", "south", "south"), FACTOR = c(1L, 
1L, 0L, 1L, 1L, 1L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-8L))

【Comments】:

【Solution 2】:

A base R solution using `ave`:

dfout <- within(df, COUNT <- ave(FACTOR,REGION, FUN = sum))

which gives:

> dfout
  ID REGION FACTOR COUNT
1  1  north      1     2
2  2  north      1     2
3  3  north      0     2
4  4  south      1     3
5  5  south      1     3
6  6  south      1     3
7  7  south      0     3
8  8  south      0     3
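Another base R option, sketched here as an alternative to `ave` rather than the answerer's method: compute one count per region with `tapply()`, then look it up for each row by its REGION name.

```r
df <- data.frame(
  ID = 1:8,
  REGION = c("north", "north", "north", "south", "south", "south", "south", "south"),
  FACTOR = c(1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L)
)

# One value per region: the number of rows with FACTOR == 1
counts <- tapply(df$FACTOR == 1, df$REGION, sum)

# Index the per-region result by each row's REGION to spread it back to rows
df$COUNT <- as.integer(counts[df$REGION])

print(df)
```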

Data

df <- structure(list(ID = 1:8, REGION = c("north", "north", "north", 
"south", "south", "south", "south", "south"), FACTOR = c(1L, 
1L, 0L, 1L, 1L, 1L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-8L))

【Comments】:

【Solution 3】:

Group by REGION, then create (mutate) a new column called count holding the number of observations in each group, `n()`:

library(tidyverse)

group_by(df, REGION) %>%
  mutate(count = n()) %>%
  ungroup()

You want to call `ungroup()` at the end so that later calculations are not done at the group level.
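Note that `n()` counts every row in the group, giving 3 for north and 5 for south rather than the 2 and 3 in the desired output. A minimal sketch that keeps the same group_by/mutate/ungroup pattern but also counts only the rows where `FACTOR == 1`; the column name `n_rows` is illustrative.

```r
library(dplyr)

df <- data.frame(
  ID = 1:8,
  REGION = c("north", "north", "north", "south", "south", "south", "south", "south"),
  FACTOR = c(1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L)
)

out <- df %>%
  group_by(REGION) %>%
  mutate(
    n_rows = n(),              # all rows in the region
    COUNT  = sum(FACTOR == 1)  # only rows with FACTOR == 1
  ) %>%
  ungroup()                    # drop grouping so later steps act on the whole table

print(out)
```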

【Comments】: