Count the number of times a number (factor) occurs within each group, for each column in the data frame

【Question title】: Count the number of times a number (factor) occurs within each group, for each column in the dataframe
【Posted】: 2021-05-17 02:47:51
【Question description】:

As the title suggests, I am trying to extend the question asked here:

count the number of times a number (factor) occurs within each group

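but for all columns in a given data frame. Here is a reproducible example:

# No seed is set, so the sampled values differ from run to run
dat <- data.frame(Bin = rep(1:4, each = 50),
                  Number = sample(5, 200, replace = TRUE, prob = c(1,1,1,2,3)),
                  Number2 = sample(5, 200, replace = TRUE, prob = c(1,1,1,2,3)))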


> head(dat)
  Bin Number Number2
1   1      4       2
2   1      5       5
3   1      4       4
4   1      4       1
5   1      5       5
6   1      5       3

I can do this with multiple dcast calls, one per column:

dcast(dat, Bin ~ Number)
dcast(dat, Bin ~ Number2)

However, my actual data frame has many more columns. Any help would be much appreciated!

Thanks.

【Question discussion】:

Tags: r

【Solution 1】:

Get the data into long format and use count:

library(dplyr)
library(tidyr)

dat %>%
  pivot_longer(cols = starts_with('Number')) %>%
  count(Bin, name, value) %>%
  pivot_wider(names_from = name, values_from = n)
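
If the columns to count do not share a common prefix, the same long-then-count idea can select everything except the grouping column instead. The following is only a minimal sketch on a small made-up data frame (`toy`, with hypothetical columns `a` and `b`), not part of the original answer. The `values_fill = 0` argument is an addition here so that combinations that never occur show 0 rather than NA.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; the column names are illustrative only
toy <- data.frame(Bin = c(1, 1, 1, 2, 2),
                  a   = c(1, 1, 2, 2, 2),
                  b   = c(2, 2, 2, 1, 2))

counts <- toy %>%
  pivot_longer(cols = -Bin) %>%     # every column except Bin goes long
  count(Bin, name, value) %>%       # occurrences per Bin / column / value
  pivot_wider(names_from = name, values_from = n, values_fill = 0)

counts
```

Each row of `counts` is one Bin/value pair, with one count column per original column.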

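【Discussion】:

Thanks a lot, Ronak. I accepted @Patricio Moracho's answer rather than yours because, although I did give the other columns the same prefix in the example, that is not the case in my actual data frame (probably my mistake). Their answer works on any data frame, not just the example I provided.
I completely agree. The only reason I accepted the other answer is to make things clear for others. Although giving the two columns the same prefix was probably an oversight on my part, I feel that having the most "general" answer as the accepted one is still the most helpful for others. Thanks again for taking the time to answer.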

【Solution 2】:

Conceptually this is the same as Ronak Shah's solution, but a bit simpler.

library(tidyverse)

dat %>% 
  pivot_longer(-Bin) %>% 
  pivot_wider(names_from = value, values_fn = length, names_sort=TRUE)

# A tibble: 8 x 7
    Bin name      `1`   `2`   `3`   `4`   `5`
  <int> <chr>   <int> <int> <int> <int> <int>
1     1 Number     10     7     3    10    20
2     1 Number2    10     6     6     8    20
3     2 Number      2     7     6     8    27
4     2 Number2     2     5     8    13    22
5     3 Number      3     8    13    12    14
6     3 Number2     9     5     6     7    23
7     4 Number      9     6     7     3    25
8     4 Number2     2     7     8    19    14

【Discussion】:

Thanks a lot, Patricio. Works great, and it works on any data frame regardless of the other column names.

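【Solution 3】:

One way is to use the base-R function tabulate:

library(dplyr)

dat %>% 
  group_by(Bin) %>%                              # group by bin
  # tabulate every other column; nbins = 5 keeps five rows per group
  # (dplyr >= 1.1.0 deprecates multi-row summarise() in favour of reframe())
  summarise(across(everything(), ~ tabulate(.x, nbins = 5))) %>%
  mutate(no = row_number())                      # add numbers being tabulated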

# A tibble: 20 x 4
# Groups:   Bin [4]
     Bin Number Number2    no
   <int>  <int>   <int> <int>
 1     1      7       3     1
 2     1      9       6     2
 3     1      5       8     3
 4     1      9      16     4
 5     1     20      17     5
 6     2      4       5     1
 7     2      5       4     2
 8     2      8      10     3
 9     2     11      13     4
10     2     22      18     5
11     3      6       6     1
12     3      6       9     2
13     3      7       5     3
14     3     11      13     4
15     3     20      17     5
16     4      3       7     1
17     4      3       6     2
18     4      7       4     3
19     4     19      12     4
20     4     18      21     5

So, for example, in Bin 1 the number 4 (see `no`, the last column) occurs 9 times in Number and 16 times in Number2.
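
One caveat with tabulate(): by default it returns only max(x) bins, so a group in which the largest value never occurs yields a shorter vector, and the `no` labels would no longer line up across groups. Fixing `nbins` avoids that. A quick base-R illustration on a made-up input vector:

```r
x <- c(1L, 2L, 2L)               # made-up values; 3, 4 and 5 never occur

short <- tabulate(x)             # 1 2        (only max(x) = 2 bins)
full  <- tabulate(x, nbins = 5)  # 1 2 0 0 0  (one slot per value 1..5)

short
full
```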

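【Discussion】:

【Solution 4】:

With base R, you can use the apply family of functions together with table. Note that this counts each value over the whole column and does not split the counts by Bin: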

as.data.frame(apply(dat[, 2:ncol(dat)], 2, table))
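
To get the counts split by group in base R, one option is to cross-tabulate each column against Bin. This is only a sketch on a small made-up data frame (`toy`, with hypothetical columns `a` and `b`), not part of the original answer:

```r
# Hypothetical toy data; the column names are illustrative only
toy <- data.frame(Bin = c(1, 1, 1, 2, 2),
                  a   = c(1, 1, 2, 2, 2),
                  b   = c(2, 2, 2, 1, 2))

# One Bin-by-value contingency table per non-Bin column
per_bin <- lapply(toy[-1], function(col) table(Bin = toy$Bin, value = col))

per_bin$a   # rows are Bins, columns are the values found in column a
```

The result is a named list with one table per column, which keeps working when columns contain different sets of values.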

【Discussion】:

【Solution 5】:

We can use recast from reshape2:

library(reshape2)
recast(dat, id.var = 'Bin', Bin + variable ~ value, length)
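
recast() is a convenience wrapper that melts and then casts in a single step. Written out explicitly, the two steps look roughly like the sketch below, shown on a small made-up data frame (`toy`, with hypothetical columns `a` and `b`):

```r
library(reshape2)

# Hypothetical toy data; the column names are illustrative only
toy <- data.frame(Bin = c(1, 1, 1, 2, 2),
                  a   = c(1, 1, 2, 2, 2),
                  b   = c(2, 2, 2, 1, 2))

long <- melt(toy, id.vars = "Bin")   # columns: Bin, variable, value
wide <- dcast(long, Bin + variable ~ value,
              fun.aggregate = length, value.var = "value")

wide
```

Naming `value.var` explicitly avoids the message dcast() prints when it has to guess the value column.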

【Discussion】: