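[Posted]: 2021-08-04 13:54:17
[Question]:
Suppose the table is as follows:
| v1 | v2 | v3 |
|---|---|---|
| A | B | A |
| B | B | A |
|   | A | C |
| D | C | D |
I would like R to create a table with the number of occurrences of each unique value in every column:
|   | v1 | v2 | v3 |
|---|---|---|---|
| A | 1 | 1 | 2 |
| B | 1 | 2 | 0 |
| C | 0 | 1 | 1 |
| D | 1 | 0 | 1 |
[Question comments]:
Tags: r dataframe unique tabulate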
Try table() like this:
    > table(unlist(df), names(df)[col(df)])
        V1 v2 v3
      A  1  1  2
      B  1  2  0
      C  0  1  1
      D  1  0  1
    > dput(df)
    structure(list(V1 = c("A", "B", NA, "D"), v2 = c("B", "B", "A",
    "C"), v3 = c("A", "A", "C", "D")), class = "data.frame", row.names = c(NA,
    -4L))
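To make the one-liner easier to follow, here is a minimal sketch that splits it into steps. The intermediate names `values`, `columns`, and `counts` are introduced only for illustration and are not part of the original answer.

```r
# Example data, copied from the dput() output in the answer.
df <- data.frame(
  V1 = c("A", "B", NA, "D"),
  v2 = c("B", "B", "A", "C"),
  v3 = c("A", "A", "C", "D"),
  stringsAsFactors = FALSE
)

# Step 1: flatten all cells into one vector, column by column.
values <- unlist(df, use.names = FALSE)

# Step 2: col(df) holds the column index of every cell, so indexing
# names(df) with it labels each cell with the name of its column.
columns <- names(df)[col(df)]

# Step 3: cross-tabulate value against column name.
# table() ignores NA by default, so the missing cell is not counted.
counts <- table(values, columns)
print(counts)
```

Because `unlist()` and `col()` both walk the data column by column, the two vectors line up cell for cell.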
[Discussion]:
One option could be:
    sapply(df, function(x) table(factor(x, levels = unique(unlist(df)))))
      V1 v2 v3
    A  1  1  2
    B  1  2  0
    D  1  0  1
    C  0  1  1
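The rows above come out as A, B, D, C because unique() keeps values in order of first appearance. If alphabetical rows are preferred, one possible variation (a sketch, not part of the original answer; `lv` and `counts` are illustrative names) is to sort the levels first:

```r
# Example data, copied from the dput() output shown earlier in the thread.
df <- data.frame(
  V1 = c("A", "B", NA, "D"),
  v2 = c("B", "B", "A", "C"),
  v3 = c("A", "A", "C", "D"),
  stringsAsFactors = FALSE
)

# sort() orders the distinct values and, by default, also drops NA.
lv <- sort(unique(unlist(df)))

# Tabulate each column against the same fixed set of levels, so that
# values absent from a column still show up with a count of 0.
counts <- sapply(df, function(x) table(factor(x, levels = lv)))
print(counts)
```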
[Discussion]:
To add to the collection, here is a tidyverse version.
    library(tidyverse)

    df %>%
      pivot_longer(
        everything(),
        values_to = "Value",
        names_to = "Variable"
      ) %>%
      group_by(Variable, Value) %>%
      summarise(N = n(), .groups = "drop") %>%
      filter(!is.na(Value)) %>%
      pivot_wider(values_from = N, names_from = Variable, values_fill = 0) %>%
      arrange(Value)
    # A tibble: 4 x 4
      Value    v1    v2    v3
      <chr> <int> <int> <int>
    1 A         1     1     2
    2 B         1     2     0
    3 C         0     1     1
    4 D         1     0     1
[Discussion]:
For the sake of completeness, here is an approach that combines melt() and dcast():
    library(data.table)
    dcast(melt(setDT(df1), measure.vars = patterns("^v"))[value != ""], value ~ variable)
       value v1 v2 v3
    1:     A  1  1  2
    2:     B  1  2  0
    3:     C  0  1  1
    4:     D  1  0  1
This approach is similar to Limey's answer in that it reshapes the data from wide to long and back to wide, but it is less verbose.
After reshaping from wide to long, table() can be called instead of dcast():
    melt(setDT(df1), measure.vars = patterns("^v"))[value != ""][
      , table(value, variable)]
         variable
    value v1 v2 v3
        A  1  1  2
        B  1  2  0
        C  0  1  1
        D  1  0  1
Note that data.table chaining is used here.
And, to save a few keystrokes:
    melt(setDT(df1), measure.vars = names(df1))[value != ""][, table(rev(.SD))]
Data:
    df1 <- fread("
    |v1|v2|v3|
    |A |B | A|
    |B |B | A|
    |  |A | C|
    |D |C | D|",
                 drop = c(1,5), header = TRUE)
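The answers in this thread use two slightly different inputs: df stores the missing cell as NA, while the code above filters on value != "", which suggests df1 holds an empty string there. The sketch below assumes that empty-string representation and uses only base R; it converts blanks to NA so the table()-based approach from earlier in the thread applies unchanged. The data frame is rebuilt by hand here rather than read with fread().

```r
# Hand-built stand-in for df1, assuming the blank cell is an empty string.
df1 <- data.frame(
  v1 = c("A", "B", "", "D"),
  v2 = c("B", "B", "A", "C"),
  v3 = c("A", "A", "C", "D"),
  stringsAsFactors = FALSE
)

# Replace empty strings with NA; df1 == "" is a logical matrix
# marking the blank cells.
df1[df1 == ""] <- NA

# Count each value per column; table() skips the NA cell.
counts <- table(unlist(df1), names(df1)[col(df1)])
print(counts)
```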
[Discussion]:
We can use mtabulate() from qdapTools:
    library(qdapTools)
    t(mtabulate(df))
      V1 v2 v3
    A  1  1  2
    B  1  2  0
    C  0  1  1
    D  1  0  1
Data:
    df <- structure(list(V1 = c("A", "B", NA, "D"), v2 = c("B", "B", "A",
    "C"), v3 = c("A", "A", "C", "D")), class = "data.frame", row.names = c(NA,
    -4L))
[Discussion]: