【Posted】: 2018-08-20 01:24:17
【Question】:
I have a data frame like this:
df<-structure(list(id = c("A", "A", "A", "B", "B", "C", "C", "D",
"D", "E", "E"), expertise = c("r", "python", "julia", "python",
"r", "python", "julia", "python", "julia", "r", "julia")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -11L), .Names = c("id",
"expertise"), spec = structure(list(cols = structure(list(id = structure(list(), class = c("collector_character",
"collector")), expertise = structure(list(), class = c("collector_character",
"collector"))), .Names = c("id", "expertise")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
df
   id expertise
1   A         r
2   A    python
3   A     julia
4   B    python
5   B         r
6   C    python
7   C     julia
8   D    python
9   D     julia
10  E         r
11  E     julia
I can get the overall count of each "expertise" value like this:
library(dplyr)
df %>% group_by(expertise) %>% mutate(counts_overall = n())
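However, what I want is the count of each combination of expertise values. In other words, how many "id" values share the same pair of expertise values, for example "r" and "julia"? This is the desired output: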
df_out<-structure(list(expertise1 = c("r", "r", "python"), expertise2 = c("python",
"julia", "julia"), count = c(2L, 2L, 3L)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -3L), .Names = c("expertise1",
"expertise2", "count"), spec = structure(list(cols = structure(list(
expertise1 = structure(list(), class = c("collector_character",
"collector")), expertise2 = structure(list(), class = c("collector_character",
"collector")), count = structure(list(), class = c("collector_integer",
"collector"))), .Names = c("expertise1", "expertise2", "count"
)), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
df_out
  expertise1 expertise2 count
1          r     python     2
2          r      julia     2
3     python      julia     3
【Comments】:
-
I think the off-diagonal of crossprod(table(df)>0) should do it.
-
@thelatemail Post it as an answer?
-
@RonakShah - looking for a duplicate, since I know I stole it from someone smarter than me!
-
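To get the desired output format, we can expand on @thelatemail's comment (melt() comes from reshape2; each pair appears twice in the result, once per ordering):
library(dplyr); library(reshape2)
df_out <- crossprod(table(df) > 0) %>% melt()
colnames(df_out) <- c("exp1", "exp2", "count")
df_out %>% filter(exp1 != exp2, count > 0) %>% arrange(desc(count))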
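For readers wondering why the crossprod suggestion in the comments works, here is a minimal base-R sketch. It rebuilds the question's data as a plain data frame, and the names incidence, co, idx and pair_counts are illustrative choices, not from the original thread. It keeps only the strict upper triangle so each unordered pair is listed once, which is one possible way to drop the mirrored rows, not necessarily what the commenters had in mind.

```r
# Rebuild the example data from the question as a plain data frame.
df <- data.frame(
  id = c("A", "A", "A", "B", "B", "C", "C", "D", "D", "E", "E"),
  expertise = c("r", "python", "julia", "python", "r", "python",
                "julia", "python", "julia", "r", "julia"),
  stringsAsFactors = FALSE
)

# Rows are ids, columns are expertise values;
# an entry is TRUE where that id has that expertise.
incidence <- table(df) > 0

# crossprod(x) is t(x) %*% x, so entry [i, j] counts the ids
# that have both expertise i and expertise j.
co <- crossprod(incidence)

# Positions in the strict upper triangle: each unordered pair once,
# and the diagonal (per-expertise totals) is left out.
idx <- which(upper.tri(co), arr.ind = TRUE)

pair_counts <- data.frame(
  expertise1 = rownames(co)[idx[, 1]],
  expertise2 = colnames(co)[idx[, 2]],
  count = co[idx],
  stringsAsFactors = FALSE
)

# Drop pairs that never occur together.
pair_counts <- pair_counts[pair_counts$count > 0, ]
pair_counts
```

On the example data this should give the same three pairs and counts as the desired output, although the row order and which value lands in expertise1 versus expertise2 follow the alphabetical ordering used by table().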
Tags: r dataframe combinations