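【Posted】: 2023-10-29 16:08:01
【Question】:
My goal is to create an igraph graph object that I can later plot with ggraph.
My tidy data consists of invoices, each containing a varying number of items. n is the number of times exactly that invoice occurred in the original sample. For example, invoice type 1 below, which contains bread, butter, and eggs, was issued 10 times.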
library(tidyverse)
# One row per (invoice type, item); n = how often that invoice type was issued
data <- tibble(invoicetype = c(1,1,1,2,2,3,3,4,4,4,4,4,5,5,6,7,7,8,8,8,9,9),
               item = c("bread", "butter", "eggs", "bread", "coke", "coke", "eggs",
                        "bread", "butter", "coke", "pasta", "water", "coke", "water",
                        "coke", "bread", "butter", "eggs", "coke", "water", "pasta",
                        "bread"),
               n = c(10,10,10,8,8,7,7,4,4,4,4,4,3,3,3,2,2,1,1,1,1,1))
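I want to create an igraph object that accounts for how many times each item appears on the same invoice together with any other item.
Question: is there a simple way to do this?
My cumbersome solution:
Below is the solution I came up with. It is not elegant, and it does not scale to my real (large) data.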
# Spread each invoice type into one row with up to five item columns
data_spreaded <- data %>%
  group_by(invoicetype, n) %>%
  summarise(item1 = item[1], item2 = item[2], item3 = item[3],
            item4 = item[4], item5 = item[5], .groups = "drop")
# Collect every ordered pair of item columns for every invoice type
combinations <- tibble()
for (g in seq_len(nrow(data_spreaded))) {
  for (i in 3:ncol(data_spreaded)) {
    for (j in 3:ncol(data_spreaded)) {
      if (i == j) { next }
      combinations <-
        bind_rows(combinations,
                  tibble(from = data_spreaded[g, i] %>% pull(),
                         to = data_spreaded[g, j] %>% pull(),
                         invoicetype = data_spreaded[g, 1] %>% pull(),
                         n = data_spreaded[g, 2] %>% pull()))
    }
  }
}
combinations <- combinations %>%
  distinct() %>% # remove the double counted
  filter(!is.na(from), !is.na(to)) %>% # remove empty combinations
  group_by(from, to) %>%
  summarise(n = sum(n)) %>%
  ungroup()
library(igraph)
g <- graph_from_data_frame(combinations, directed = FALSE)
To plot it with ggraph, I use:
E(g)$weight <- combinations$n
library(ggraph)
set.seed(123)
ggraph(g, layout = "with_kk") +
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE) +
  geom_edge_link(aes(color = weight, label = n))
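For comparison, here is a minimal sketch of how the same co-occurrence counts might be computed without the nested loops, by joining the data to itself on the invoice. It is only one possible approach, not a confirmed answer: the names pairs, edges, item_from, and item_to are made up for illustration, and it keeps each unordered item pair once (bread/butter but not butter/bread), which differs from the loop version above that records both directions.

```r
library(dplyr)
library(igraph)

data <- tibble(invoicetype = c(1,1,1,2,2,3,3,4,4,4,4,4,5,5,6,7,7,8,8,8,9,9),
               item = c("bread", "butter", "eggs", "bread", "coke", "coke", "eggs",
                        "bread", "butter", "coke", "pasta", "water", "coke", "water",
                        "coke", "bread", "butter", "eggs", "coke", "water", "pasta",
                        "bread"),
               n = c(10,10,10,8,8,7,7,4,4,4,4,4,3,3,3,2,2,1,1,1,1,1))

# Self-join: pair every item with every item on the same invoice type.
pairs <- merge(data, data, by = c("invoicetype", "n"),
               suffixes = c("_from", "_to"))

edges <- pairs %>%
  # Keep each unordered pair once; this also drops item-with-itself rows,
  # so single-item invoices contribute no edges.
  filter(item_from < item_to) %>%
  group_by(from = item_from, to = item_to) %>%
  # Sum the invoice counts over all invoice types that contain the pair.
  summarise(n = sum(n), .groups = "drop")

g <- graph_from_data_frame(edges, directed = FALSE)
E(g)$weight <- edges$n
```

Because the join only matches rows within one invoice type, the work grows with the square of the items per invoice rather than with the number of item columns, so no fixed item1 to item5 layout is needed.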
【Discussion】: