【Posted】:2021-06-20 05:47:59
【Question】:
I have the following dataset, which shows the ingredients contained in each product:
data <- data.frame("PRODUCT" = c("Creme","Creme","Creme","Creme","Medoc","Medoc","Medoc","Medoc","Medoc","Hububu","Hububu","Hububu","Hububu","Troll","Troll","Troll","Troll","Suzuki","Suzuki","Gluglu","Gluglu","Gluglu"),
"INGREDIENT" = c("zeze","zaza","zozo","zuzu","zaza","sasa","haha","zuzu","zemzem","zaza","zuzu","zizi","haha","zozo","zaza","zemzem","zuzu","sasa","zuzu","ozam","zaza","hayda"))
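I want to know the most common combinations of ingredients across products: which ingredient is associated with which other ingredient? I applied code that I found in another thread: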
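library(dplyr)

# Self-join on PRODUCT so that every ingredient is paired with every ingredient of the same product
combinaisons_par_PRODUCT <- data %>%
  full_join(data, by = "PRODUCT") %>%
  group_by(INGREDIENT.x, INGREDIENT.y) %>%
  summarise(n = length(unique(PRODUCT))) %>%  # number of products containing the pair
  filter(INGREDIENT.x != INGREDIENT.y) %>%    # drop pairs of an ingredient with itself
  mutate(item = paste(INGREDIENT.x, INGREDIENT.y, sep = ", "))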
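It works, but there is one last flaw: I want the order to be ignored. For example, this code gives me 1 association for HAHA and SASA, and 1 association for SASA and HAHA. To me these are the same thing, so I would like the code to ignore the order of the ingredients and give me a single association of HAHA and SASA with a count of 2.
I tried sorting the ingredients before applying the code, but that did not work either. Can someone help me? How can I get these combinations regardless of order?
Thanks a lot!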
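One possible way to ignore the order is sketched below. It rests on the observation that the self-join always emits both (a, b) and (b, a), so sorting the rows beforehand cannot remove the duplicates. Keeping only the rows where the first ingredient sorts alphabetically before the second leaves each unordered pair exactly once. The names `d` and `pairs_unordered` are illustrative, and `stringsAsFactors = FALSE` is an assumption added so that the `<` comparison works on character columns in older R versions too.

```r
library(dplyr)

# Same data as in the question; stringsAsFactors = FALSE is an added assumption
data <- data.frame(
  PRODUCT = c("Creme","Creme","Creme","Creme","Medoc","Medoc","Medoc","Medoc","Medoc",
              "Hububu","Hububu","Hububu","Hububu","Troll","Troll","Troll","Troll",
              "Suzuki","Suzuki","Gluglu","Gluglu","Gluglu"),
  INGREDIENT = c("zeze","zaza","zozo","zuzu","zaza","sasa","haha","zuzu","zemzem",
                 "zaza","zuzu","zizi","haha","zozo","zaza","zemzem","zuzu",
                 "sasa","zuzu","ozam","zaza","hayda"),
  stringsAsFactors = FALSE
)

# Guard against an ingredient being listed twice for the same product
d <- distinct(data, PRODUCT, INGREDIENT)

pairs_unordered <- d %>%
  inner_join(d, by = "PRODUCT") %>%            # all ordered pairs within a product
  filter(INGREDIENT.x < INGREDIENT.y) %>%      # keep one orientation of each pair
  group_by(INGREDIENT.x, INGREDIENT.y) %>%
  summarise(n = n_distinct(PRODUCT), .groups = "drop") %>%  # products containing both
  mutate(item = paste(INGREDIENT.x, INGREDIENT.y, sep = ", ")) %>%
  arrange(desc(n))

print(pairs_unordered)
```

With this approach `n` is the number of products that contain both ingredients. In the sample data HAHA and SASA appear together only in Medoc, so their single row has `n = 1`; adding the two directional rows to get 2 would count that product twice. Recent dplyr versions may warn about a many-to-many relationship in the self-join, which is expected here.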
【Comments】:
Tags: r associations combinations