【Posted】: 2021-11-25 08:03:10
【Question】:
I'll use an example to illustrate my problem.
Sample data:
df <- data.frame(ID = 1:5, Description = c("'foo' is a dog", "'bar' is a dog", "'foo' is a cat", "'foo' is not a cat", "'bar' is a fish"), Category = c("A", "A", "B", "B", "C"))
> df
  ID        Description Category
1  1     'foo' is a dog        A
2  2     'bar' is a dog        A
3  3     'foo' is a cat        B
4  4 'foo' is not a cat        B
5  5    'bar' is a fish        C
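What I want to do is collapse rows in the same category whose descriptions differ only in the quoted name, combining their IDs and names. Expected output: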
   ID Category        Description
1   3        B     'foo' is a cat
2 1,2        A 'foo,bar' is a dog
3   5        C    'bar' is a fish
4   4        B 'foo' is not a cat
I'd like to start with dplyr, but I can't quite figure out how to achieve this. Can anyone help?
library(dplyr)

df %>%
  group_by(Category) %>%
  ## Some condition to check whether the content outside the single quotes is the same.
  ## If so, collapse those rows into one; otherwise, leave them as they are.
  ## The regex to get the content outside the single quotes:
  ##   gsub("^'(.*?)'(.*)", "\\2", x)
  ## Then collapse (this still merges every row in a category, which is not what I want):
  summarise(new_description = paste(Description, collapse = ","))
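The regex step described in the comments above can be checked on its own. Below is a minimal base R sketch, assuming every description starts with a single-quoted name; the variable names `x`, `quoted`, and `rest` are illustrative and not part of the original post.

```r
# Illustrative input: a subset of the descriptions from the sample data
x <- c("'foo' is a dog", "'bar' is a dog", "'foo' is not a cat")

# Text inside the leading single quotes
quoted <- sub("^'([^']*)'.*$", "\\1", x)

# Text after the closing quote, with surrounding whitespace dropped
rest <- trimws(sub("^'[^']*'", "", x))

print(quoted)
print(rest)
```

Rows that share the same `rest` value within a category are the ones that could be collapsed.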
【Comments】:
-
I would use this as a starting point:
df %>% group_by(Category) %>% summarize(Description = stringr::str_c(Description, collapse = ", "), ID = stringr::str_c(ID, collapse = ","))
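Grouping by Category alone merges every row in a category, so one way to extend this starting point is to also group by the text outside the quotes. The following is only a sketch, assuming each description begins with a single-quoted name and that dplyr 1.0.0 or later is installed (for the `.groups` argument); the helper columns `name` and `rest` are made up for illustration.

```r
library(dplyr)

df <- data.frame(
  ID = 1:5,
  Description = c("'foo' is a dog", "'bar' is a dog", "'foo' is a cat",
                  "'foo' is not a cat", "'bar' is a fish"),
  Category = c("A", "A", "B", "B", "C")
)

result <- df %>%
  mutate(
    name = sub("^'([^']*)'.*$", "\\1", Description),  # quoted part (assumed to lead the string)
    rest = trimws(sub("^'[^']*'", "", Description))   # everything after the quoted part
  ) %>%
  # Rows collapse only when both the category and the unquoted text match
  group_by(Category, rest) %>%
  summarise(
    ID = paste(ID, collapse = ","),
    Description = paste0("'", paste(name, collapse = ","), "' ", rest[1]),
    .groups = "drop"
  ) %>%
  select(ID, Description, Category)

print(result)
```

With the sample data this yields four rows, with IDs 1 and 2 merged into "1,2" and the description "'foo,bar' is a dog". The rows come out sorted by the grouping columns rather than in the order shown in the question, and ID becomes a character column.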