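[Posted]: 2021-02-24 16:37:15
[Question]:
How can I group the elements of a variable by common patterns? For example, I have a database with a field called company role, and I would like to collapse the variants of the same role into a single value.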
employee <- c("a", "b", "c", "d", "e")
Rol <- c(" accounting assistant", "accou assist", "account.assistant",
"healt aux", "auxiliary in healt")
DF <- data.frame(employee, Rol)
I would like to turn it into something like this:
| Employee | ROL |
|---|---|
| A | accounting assistant |
| B | accounting assistant |
| C | accounting assistant |
| D | Healt auxiliary |
| E | Healt auxiliary |
At the moment I am identifying the patterns by hand, but the task gets harder as the data grows. Any help would be appreciated. Thanks!
[Comments]:
- For `Healt auxiliary`, is there a key/value pair?
- Try `cbind(DF, cl=cutree(hclust(as.dist(adist(tolower(DF$Rol)))), h=16))`.
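The one-liner in the comment above can be unpacked step by step. The following is only a sketch, assuming the sample data from the question and the cut height `h = 16` taken from the comment; that threshold will probably need tuning on real data. The `Rol_group` column and the rule of naming each cluster after its first member are hypothetical additions for illustration, not part of the original suggestion.

```r
# Sample data from the question
employee <- c("a", "b", "c", "d", "e")
Rol <- c(" accounting assistant", "accou assist", "account.assistant",
         "healt aux", "auxiliary in healt")
DF <- data.frame(employee, Rol)

# Lower-case the roles so the comparison ignores case (as in the comment)
roles <- tolower(DF$Rol)

# Pairwise Levenshtein (edit) distances between all role strings
d <- adist(roles)

# Hierarchical clustering on those distances (complete linkage is the default)
tree <- hclust(as.dist(d))

# Cut the tree at height 16 (value from the comment; tune it for your data)
DF$cl <- cutree(tree, h = 16)

# Hypothetical labelling step: name each cluster after its first member.
# match(cl, cl) returns the index of the first row in each cluster.
DF$Rol_group <- trimws(DF$Rol[match(DF$cl, DF$cl)])

print(DF)
```

Plain edit distance is sensitive to word order, so variants such as "healt aux" and "auxiliary in healt" may or may not land in the same cluster depending on `h`. It is worth inspecting `DF$cl` (or `plot(tree)`) before trusting the grouping.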
Tags: r data-cleaning