Finding and replacing text in R: answers

【Question title】: Finding and replacing text in R
【Posted】: 2018-10-04 07:23:21
【Question description】:

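I recently started learning R and am trying to explore it further by automating a process. Below is some sample data. I am trying to create a new column (colname: Designations) by finding and replacing specific text in the job titles.

Because I do this work with large amounts of new data, I would like to automate it in R rather than with Excel formulas.

Dataset: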

strings<-c("Zonal Manager","Department Manager","Network Manager","Head of Sales","Account Manager","Alliance Manager","Additional Manager","Senior Vice President","General manager","Senior Analyst", "Solution Architect","AGM")

The R code I used:

t<-data.frame(strings,stringsAsFactors = FALSE)
colnames(t)[1]<-"Designations"
y<-sub(".*Manager*","Manager",strings,ignore.case = TRUE)

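Challenge:
With this, the data is changed to "Manager", but I also need the other titles replaced with their themes.

I tried ifelse statements, grep, grepl, str, sub and so on, but I did not get what I was looking for.

I cannot use the first/second/last word (as a delimiter), because the theme can appear anywhere in the title. For example: Chief Information Officer, Commercial Finance Manager, or AGM.

Excel work:
I have already coded about 300 themes as follows:

Manager (for all General Manager, Assistant Manager, Sales Manager, etc.)
Architect (Solution Arch, Sr. Arch, etc.)
Director (Senior Director, Director, Deputy Director, etc.)
Senior Analyst
Analyst
Head (Head of Sales)

What I am looking for: I need to create a new column in R and replace each title with its theme, just as I did in Excel.

It would be fine if I could reuse the main themes I have already coded in Excel and match them in R (like a vlookup in Excel).

Expected result: (image not included). Thanks in advance for your help!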

Yes, exactly what I was expecting. Thanks!! But when I tried the same approach after uploading a new dataset (an Excel file), using

df %>% 
   mutate(theme=gsub(".*(Manager|Lead|Director|Head|Administrator|Executive|Executive|VP|President|Consultant|CFO|CTO|CEO|CMO|CDO|CIO|COO|Cheif Executive Officer|Chief Technological Officer|Chief Digital Officer|Chief Financial Officer|Chief Marketing Officer|Chief Digital Officer|Chief Information Officer,Chief Operations Officer)).*","\\1",Designations,ignore.case = TRUE))

it did not work. Is there something else I should correct?

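【Question discussion】:

What is the expected result?
df %>% mutate(Designation_new= str_extract(Designations, str_c(strings, collapse = "|"))) should help. If you can give us a reproducible example using dput(), we can help you further.
I have attached an image (the expected output) for your reference.

Tags: r if-statement str-replace grepl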

【Solution 1】:

Data:

strings<-c("Zonal Manager","Department Manager","Network Manager","Head of Sales","Account Manager",
           "Alliance Manager","Additional Manager","Senior Vice President","General manager","Senior Analyst", "Solution Architect","AGM")

You need to prepare a good lookup table (complete it yourself so that it covers all your themes):

lu_table <- data.frame(new = c("Manager", "Architect","Director"), old = c("Manager|GM","Architect|Arch","Director"), stringsAsFactors = FALSE)

Then you can let mapply do the work:

mapply(function(new,old) {ans <- strings; ans[grepl(old,ans)]<-new; strings <<- ans; return(NULL)}, new = lu_table$new, old = lu_table$old)

Now look at strings:

> strings
 [1] "Manager"               "Manager"               "Manager"               "Head of Sales"         "Manager"               "Manager"              
 [7] "Manager"               "Senior Vice President" "General manager"       "Senior Analyst"        "Architect"             "Manager"

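Please note:

This solution uses <<- to overwrite strings from inside the function, so it may not be the best solution. It does work in this case, though.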
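As a minimal sketch of how the same lookup could be applied without <<-, a plain for loop over the rows of the lookup table can write into a copy of the vector. The name `themes` and the shortened `strings` vector are illustrative assumptions, not part of the original answer.

```r
strings <- c("Zonal Manager", "Head of Sales", "General manager",
             "Solution Architect", "AGM")

lu_table <- data.frame(
  new = c("Manager", "Architect", "Director"),
  old = c("Manager|GM", "Architect|Arch", "Director"),
  stringsAsFactors = FALSE
)

# Work on a copy so the original vector is left untouched.
themes <- strings
for (i in seq_len(nrow(lu_table))) {
  # Entries matching the i-th pattern are overwritten with the i-th theme.
  hit <- grepl(lu_table$old[i], themes)
  themes[hit] <- lu_table$new[i]
}
themes
```

As in the original answer, the match is case-sensitive, so "General manager" stays unchanged. Passing ignore.case = TRUE to grepl would catch it, at the risk of short patterns such as "GM" matching inside unrelated words.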

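【Discussion】:

Hi Andre, thanks! This works. However, I went with the second solution because I have more data to replace in the dataset. Thank you for your time and effort.
With the second solution you will not be able to map "Solution Arch" to "Architect". Good luck :-).

【Solution 2】: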

Do you mean something like this?

library(dplyr)
strings <-
  c(
    "Zonal Manager",
    "Department Manager",
    "Network Manager",
    "Head of Sales",
    "Account Manager",
    "Alliance Manager",
    "Additional Manager",
    "Senior Vice President",
    "General manager",
    "Senior Analyst",
    "Solution Architect",
    "AGM"
  )

df = data.frame(Designations = strings)


df %>%
  mutate(
    theme = gsub(
      ".*(manager|head|analyst|architect|agm|director|president).*",
      "\\1",
      Designations,
      ignore.case = TRUE
    )
  )
#>             Designations     theme
#> 1          Zonal Manager   Manager
#> 2     Department Manager   Manager
#> 3        Network Manager   Manager
#> 4          Head of Sales      Head
#> 5        Account Manager   Manager
#> 6       Alliance Manager   Manager
#> 7     Additional Manager   Manager
#> 8  Senior Vice President President
#> 9        General manager   manager
#> 10        Senior Analyst   Analyst
#> 11    Solution Architect Architect
#> 12                   AGM       AGM

Created on 2018-10-04 by the reprex package (v0.2.1)
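One property of the greedy `.*( ... ).*` pattern is visible in row 10 of the output: "Senior Analyst" is reduced to "Analyst", because the leading `.*` consumes as much as it can before the group matches. If multi-word themes such as "Senior Analyst" need to survive, one possible sketch is to extract the leftmost match with base R's regexpr and regmatches instead. The names `titles`, `pattern` and `theme` are illustrative assumptions.

```r
titles <- c("Senior Analyst", "Analyst", "Zonal Manager", "Intern")

# List the longer alternative before the shorter one it contains.
pattern <- "senior analyst|analyst|manager"

# regexpr returns the position of the first match, or -1 if there is none.
m <- regexpr(pattern, titles, ignore.case = TRUE)

# Titles with no matching theme stay NA.
theme <- rep(NA_character_, length(titles))
theme[m != -1] <- regmatches(titles, m)
theme
```

Unmatched titles end up as NA here rather than being passed through unchanged, which makes gaps in the theme list easier to spot.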

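【Discussion】:

If the title contains Senior Analyst rather than just Analyst, the OP needs to get Senior Analyst.
Yes, exactly what I was expecting. Thanks!! But when I tried it on a large amount of data, I got the same output.
Yes, exactly what I was expecting. Thanks!! But when I uploaded a new dataset (an Excel file) and used df%>%mutate(theme=gsub(".*(Manager|Lead|Director|Head|Administrator|Executive|Executive|VP|President|Consultant|CFO|CTO|CEO|CMO|CDO|CIO|COO|Cheif Executive Officer|Chief Technological Officer|Chief Digital Officer|Chief Financial Officer|Chief Marketing Officer|Chief Digital Officer|Chief Information Officer,Chief Operations Officer)).*","\\1",Designations,ignore.case = TRUE)) it did not work. Is there something else I should correct?
Hi @Anonymous, what do you mean by "it did not work"? What happened when you ran the code? It should look something like df %>% mutate( theme = gsub( ".*(Manager|Lead|Director|Head|Administrator|Executive|Executive|VP|President|Consultant|CFO|CTO|CEO|CMO|CDO|CIO|COO|Cheif Executive Officer|Chief Technological Officer|Chief Digital Officer|Chief Financial Officer|Chief Marketing Officer|Chief Digital Officer|Chief Information Officer|Chief Operations Officer).*", "\\1", Designations, ignore.case = TRUE ))
Hi TC Zhang, thanks. I got the output I expected. Thank you for your help.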
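The two long patterns in this thread differ only in a comma where a "|" was intended and one extra closing parenthesis. A possible way to avoid that kind of slip, sketched here under the assumption that the themes are kept in a character vector (the names `theme_list`, `pattern` and `titles` are illustrative), is to assemble the pattern with paste:

```r
theme_list <- c("Manager", "Lead", "Director",
                "Chief Information Officer", "Chief Operations Officer")

# Join the themes with "|" and wrap them in a single capture group.
pattern <- paste0(".*(", paste(theme_list, collapse = "|"), ").*")

titles <- c("Zonal Manager", "Deputy Chief Operations Officer")
result <- gsub(pattern, "\\1", titles, ignore.case = TRUE)
result
```

This assumes that none of the themes contains regex metacharacters such as "." or "("; any that do would need escaping first.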