R's equivalent of string.replace() in Python

【Question title】: R's equivalent of string.replace() in Python
【Posted】: 2019-07-14 08:30:36
【Question description】:

I need to replace certain values of a character vector:

x <- data.frame(Strings = c("one", "two","three","four","five","four","five","four","five","two","thre","two","three","two","three"), stringsAsFactors = FALSE)
> x
   Strings
1      one
2      two
3    three
4     four
5     five
6     four
7     five
8     four
9     five
10     two
11    thre
12     two
13   three
14     two
15   three

In Python (pandas), I would do:

x["Strings"].replace(["one", "two", "thre","three"], ["One","Two","Three","Three"], inplace=True)

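But in R, the function replace() does not work in the same simple way. There are many string replacement solutions on Stack Overflow, but none of them is this simple. Is this possible in R?

【Comments】:

Tags: r string data-manipulation

【Solution 1】:

If you only want to capitalize the first letter of each word, we can use sub: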

x$new <- sub('^([a-z])', '\\U\\1', x$Strings, perl = TRUE)

Output:

   Strings   new
1      one   One
2      two   Two
3    three Three
4     four  Four
5     five  Five
6     four  Four
7     five  Five
8     four  Four
9     five  Five
10     two   Two
11    thre  Thre
12     two   Two
13   three Three
14     two   Two
15   three Three
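
For readers who find the `\\U\\1` replacement opaque: with `perl = TRUE`, `\\U` upper-cases the captured first letter. A rough base R sketch of the same idea without a regex follows; `capitalize_first` is a hypothetical helper name, not something from the answer. As the output above shows, this only capitalizes and does not repair the typo "thre".

```r
# Hypothetical helper (not from the original answer):
# upper-case the first character and keep the rest unchanged.
capitalize_first <- function(s) {
  paste0(toupper(substring(s, 1, 1)), substring(s, 2))
}

words <- c("one", "thre", "four")
capitalized <- capitalize_first(words)
print(capitalized)

# The regex version from the answer, for comparison
via_sub <- sub("^([a-z])", "\\U\\1", words, perl = TRUE)
print(identical(capitalized, via_sub))
```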

If you already have a list of old and new words to replace, we can use str_replace_all, which has a style similar to the Python example the OP posted:

library(stringr)

pattern <- c("one", "two", "thre", "three")
replacements <- c("One", "Two", "Three", "Three")

named_vec <- setNames(replacements, paste0("\\b", pattern, "\\b"))

x$new <- str_replace_all(x$Strings, named_vec)
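
The `\\b` word boundaries matter because "thre" is a prefix of "three". The sketch below imitates one-pattern-at-a-time replacement with base R's gsub to show what can go wrong without them. It assumes the patterns are applied in order, which is my understanding of how a named vector is handled; `replace_seq` is a hypothetical helper, not part of stringr.

```r
strings <- c("one", "thre", "three")
pattern <- c("one", "two", "thre", "three")
replacements <- c("One", "Two", "Three", "Three")

# Hypothetical helper: apply each pattern/replacement pair in turn
replace_seq <- function(s, pats, reps) {
  for (i in seq_along(pats)) {
    s <- gsub(pats[i], reps[i], s, perl = TRUE)
  }
  s
}

# Without boundaries, "thre" also matches inside "three"
unanchored <- replace_seq(strings, pattern, replacements)

# With \\b on both sides, only whole words are replaced
anchored <- replace_seq(strings, paste0("\\b", pattern, "\\b"), replacements)

print(unanchored)
print(anchored)
```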

Or with match or hashmap:

library(dplyr)

# Fall back to the original string where no pattern matches
x$new <- coalesce(replacements[match(x$Strings, pattern)], x$Strings)


library(hashmap)

hash_lookup = hashmap(pattern, replacements)
x$new <- coalesce(hash_lookup[[x$Strings]], x$Strings)

Output:

   Strings   new
1      one   One
2      two   Two
3    three Three
4     four  four
5     five  five
6     four  four
7     five  five
8     four  four
9     five  five
10     two   Two
11    thre Three
12     two   Two
13   three Three
14     two   Two
15   three Three
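
If you would rather avoid extra packages, the same match-based lookup can be sketched in base R with a named character vector. This is only an illustration of the idea; `lookup`, `idx` and `replaced` are names made up for this sketch.

```r
strings <- c("one", "two", "three", "four", "thre")

# Names are the old values, elements are the new values
lookup <- c(one = "One", two = "Two", thre = "Three", three = "Three")

# Position of each string among the old values (NA if absent)
idx <- match(strings, names(lookup))

# Use the replacement where one exists, otherwise keep the original
replaced <- ifelse(is.na(idx), strings, unname(lookup[idx]))
print(replaced)
```

Because match compares whole strings, "thre" and "three" cannot interfere with each other here, so no word-boundary handling is needed.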

【Comments】:

【Solution 2】:

A solution whose syntax is close to your Python code (using the plyr package):

x$Strings <- plyr::mapvalues(x$Strings, 
                c("one", "two", "thre","three"),
                c("One","Two","Three","Three")
)

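【Comments】:

【Solution 3】:

One way is to convert them to a factor and then replace the levels. Note that the new levels must be given in the order of the existing, alphabetically sorted levels (five, four, one, thre, three, two):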

> x <- data.frame(Strings = c("one", "two","three","four","five","four","five","four","five","two","thre","two","three","two","three"), stringsAsFactors = FALSE)
> x$Strings <- as.factor(x$Strings)
> levels(x$Strings) <- c("Five", "Four", "One", "Three", "Three", "Two")
> x
   Strings
1      One
2      Two
3    Three
4     Four
5     Five
6     Four
7     Five
8     Four
9     Five
10     Two
11   Three
12     Two
13   Three
14     Two
15   Three
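
Assigning levels by position depends on knowing the sorted order of the existing levels. A slightly more explicit sketch uses the list form of `levels<-`, where each name is a new level and each element lists the old levels it absorbs. One caveat: any old level left out of the list becomes NA, so every level has to be listed.

```r
f <- factor(c("one", "two", "three", "four", "five", "thre"))

# new level = old level(s); "thre" and "three" are merged into "Three"
levels(f) <- list(
  One   = "one",
  Two   = "two",
  Three = c("thre", "three"),
  Four  = "four",
  Five  = "five"
)

recoded <- as.character(f)
print(recoded)
```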

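【Comments】:

I'm surprised this hasn't received more upvotes. Any simple solution that uses only base R is inherently better than one that involves extra packages!

【Solution 4】:

If you want capitalization, the Hmisc package with capitalize() will work. I apologize if I misunderstood the question.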

library(Hmisc)

x <- data.frame(Strings = c("one", "two","three","four","five","four","five","four","five","two","thre","two","three","two","three"), stringsAsFactors = FALSE)

x<-sub("thre[^[:space:]]*", "Three", x$Strings)

xCap<-capitalize(x)

as.data.frame(xCap)
    xCap
1    One
2    Two
3  Three
4   Four
5   Five
6   Four
7   Five
8   Four
9   Five
10   Two
11 Three
12   Two
13 Three
14   Two
15 Three

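Thanks to @RuiBarradas in the comments for the sub fix.

【Comments】:

This does not correct the typo "Thre" as the OP wanted.
@RuiBarradas I see, I didn't read the data carefully. I will close this answer, thanks for the heads-up.
Don't close it, edit it. After your code, add sub("Thre[^[:space:]]*", "Three", x$Strings).
I need capitalization, and I also need to replace some specific values (such as "Thre" in the example). So thanks for your answer anyway!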

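【Solution 5】:

Here is an option using recode. Create a list of key/value pairs, then use recode to match the values in 'Strings' against the keys of the list and replace them with the corresponding values. Values without a matching key are left unchanged.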

library(tidyverse)
lst1 <- list(one = "One", two = "Two", thre = "Three", three = "Three", four = "Four", five = "Five")
x %>% 
   mutate(Strings  = recode(Strings, !!! lst1))

Note: this assumes that the camel case in the example is a coincidence

【Comments】:

【Solution 6】:

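library(dplyr)

x <- data.frame(Strings = c("one", "two","three","four","five","four","five","four","five","two","thre","two","three","two","three"), stringsAsFactors = FALSE)
y <- c("one", "two", "thre","three")   # old values
z <- c("One","Two","Three","Three")    # new values

# Look each string up in y; keep the original where there is no match.
# Assign the result back to x (not to x$Strings), since mutate() returns a data frame.
x <- x %>% mutate(Strings = if_else(!is.na(z[match(Strings, y)]),
                                    z[match(Strings, y)], false = Strings))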

This uses dplyr; you only need to change y and z.

【Comments】: