【Question title】: R's equivalent of string.replace() in Python
【Posted】: 2019-07-14 08:30:36
【Question】:

I need to replace certain values in a character vector:

x <- data.frame(Strings = c("one", "two","three","four","five","four","five","four","five","two","thre","two","three","two","three"), stringsAsFactors = FALSE)
> x
   Strings
1      one
2      two
3    three
4     four
5     five
6     four
7     five
8     four
9     five
10     two
11    thre
12     two
13   three
14     two
15   three

In Python, I would do it like this:

x["Strings"].replace(["one", "two", "thre","three"], ["One","Two","Three","Three"], inplace=True)

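But in R, the replace() function does not work in the same simple way. There are many string-replacement solutions on Stack Overflow, but none of them is this simple. Is this possible in R?

【Question comments】:

    Tags: r string data-manipulation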


    【Solution 1】:

    If you only want to capitalize the first letter of each word, we can use sub:

    x$new <- sub('^([a-z])', '\\U\\1', x$Strings, perl = TRUE)
    

    Output:

       Strings   new
    1      one   One
    2      two   Two
    3    three Three
    4     four  Four
    5     five  Five
    6     four  Four
    7     five  Five
    8     four  Four
    9     five  Five
    10     two   Two
    11    thre  Thre
    12     two   Two
    13   three Three
    14     two   Two
    15   three Three
    

    If you already have lists of old and new words to swap, we can use str_replace_all, which is similar in style to the Python example the OP posted:

    library(stringr)
    
    pattern <- c("one", "two", "thre", "three")
    replacements <- c("One", "Two", "Three", "Three")
    
    named_vec <- setNames(replacements, paste0("\\b", pattern, "\\b"))
    
    x$new <- str_replace_all(x$Strings, named_vec)
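The word boundaries (\\b) in the pattern names matter because "thre" is a prefix of "three". A minimal base R sketch of the difference, using sub() on single hypothetical values rather than the answer's str_replace_all call:

```r
# Hypothetical single-value demo; the variable names are illustrative only.
word <- "three"

# Without word boundaries, "thre" matches the first four letters of "three",
# so the trailing "e" is left behind.
no_boundary <- sub("thre", "Three", word, perl = TRUE)
print(no_boundary)    # "Threee"

# With \\b on both sides, "thre" only matches when it is a whole word,
# so "three" is left untouched here.
with_boundary <- sub("\\bthre\\b", "Three", word, perl = TRUE)
print(with_boundary)  # "three"

# The actual typo is still corrected.
typo_fixed <- sub("\\bthre\\b", "Three", "thre", perl = TRUE)
print(typo_fixed)     # "Three"
```

The same reasoning applies to every pattern in the list: anchoring each one keeps a short pattern from rewriting part of a longer word.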
    

    Alternatively, using match or hashmap:

    library(dplyr)
    
    x$new <- coalesce(replacements[match(x$Strings, pattern)], x$new)
    
    
    library(hashmap)
    
    hash_lookup = hashmap(pattern, replacements)
    x$new <- coalesce(hash_lookup[[x$Strings]], x$new)
    

    Output:

       Strings   new
    1      one   One
    2      two   Two
    3    three Three
    4     four  four
    5     five  five
    6     four  four
    7     five  five
    8     four  four
    9     five  five
    10     two   Two
    11    thre Three
    12     two   Two
    13   three Three
    14     two   Two
    15   three Three
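The match()-based lookup can also be sketched in base R alone. This is an illustrative variant, not the answer's code: it swaps coalesce() for ifelse() and uses a shortened, hypothetical copy of the data.

```r
# Shortened copy of the example data (assumption: five rows are enough to illustrate).
x <- data.frame(Strings = c("one", "two", "thre", "three", "four"),
                stringsAsFactors = FALSE)

pattern      <- c("one", "two", "thre", "three")
replacements <- c("One", "Two", "Three", "Three")

# match() returns the position of each value in `pattern`, or NA when absent.
idx <- match(x$Strings, pattern)

# Keep the original string where there is no match, otherwise look up the replacement.
x$new <- ifelse(is.na(idx), x$Strings, replacements[idx])
print(x)
```

Because match() compares whole strings, "thre" and "three" are looked up independently and no word-boundary handling is needed.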
    

    【Comments】:

      【Solution 2】:

      A solution whose syntax is close to your Python code (using the plyr package):

      x$Strings <- plyr::mapvalues(x$Strings, 
                      c("one", "two", "thre","three"),
                      c("One","Two","Three","Three")
      )
      

      【Comments】:

        【Solution 3】:

        One approach is to convert the column to a factor and then replace the levels:

        > x <- data.frame(Strings = c("one", "two","three","four","five","four","five","four","five","two","thre","two","three","two","three"), stringsAsFactors = FALSE)
        > x$Strings <- as.factor(x$Strings)
        > levels(x$Strings) <- c("Five", "Four", "One", "Three", "Three", "Two")
        > x
           Strings
        1      One
        2      Two
        3    Three
        4     Four
        5     Five
        6     Four
        7     Five
        8     Four
        9     Five
        10     Two
        11   Three
        12     Two
        13   Three
        14     Two
        15   Three
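Assigning levels by position relies on knowing the sorted order of the existing levels. A hedged sketch of a name-based variant, where the `lookup` vector and the shortened data are assumptions made for illustration:

```r
# Shortened, hypothetical copy of the data as a factor.
f <- factor(c("one", "two", "three", "four", "five", "thre"))

# Levels are stored in sorted order, not in order of appearance.
print(levels(f))

# Hypothetical lookup table: names are old values, elements are new values.
lookup <- c(one = "One", two = "Two", thre = "Three",
            three = "Three", four = "Four", five = "Five")

# Index the lookup by the current levels so the order no longer matters;
# duplicated new labels ("Three") merge the corresponding levels.
levels(f) <- unname(lookup[levels(f)])
print(f)
```

This keeps the base R approach while avoiding a hand-written vector that has to line up with the level order.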
        

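        【Comments】:

        • I'm surprised this hasn't received more upvotes. Any simple solution that uses only base R is inherently better than one involving extra packages!
        【Solution 4】:

        If you want capitalization, the Hmisc package with capitalize() will work. Apologies if I have misunderstood the question.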

        library(Hmisc)
        
        x <- data.frame(Strings = c("one", "two","three","four","five","four","five","four","five","two","thre","two","three","two","three"), stringsAsFactors = FALSE)
        
        x<-sub("thre[^[:space:]]*", "Three", x$Strings)
        
        xCap<-capitalize(x)
        
        as.data.frame(xCap)
            xCap
        1    One
        2    Two
        3  Three
        4   Four
        5   Five
        6   Four
        7   Five
        8   Four
        9   Five
        10   Two
        11 Three
        12   Two
        13 Three
        14   Two
        15 Three
        

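        Thanks to @RuiBarradas in the comments for the sub fix.

        【Comments】:

        • This does not correct the typo "Thre" the way the OP wants.
        • @RuiBarradas I see, I did not read the data carefully. I will close this answer, thanks for the heads-up.
        • Don't close it, edit it. After your code, add sub("Thre[^[:space:]]*", "Three", x$Strings).
        • I need the capitalization, and I also need to replace some specific values (such as "Thre" in the example). So thanks for your answer anyway!
        【Solution 5】:

        Here is an option using recode. Create a list of key/value pairs, then use recode to match the values in 'Strings' against the keys of the list and replace them with the corresponding values: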

        library(tidyverse)
        lst1 <- list(one = "One", two = "Two", three = "Three", four = "Four", five = "Five")
        x %>% 
           mutate(Strings  = recode(Strings, !!! lst1))
        

        Note: this assumes the capitalization (camel case) in the example is a coincidence.

        【Comments】:

          【Solution 6】:
          library(dplyr)

          x <- data.frame(Strings = c("one", "two","three","four","five","four","five","four","five","two","thre","two","three","two","three"), stringsAsFactors = FALSE)
          y <- c("one", "two", "thre", "three")   # values to look for
          z <- c("One", "Two", "Three", "Three")  # their replacements

          # match() is vectorized, so rowwise() is not needed; unmatched values are kept
          x <- x %>% mutate(Strings = if_else(!is.na(match(Strings, y)),
                                              z[match(Strings, y)], Strings))
          

          This uses dplyr; you only need to change y and z.

          【Comments】:
