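【Question title】: R: how to replace substrings within a longer string with a new substring
【Posted】: 2018-04-01 19:44:17
【Question】:

I have a long vector. Each element is a string, and each string can be split into substrings separated by ','.

I want to check whether each string in the vector contains at least one "bad" string. If it does, the whole substring that contains the "bad" string should be replaced with a new string. I wrote a long function with loops, but I could swear there must be an easier way, maybe with stringr? Thanks a lot for any advice!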

# Create an example data frame:
test <- data.frame(a = c("str1_element_1_aaa, str1_element_2",
                         "str2_element_1",
                         "str3_element_1, str3_element_2_aaa, str3_element_3"),
                   stringsAsFactors = FALSE)
test
str(test)

# Defining my long function that checks if each string in a
# vector contains a substring with a "bad" string in it.
# If it does, that whole substring is replaced with a new string:
library(stringr)
mystring_replace = function(strings_vector, badstring, newstring){
  with_string <- grepl(badstring, strings_vector)  # which elements contain badstring?
  # split only those elements on ', ' (uses the argument, not the global test$a)
  mysplits <- str_split(string = strings_vector[with_string], pattern = ', ')
  for (i in seq_along(mysplits)) {   # loop through the list of splits
    allstrings <- mysplits[[i]]
    for (ii in seq_along(allstrings)) {  # loop through substrings
      if (grepl(badstring, allstrings[ii])) mysplits[[i]][ii] <- newstring
    }
  }
  for (i in seq_along(mysplits)) {  # merge the split elements back together
    mysplits[[i]] <- paste(mysplits[[i]], collapse = ", ")
  }
  strings_vector[with_string] <- unlist(mysplits)
  return(strings_vector)
}
# Test
mystring_replace(test$a, badstring = '_aaa', newstring = "NEW")

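【Comments】:

  • Instead of using 3 for loops, you could split on the bad string and join with the good string.
  • Good idea, but it doesn't help me. I don't want to join with a good string. I want to replace the whole substring that contains the bad string with a new substring.

Tags: r string stringr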


【Solution 1】:

Do you think this would work?

new_str_replace <- function(strings_vector, badstring, newstring){
  split.dat <- strsplit(strings_vector,', ')[[1]]
  split.dat[grepl(badstring, split.dat)] <- newstring
  return(paste(split.dat, collapse = ', '))
}

results <- unname(sapply(test$a, new_str_replace, badstring = '_aaa', newstring = 'NEW'))
results
#[1] "NEW, str1_element_2"                 "str2_element_1"                     
#[3] "str3_element_1, NEW, str3_element_3"
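One detail worth checking in the function above: `grepl()` treats `badstring` as a regular expression by default, so a bad string containing characters such as `.` may match more than intended. The sketch below assumes literal matching is what is wanted. The `tokens` vector and the name `new_str_replace_fixed` are hypothetical, introduced only for illustration.

```r
# Hypothetical tokens chosen to show the difference; not from the question.
tokens <- c("item.v1", "item_v1", "itemXv1")

regex_hits <- grepl(".v1", tokens)                # "." matches any character
fixed_hits <- grepl(".v1", tokens, fixed = TRUE)  # "." is a literal dot

# Variant of new_str_replace that matches badstring literally
# (assumption: the bad string is plain text, not a pattern).
new_str_replace_fixed <- function(string, badstring, newstring) {
  split.dat <- strsplit(string, ", ", fixed = TRUE)[[1]]
  split.dat[grepl(badstring, split.dat, fixed = TRUE)] <- newstring
  paste(split.dat, collapse = ", ")
}

new_str_replace_fixed("a.v1, a_v1", badstring = ".v1", newstring = "NEW")
# [1] "NEW, a_v1"
```

For the question's `'_aaa'` example both modes behave the same, since that string contains no regex metacharacters.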

【Comments】:

    【Solution 2】:

    I did it in a divide-and-conquer fashion. First I wrote a function that operates on a single string only, then I vectorized it.

    # does the operation for a string only. divide-and-conquer
    replace_one = function(string, badstring, newstring) {
      # split it at ", "
      strs = str_split(string, ", ")[[1]]
      # an ifelse to find the ones containing badstring and replacing them
      strs = ifelse(grepl(badstring, strs, fixed = TRUE), newstring, strs)
      # join them again
      paste0(strs, collapse = ", ")
    }
    
    # vectorizes it
    my_replace = Vectorize(replace_one, "string", USE.NAMES = FALSE)
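The answer does not show a call, so here is a minimal usage sketch on the question's data. To keep it self-contained without loading stringr, it assumes base R's `strsplit()` in place of `str_split()`; everything else follows the function above.

```r
# The question's example vector
test <- data.frame(a = c("str1_element_1_aaa, str1_element_2",
                         "str2_element_1",
                         "str3_element_1, str3_element_2_aaa, str3_element_3"),
                   stringsAsFactors = FALSE)

# Same logic as replace_one above, with strsplit() swapped in for str_split()
replace_one <- function(string, badstring, newstring) {
  strs <- strsplit(string, ", ", fixed = TRUE)[[1]]
  strs <- ifelse(grepl(badstring, strs, fixed = TRUE), newstring, strs)
  paste0(strs, collapse = ", ")
}

# Vectorize over the "string" argument only; badstring and newstring stay scalar
my_replace <- Vectorize(replace_one, "string", USE.NAMES = FALSE)

result <- my_replace(test$a, badstring = "_aaa", newstring = "NEW")
result
# [1] "NEW, str1_element_2"                 "str2_element_1"
# [3] "str3_element_1, NEW, str3_element_3"
```

With `USE.NAMES = FALSE` the result is a plain unnamed character vector, so it can be assigned straight back to `test$a`.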
    

    【Comments】:

      【Solution 3】:

      Here is an approach using the tidyverse (purrr and stringr):

      library(tidyverse)
      library(stringr)
      
      # Small utility function
      find_and_replace <- function(string, bad_string, replacement_string) {
        ifelse(str_detect(string, bad_string), replacement_string, string)
      }
      
      str_split(test$a, ", ") %>%                 
        map(find_and_replace, "aaa", "NEW") %>%   
        map_chr(paste, collapse = ", ") %>%
        unlist
      

      Basically: split the vector into a list, map find_and_replace over that list, then collapse the results. I recommend looking at the result after each pipe %>% individually.
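The split, map and collapse steps can be inspected one at a time by storing each intermediate result. The sketch below is not the answer's code: it assumes base R stand-ins (`strsplit()`, `lapply()`, `vapply()`, `grepl()`) for `str_split()`, `map()`, `map_chr()` and `str_detect()` so that it runs without any packages. The names `step1` to `step3` are hypothetical.

```r
test_a <- c("str1_element_1_aaa, str1_element_2",
            "str2_element_1",
            "str3_element_1, str3_element_2_aaa, str3_element_3")

# Same utility as above, with grepl() standing in for str_detect()
find_and_replace <- function(string, bad_string, replacement_string) {
  ifelse(grepl(bad_string, string), replacement_string, string)
}

# Step 1: split each element into a character vector (result is a list)
step1 <- strsplit(test_a, ", ", fixed = TRUE)

# Step 2: replace every piece that contains the bad string
step2 <- lapply(step1, find_and_replace, "aaa", "NEW")

# Step 3: collapse each list element back into a single string
step3 <- vapply(step2, paste, character(1), collapse = ", ")

step1[[1]]  # "str1_element_1_aaa" "str1_element_2"
step2[[1]]  # "NEW"                "str1_element_2"
step3[1]    # "NEW, str1_element_2"
```

Printing `step1`, `step2` and `step3` in turn shows where a wrapped version of the pipeline goes wrong, if it does.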

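      【Comments】:

      • I like it! Beautiful! Thank you!
      • Strange, I put it inside a function, but it doesn't work properly:
      • # Small utility function find_and_replace % map(find_and_replace, mybad_string, myreplacement) %>% map_chr(paste, collapse = ", ") %>% unlist out }
      • @user3245256 Apologies: you also need to load the stringr package via library(stringr). I've updated the code. Once these comments have served their purpose for you, it is probably safe to delete them.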