R: How to add characters to a string based on marker positions in another string? (Answers)

【Question title】: R: How to add characters to a string based on marker positions in another string?
【Posted】: 2018-12-05 17:48:57
【Question description】:

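I want to add markers to one string based on the marker positions in another string. I have a SOURCE data frame with two columns: "ortho" and "syllabify". I want to create a TARGET column using underscore markers. Each string in "ortho" should be split with underscores at the positions where underscores occur in "syllabify".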

df

SOURCE:  
   ortho    syllabify       
agradeço  R_OOR_OR_OR  
    bala        OR_OR        
 futebol    OR_OR_ORC    

TARGET:  
   ortho    syllabify       TARGET
agradeço  R_OOR_OR_OR  a_gra_de_ço    
    bala        OR_OR        ba_la
 futebol    OR_OR_ORC    fu_te_bol

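Thanks, everyone!

【Discussion】:

What have you tried so far?
I used gregexpr(x@syllabify) to get the string positions of the underscore markers. Now, how can I use those positions to insert underscore markers at the right places?

Tags: r regex string split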

【Solution 1】:

I don't know which language you are using (Gustavo, Melisso), but in Java the answer would look like this:

Initialization:

String sillabify = "OR_OR_ORC";
String ortho = "futebol";
String answer = returnTheTARGETColumnStringUsingTheUnderlineMarkers(ortho, sillabify);

Method:

public String returnTheTARGETColumnStringUsingTheUnderlineMarkers(String pOrtho, String pSillabify) {
    StringBuilder target = new StringBuilder();

    // Each pass copies the letters that precede the next "_" in the syllable
    // pattern, then drops that syllable from both strings.
    int idx;
    while ((idx = pSillabify.indexOf('_')) >= 0) {
        target.append(pOrtho, 0, idx).append('_');
        pOrtho = pOrtho.substring(idx);
        pSillabify = pSillabify.substring(idx + 1);
    }

    // Whatever remains of the word is the last syllable.
    target.append(pOrtho);

    return target.toString();
}

This returns "fu_te_bol".
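
For readers working in R rather than Java, the same idea (cut the word wherever the pattern has an underscore) could be sketched roughly as below. This is only an illustrative sketch and not part of the original answer: the function name `insert_markers` is hypothetical, and it assumes a single word and a single pattern whose letter counts match.

```r
# Hypothetical helper (name chosen for illustration only).
# Assumes `ortho` and `syllabify` are single strings and that the pattern,
# with its underscores removed, has as many characters as the word.
insert_markers <- function(ortho, syllabify) {
  # Length of each segment between underscores, e.g. "OR_OR_ORC" -> 2, 2, 3
  seg_len <- nchar(strsplit(syllabify, "_", fixed = TRUE)[[1]])
  # Last and first character position of each segment within the word
  ends <- cumsum(seg_len)
  starts <- ends - seg_len + 1
  # Extract every segment at once and join the pieces with underscores
  paste(substring(ortho, starts, ends), collapse = "_")
}

insert_markers("futebol", "OR_OR_ORC")  # "fu_te_bol"
insert_markers("bala", "OR_OR")         # "ba_la"
```

The sketch handles one word per call, so covering a whole column would still need a wrapper such as mapply.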

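【Discussion】:

Thanks, this looks very good. However, I am working in R (CRAN). The function has to handle 33M words, so I am looking for something less demanding on the machine: an optimized solution using R functions and regular expressions!

【Solution 2】:

Here is a solution: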

df <- read.table(text = "   ortho    syllabify       
agradeço  R_OOR_OR_OR  
    bala        OR_OR        
 futebol    OR_OR_ORC", header = TRUE)

library(purrr)
df <- within(df, {
  ortho <- as.character(ortho)

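  # Positions of every "_" in each syllabify pattern (one integer vector per row)
  underscore_loc <- gregexpr("_", syllabify)
  # map2_chr returns a plain character vector instead of a list column
  target <- map2_chr(ortho, underscore_loc, function(string, loc) {
    # One row per syllable: start and end offsets in `string`, with each
    # underscore position shifted left by the underscores that precede it
    locs <- cbind(c(1, loc) - pmax(0, 1 + 0:length(loc) - 2), c(loc, nchar(string)) - c(1:length(loc), 0))
    strings <- apply(locs, 1, function(x) substr(string, x[1], x[2]))
    paste(strings, collapse = "_")
  })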
  rm(underscore_loc)
})
df
#>      ortho   syllabify      target
#> 1 agradeço R_OOR_OR_OR a_gra_de_ço
#> 2     bala       OR_OR       ba_la
#> 3  futebol   OR_OR_ORC   fu_te_bol

Created on 2018-06-26 by the reprex package (v0.2.0).

The purrr package supplies the map2 family of functions, which work like lapply but iterate over two inputs in parallel.
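
The index arithmetic in this answer can also be written in base R without purrr. The sketch below is one possible variant under stated assumptions, not the answerer's code: `add_markers` is a hypothetical name, the pattern is assumed to match the word's letter count, and words whose pattern has no underscore are returned unchanged. It has not been benchmarked on anything close to 33M words.

```r
# Hypothetical helper (name chosen for illustration only).
# `ortho` and `syllabify` are character vectors of equal length.
add_markers <- function(ortho, syllabify) {
  # Underscore positions per pattern; gregexpr returns -1 when there is none
  locs <- gregexpr("_", syllabify, fixed = TRUE)
  mapply(function(word, loc) {
    if (loc[1] == -1L) return(word)      # no underscore: nothing to insert
    # The k-th underscore sits after (position - k) letters of the word
    cut_after <- loc - seq_along(loc)
    starts <- c(1L, cut_after + 1L)
    ends <- c(cut_after, nchar(word))
    paste(substring(word, starts, ends), collapse = "_")
  }, ortho, locs, USE.NAMES = FALSE)
}

add_markers(c("bala", "futebol"), c("OR_OR", "OR_OR_ORC"))
# "ba_la" "fu_te_bol"
```

Because mapply simplifies the result to a character vector, it can be assigned directly to a new column, for example `df$target <- add_markers(df$ortho, df$syllabify)`.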

【Discussion】: