Find and replace text with punctuation in a file: answers

[Question title]: Find and replace text with punctuation in a file
[Posted]: 2014-05-27 05:07:04
[Question description]:

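Hi friends, I asked a related question here. The problem now is that keywords in txt that contain punctuation are not detected. I tried to make that answer generic but failed.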

Basically I have a txt (keywords) containing keywords both with and without punctuation, and I need to search for them in the file toSearch.

For example, these are the contents of my file toSearch:

 [1]'Nokia. Okay. R: Samsung R: Samsung M: And you have? R: I have Micromax'
 [2]'M: Okay, you have taken car. R: I have (Mahindra Scorpio and Mahindra's) this Duro DZ.M: Okay.'
 [3]'M: What is your age ? R: 32 years R: My name is "Nitish". I have Interior designing business.'
 [4]'R: 3rd, Not extra spicy. R: 4th, Fresh. R: 5th, Variety. R: 6th, Hygienic environment'
 [5]'How you feel? How it should be? We will move forward, if there we have to make an ideal'
 [6]'What is the strength of your organisation? How many people a re working.'
 [7]'R: Read newspaper R:Had breakfast with family.'

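and txt (the keywords) is as follows. I use #@ to separate the keywords because I cannot use , (comma): the keywords themselves contain commas.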

 txt <- "R: Samsung R: Samsung M:#@I have (Mahindra Scorpio and Mahindra's)#@R: 32 years R: My name is \"Nitish\"#@R: 4th, Fresh. R: 5th, Variety#@How you feel? How it should be?"
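Because all the keywords are packed into one string, they first have to be split into a vector before they can be searched for one by one. A minimal sketch, assuming txt holds exactly the string above; fixed = TRUE makes strsplit() treat #@ as a literal separator rather than a regular expression:

```r
txt <- "R: Samsung R: Samsung M:#@I have (Mahindra Scorpio and Mahindra's)#@R: 32 years R: My name is \"Nitish\"#@R: 4th, Fresh. R: 5th, Variety#@How you feel? How it should be?"

# split on the literal separator "#@"; strsplit() returns a list, so take [[1]]
keywords <- strsplit(txt, "#@", fixed = TRUE)[[1]]

length(keywords)  # 5 keywords
keywords[2]       # "I have (Mahindra Scorpio and Mahindra's)"
```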

My expected output is to find the occurrences of the keywords and replace the spaces inside them with underscores (_):

 [1]'Nokia. Okay. R:_Samsung_R:_Samsung_M: And you have? R: I have Micromax'
 [2]'M: Okay, you have taken car. R: I_have_(Mahindra_Scorpio_and_Mahindra's) this Duro DZ.M: Okay.'
 [3]'M: What is your age ? R:_32_years_R:_My_name_is_"Nitish". I have Interior designing business.'
 [4]'R: 3rd, Not extra spicy. R:_4th,_Fresh._R:_5th,_Variety. R: 6th, Hygienic environment'
 [5]'How_you_feel?_How_it_should_be? We will move forward, if there we have to make an ideal'
 [6]'What is the strength of your organisation? How many people a re working.'
 [7]'R: Read newspaper R:Had breakfast with family.'

In case this is unclear: it is a simple Find And Replace Text (FART) operation. Only the spaces are replaced with _.
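For a single keyword and a single line, one FART step amounts to building the underscored version of the keyword and substituting it literally. A sketch using the 4th sample line and the 4th keyword (the variable names are only for illustration):

```r
kw   <- "R: 4th, Fresh. R: 5th, Variety"
line <- "R: 3rd, Not extra spicy. R: 4th, Fresh. R: 5th, Variety. R: 6th, Hygienic environment"

# the keyword as it should look after replacement: every space becomes "_"
kw_underscored <- gsub(" ", "_", kw, fixed = TRUE)

# literal (fixed = TRUE) find-and-replace of the whole keyword in the line
out <- gsub(kw, kw_underscored, line, fixed = TRUE)
out
# [1] "R: 3rd, Not extra spicy. R:_4th,_Fresh._R:_5th,_Variety. R: 6th, Hygienic environment"
```

Only the spaces inside the matched keyword change; the space after "Variety." belongs to the surrounding text and is left alone, as in the expected output.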

I tried using this regex:

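library(stringi)  # stri_stats_latex()
library(stringr)  # str_replace_all()

# split the #@-separated keywords into a vector
txt <- strsplit(txt, "#@", fixed = TRUE)[[1]]

for (i in seq_along(txt))
{
    # finding the first word of the keyword
    start <- strsplit(txt[i], split = " ")[[1]][1]
    # number of words in the keyword (the 4th element of stri_stats_latex() is "Words")
    n <- stri_stats_latex(txt[i])[4]

    # all possible occurrences of the keyword in the text
    o <- unlist(regmatches(toSearch, gregexpr(paste0(start, "(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,", n - 1, "}"), toSearch, ignore.case = TRUE, perl = TRUE)))

    # (partial) match of the keyword against the candidates
    p <- which(!is.na(pmatch(txt[i], o)))

    # replace the keyword in the text (txt[i] is used as a regex pattern here)
    if (length(p) > 0)
        toSearch <- str_replace_all(toSearch, txt[i], str_replace_all(txt[i], " ", "_"))
}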
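The loop above builds regular expressions out of the keywords, and that is where punctuation gets in the way: characters such as (, ) and ? are regex metacharacters, not literal text. A small check on two of the sample lines (variable names are illustrative only):

```r
line  <- "M: Okay, you have taken car. R: I have (Mahindra Scorpio and Mahindra's) this Duro DZ.M: Okay."
kw    <- "I have (Mahindra Scorpio and Mahindra's)"
line2 <- "How you feel? How it should be? We will move forward, if there we have to make an ideal"
kw2   <- "How you feel? How it should be?"

# as a regex, "(...)" is a group, so it matches "I have Mahindra ..." without parentheses
grepl(kw, line)                  # FALSE
# as a regex, "?" makes the preceding character optional instead of matching "?"
grepl(kw2, line2)                # FALSE

# with fixed = TRUE the keyword is matched character for character
grepl(kw, line, fixed = TRUE)    # TRUE
grepl(kw2, line2, fixed = TRUE)  # TRUE
```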

[Question discussion]:

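I see that only the spaces are replaced, but based on what? I can't understand your explanation.
I don't think anyone has ever abbreviated "Find And Replace Text". Ever.
@LegoStormtroopr - there may have been one time before: fart-it.sourceforge.net
Gotta love FART.
@aelor Basically I need to find the occurrences of the keywords in a list of files, so I am trying to search the files for the keywords (with punctuation). If a keyword is present, I replace the spaces in it with _ at every occurrence in the file. That way I can find the frequencies and indices in a later step.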

Tags: r regex replace

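[Solution 1]:

You have to be very careful with punctuation when using regular expressions. If you are doing exact matching, it is better not to use regular expressions at all and to set fixed=TRUE in grep (gsub takes the same argument). You can then use Reduce to do the find-and-replace: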

#input data
target<-c("Nokia. Okay. R: Samsung R: Samsung M: And you have? R: I have Micromax", 
"M: Okay, you have taken car. R: I have (Mahindra Scorpio and Mahindra's) this Duro DZ.M: Okay.", 
"M: What is your age ? R: 32 years R: My name is \"Nitish\". I have Interior designing business.", 
"R: 3rd, Not extra spicy. R: 4th, Fresh. R: 5th, Variety. R: 6th, Hygienic environment", 
"How you feel? How it should be? We will move forward, if there we have to make an ideal", 
"What is the strength of your organisation? How many people a re working.", 
"R: Read newspaper R:Had breakfast with family.")

kw<-c("R: Samsung R: Samsung M:", "I have (Mahindra Scorpio and Mahindra's)", 
"R: 32 years R: My name is \"Nitish\"", "R: 4th, Fresh. R: 5th, Variety", 
"How you feel? How it should be?")

Here we use Reduce to replace each keyword in the target text in turn:

Reduce(function(t, kw) gsub(kw, gsub(" ", "_", kw), t, fixed = TRUE),
       kw, init = target, accumulate = FALSE)

# [1] "Nokia. Okay. R:_Samsung_R:_Samsung_M: And you have? R: I have Micromax"                         
# [2] "M: Okay, you have taken car. R: I_have_(Mahindra_Scorpio_and_Mahindra's) this Duro DZ.M: Okay." 
# [3] "M: What is your age ? R:_32_years_R:_My_name_is_\"Nitish\". I have Interior designing business."
# [4] "R: 3rd, Not extra spicy. R:_4th,_Fresh._R:_5th,_Variety. R: 6th, Hygienic environment"          
# [5] "How_you_feel?_How_it_should_be? We will move forward, if there we have to make an ideal"        
# [6] "What is the strength of your organisation? How many people a re working."                       
# [7] "R: Read newspaper R:Had breakfast with family."
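Reduce() here acts as a left fold: starting from init = target, the output of each keyword's gsub() becomes the input for the next keyword. Setting accumulate = TRUE on a toy input (the two keywords are chosen only for illustration) makes the intermediate strings visible:

```r
target <- "R: 3rd, Not extra spicy. R: 4th, Fresh. R: 5th, Variety. R: 6th, Hygienic environment"
keys   <- c("R: 4th, Fresh.", "R: 6th, Hygienic")

# one step: literally replace a keyword by its underscored version
step <- function(t, k) gsub(k, gsub(" ", "_", k, fixed = TRUE), t, fixed = TRUE)

# accumulate = TRUE returns the input plus the result after each keyword
steps <- Reduce(step, keys, init = target, accumulate = TRUE)

length(steps)  # 3
steps[[3]]
# [1] "R: 3rd, Not extra spicy. R:_4th,_Fresh. R: 5th, Variety. R:_6th,_Hygienic environment"
```

Because each step sees the output of the previous one, keyword order matters when keywords overlap: once one keyword's spaces have become underscores, a later overlapping keyword no longer matches that stretch of text.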

I hope this helps you FART.

[Discussion]:

[Solution 2]:

A simplified example that should carry over to your bigger problem:

toSearch <- c("this is some text","something else to search")
txt <- c("is some#@else to")
txt <- strsplit(txt,"#@")[[1]]
txtundsc <- gsub("\\s+","_",txt)

for (i in seq_along(txt)) { toSearch <- gsub(txt[i], txtundsc[i], toSearch, fixed = TRUE) }
toSearch
# [1] "this is_some text"        "something else_to search"
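Since the question is about files, the same loop can be wrapped around readLines()/writeLines(). A sketch, assuming one record per line and that overwriting the file in place is acceptable; fart_file() is a hypothetical helper name, and it passes fixed = TRUE so keywords containing punctuation are matched literally:

```r
# Hypothetical helper: in every line of the file at `path`, replace each
# occurrence of each keyword with its underscored version, then rewrite the file.
fart_file <- function(path, keywords) {
  lines <- readLines(path, warn = FALSE)
  underscored <- gsub("\\s+", "_", keywords)
  for (i in seq_along(keywords)) {
    # fixed = TRUE: punctuation in the keyword is matched literally
    lines <- gsub(keywords[i], underscored[i], lines, fixed = TRUE)
  }
  writeLines(lines, path)
  invisible(lines)
}

# usage on a temporary file holding the toy example above
path <- tempfile(fileext = ".txt")
writeLines(c("this is some text", "something else to search"), path)
fart_file(path, strsplit("is some#@else to", "#@", fixed = TRUE)[[1]])
identical(readLines(path), c("this is_some text", "something else_to search"))  # TRUE
```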

[Discussion]:

Can't get simpler than this... works flawlessly.