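【Posted】: 2014-05-27 05:07:04
【Question】:
Hi friends, I asked a related question here. The problem is that punctuation in txt (the keywords) is not detected. I tried to make the answer generic but failed.
Basically I have txt (keywords), some with punctuation and some without, which I need to search for in the file toSearch.
For example, these are the contents of my file toSearch: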
[1]'Nokia. Okay. R: Samsung R: Samsung M: And you have? R: I have Micromax'
[2]'M: Okay, you have taken car. R: I have (Mahindra Scorpio and Mahindra's) this Duro DZ.M: Okay.'
[3]'M: What is your age ? R: 32 years R: My name is "Nitish". I have Interior designing business.'
[4]'R: 3rd, Not extra spicy. R: 4th, Fresh. R: 5th, Variety. R: 6th, Hygienic environment'
[5]'How you feel? How it should be? We will move forward, if there we have to make an ideal'
[6]'What is the strength of your organisation? How many people a re working.'
[7]'R: Read newspaper R:Had breakfast with family.'
And txt (the keywords) is as follows. I use #@ to separate the keywords because I cannot use , (comma).
txt <- "R: Samsung R: Samsung M:#@I have (Mahindra Scorpio and Mahindra's)#@R: 32 years R: My name is \"Nitish\"#@R: 4th, Fresh. R: 5th, Variety#@How you feel? How it should be?"
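To make the #@ convention concrete, here is a minimal sketch of turning the single string into a vector of keywords. It assumes the inner double quotes around Nitish are escaped so that the literal parses; the name `keywords` is introduced here for illustration and is not part of the original code.

```r
# Keyword string from the question, with the inner double quotes escaped.
txt <- "R: Samsung R: Samsung M:#@I have (Mahindra Scorpio and Mahindra's)#@R: 32 years R: My name is \"Nitish\"#@R: 4th, Fresh. R: 5th, Variety#@How you feel? How it should be?"

# fixed = TRUE treats "#@" as a literal separator rather than a regular expression.
# `keywords` is a hypothetical name used only in this sketch.
keywords <- strsplit(txt, "#@", fixed = TRUE)[[1]]
print(keywords)
```

Each element of `keywords` then holds one phrase with its punctuation intact.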
My expected output is to find the occurrences of the keywords and replace their spaces with underscores (_):
[1]'Nokia. Okay. R:_Samsung_R:_Samsung_M: And you have? R: I have Micromax'
[2]'M: Okay, you have taken car. R: I_have_(Mahindra_Scorpio_and_Mahindra's) this Duro DZ.M: Okay.'
[3]'M: What is your age ? R:_32_years_R:_My_name_is_"Nitish". I have Interior designing business.'
[4]'R: 3rd, Not extra spicy. R:_4th,_Fresh._R:_5th,_Variety. R: 6th, Hygienic environment'
[5]'How_you_feel?_How_it_should_be? We will move forward, if there we have to make an ideal'
[6]'What is the strength of your organisation? How many people a re working.'
[7]'R: Read newspaper R:Had breakfast with family.'
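In case that is unclear, this is a simple find and replace text (FART) function. Only the spaces are replaced, with _.
I have tried using this regular expression: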
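The find-and-replace behavior described above can be sketched with literal (non-regex) matching. This is only an illustration under stated assumptions, not the approach from the linked question: `underscore_keywords` is a hypothetical helper, the keywords are assumed to be already split into a character vector, and matching is case-sensitive.

```r
# Hypothetical helper (not from the question): replace every literal
# occurrence of each keyword in `x` with its underscore-joined form.
underscore_keywords <- function(x, keywords) {
  for (kw in keywords) {
    # fixed = TRUE matches the keyword as a literal string, so characters
    # such as "(", "?" and "." in the keyword need no escaping.
    x <- gsub(kw, gsub(" ", "_", kw, fixed = TRUE), x, fixed = TRUE)
  }
  x
}

# A few of the sample lines and keywords from the question.
toSearch <- c(
  "Nokia. Okay. R: Samsung R: Samsung M: And you have? R: I have Micromax",
  "M: Okay, you have taken car. R: I have (Mahindra Scorpio and Mahindra's) this Duro DZ.M: Okay.",
  "R: Read newspaper R:Had breakfast with family."
)
keywords <- c("R: Samsung R: Samsung M:", "I have (Mahindra Scorpio and Mahindra's)")

result <- underscore_keywords(toSearch, keywords)
print(result)
```

Keywords are applied one after another, so if two keywords overlap in the text, the first replacement may prevent the second from matching.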
for (i in 1:length(txt))
{
  # finding the first word of the keyword
  start <- head(strsplit(txt, split = " ")[[i]], 1)
  n <- stri_stats_latex(txt[i])[4]
  # all possible occurrences for the keywords in the text
  o <- unlist(regmatches(toSearch, gregexpr(paste0(start, "(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,", n - 1, "}"), toSearch, ignore.case = TRUE)))
  # exact match with the result
  p <- which(!is.na(pmatch(txt, o)))
  # replace the keywords in the text file
  text <- as.character(replace_all(text, txt[p], str_replace_all(txt[p])))
}
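【Comments】:
-
I see that only spaces are replaced, but depending on what? I can't follow your explanation.
-
I don't think anyone has ever abbreviated "find and replace text". Ever.
-
@LegoStormtroopr - maybe once before: fart-it.sourceforge.net
-
FART, you've got to love it.
-
@aelor Basically I need to find the occurrences of the keywords in a list of files, so I am trying to search the files for the keywords (with punctuation). If a keyword is present, its spaces are replaced with _ at every occurrence in the file. That way I can find the frequency and index in a later step.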