【Question title】: Detect partial string matches in R
【Posted】: 2014-04-09 07:54:23
【Question】:

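I am trying to count the transactions that start with AK and contain AK again within the transaction, but do not end with AK.

Examples:

EXCLUDE: e.g. AK->se (no AK in the middle)

EXCLUDE: AK->gg->se->ll (no AK within the transaction)

INCLUDE: e.g. AK->se->Ak->gg

Sample data: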

f<- data.frame(
id=c("A","A","A","A","C","C","D","D","E"),
Mode=c("AK->se","se->AK->gg, bishan->K","AK->se","se->gr->gg, bishan->AK","AK->se","se->gr->gg, bishan->AK","AK->se","se->gr->gg, bishan->AK","se->AK->df, hg->pp->sk")
)

I need to process a large amount of data, so optimization is essential.

Thanks in advance.

Edited:

f<- data.frame(
id=c("A","A","A","A","C","C","D","D","E"),
Mode=c("AK->se","se->AK->gg, bishan->K","AK->se","se->gr->gg, bishan->AK","AK->se","AK->AK->gg, bishan->AK","AK->se->Ak->gg","se->gr->gg, bishan->AK","AK->AK->df, hg->pp->sk")
)

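【Comments】:

  • You may want to rephrase your question. "I am trying to count the number of transactions. I want the transaction to start with AK and contain AK within the transaction, but it does not end with AK." is not very clear.
  • Rephrased it for the OP for clarity.
  • I don't understand the second element, "se->AK->gg, bishan->K". You need to explain the transaction format better.
  • Do you have access to the "unconcatenated" data, i.e. something like this (for ID A): df <- data.frame(id = rep("A", 7), grp = c(rep(1, 2), rep(2, 3), rep(3, 2)), time = c(1, 2, 1, 2, 3, 1, 2), state = c("AK", "se", "se", "AK", "gg", "bishan", "K"))? If so, you could use other (and I think more convenient) techniques to classify the transactions. Just a thought.
  • I don't understand. What are id, grp, and time? We need a simpler data set or more explanation. What is a transaction?

Tags: r regex optimization string-matching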


【Solution 1】:

Using a regular expression

f<- data.frame(
  id=c("A","A","A","A","C","C","D","D","E"),
  Mode=c("AK->se","se->AK->gg, bishan->K","AK->se","se->gr->gg, bishan->AK","AK->se","se->gr->gg, bishan->AK","AK->se->AK->gg","se->gr->gg, bishan->AK","se->AK->df, hg->pp->sk")
)

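# TRUE where Mode starts with "AK->" and a later "AK->" follows,
# i.e. AK occurs again and is followed by another step.
selection = grepl(pattern="^AK->.*AK->",x=f$Mode,perl=TRUE)
f$Mode[selection]  # matching transactions
f$id[selection]    # ids of the matching transactions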
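The pattern above only requires that a second "AK->" appears somewhere after the leading one. It does not look at the final step, and it would also accept a step that merely ends in "AK". A stricter variant could look like the sketch below. It assumes that each string is treated as a single transaction whose steps are separated by "->"; the helper name `starts_and_repeats_ak` is hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): TRUE where a transaction
# starts with the step "AK", has a later step that is exactly "AK" and is
# followed by another step, and does not end with the step "AK".
starts_and_repeats_ak <- function(x) {
  x <- as.character(x)  # also accept factor columns
  # "^AK->" anchors the first step; "(.*->)?" optionally skips whole steps;
  # the second "AK->" must therefore begin right after a "->" separator.
  repeats <- grepl("^AK->(.*->)?AK->", x)
  # Exclude transactions whose last step is AK.
  ends_with_ak <- grepl("->AK$", x)
  repeats & !ends_with_ak
}

modes <- c("AK->se",
           "AK->gg->se->ll",
           "AK->se->AK->gg",
           "AK->AK->gg, bishan->AK",
           "AK->bAK->gg")
result <- starts_and_repeats_ak(modes)
print(result)
```

Only the third string is selected here: the fourth ends in AK, and the fifth contains "bAK" rather than a step that is exactly "AK".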

Using strsplit with sapply, an lapply-style loop (may be somewhat slower if there are many strings)

f<- data.frame(
  id=c("A","A","A","A","C","C","D","D","E"),
  Mode=c("AK->se","se->AK->gg, bishan->K","AK->se","se->gr->gg, bishan->AK","AK->se","se->gr->gg, bishan->AK","AK->se->AK->gg","se->gr->gg, bishan->AK","se->AK->df, hg->pp->sk")
)

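# Split each transaction on "->" and test three conditions per transaction:
# first step is AK, last step is not AK, and AK occurs more than once.
# as.character() guards against Mode being a factor, which strsplit() rejects.
selection = sapply(strsplit(x=as.character(f$Mode),split="->",fixed=TRUE),FUN=function(x) (x[1]=="AK")&(x[length(x)]!="AK")&(sum(x=="AK")>1))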
f$Mode[selection]
f$id[selection]
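Since the question asks for a count rather than the matching rows, the logical vector can be summed directly, or summed per id. The following is a minimal sketch using the answer's sample data; the variable names `steps`, `total`, and `per_id` are illustrative only, and `vapply` is swapped in for `sapply` so that the result is always a logical vector.

```r
f <- data.frame(
  id = c("A","A","A","A","C","C","D","D","E"),
  Mode = c("AK->se","se->AK->gg, bishan->K","AK->se","se->gr->gg, bishan->AK",
           "AK->se","se->gr->gg, bishan->AK","AK->se->AK->gg",
           "se->gr->gg, bishan->AK","se->AK->df, hg->pp->sk"),
  stringsAsFactors = FALSE
)

# Split every transaction into its steps once.
steps <- strsplit(f$Mode, split = "->", fixed = TRUE)

# Same three conditions as in the answer, one logical value per transaction.
selection <- vapply(steps, function(x) {
  x[1] == "AK" && x[length(x)] != "AK" && sum(x == "AK") > 1
}, logical(1))

total <- sum(selection)                 # number of matching transactions
per_id <- tapply(selection, f$id, sum)  # matching transactions per id
print(total)
print(per_id)
```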

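【Comments】:

  • The lapply version still selects transactions that end with AK. Would you mind trying it with the data from the edited section?
  • @Carol I think a " is missing in the data in the edited section of your post.
  • Thanks, I've sorted it out! But allow me one more question: if I want the opposite pattern, what would the regular expression look like?
  • On the right-hand side of this page you should have a section called "Related". Check the first item!
  • Sorry, but I don't quite understand what you mean :(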
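The "opposite pattern" asked about in the comments was not answered in the thread. One way to get it without writing a new regular expression is to negate the logical vector returned by `grepl`, or to pass `invert = TRUE` to `grep`. A small sketch, where the vector `modes` is made up for illustration:

```r
modes <- c("AK->se", "se->AK->gg, bishan->K", "AK->se->AK->gg")

# Same pattern as in the answer.
selection <- grepl(pattern = "^AK->.*AK->", x = modes, perl = TRUE)

# Negating the logical vector keeps everything that does NOT match.
not_matching <- modes[!selection]

# grep() can return the non-matching values directly.
not_matching_grep <- grep("^AK->.*AK->", modes, perl = TRUE,
                          value = TRUE, invert = TRUE)
print(not_matching)
```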