【Question title】: R: get strings between a specific pattern
【Posted】: 2021-05-02 16:09:20
【Question description】:

I have a long string in R that contains a series of values following this pattern:

list <- '{s:K_01.01, y:01}="whatever" and {s:K_02.01, y:02}="whatever" and {s:K_03.01, y:03}="whatever" and {s:K_01.01, y:01}="whatever2" and {s:K_01.01, y:01}="whatever3"'

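I want to extract every substring that starts with {s:K_01.01 and ends with the closing ", ignore all other substrings, and store the results as a column in a data frame.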

Expected output:

{s:K_01.01, y:01}="whatever"
{s:K_01.01, y:01}="whatever2"
{s:K_01.01, y:01}="whatever3"

Does anyone know how to do this?

【Comments】:

    Tags: r regex regex-lookarounds


    【Solution 1】:

    You can extract all matches with the pattern

    {s:K_01\.01[^{}]*}="[^"]+"
    
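    • {s:K_01\.01 matches the literal text {s:K_01.01 that each wanted substring starts with
    • [^{}]*}= matches zero or more characters other than { and }, followed by }=
    • "[^"]+" matches an opening ", one or more characters other than ", and the closing "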
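The pattern above is written as plain regex, so it needs extra escaping once it sits inside an R string. Here is a minimal sketch of the two ways to write it, assuming R 4.0.0 or later for the raw-string form. The names `escaped`, `raw_pat`, `x`, and `hits` are made up for this illustration, and base R's `gregexpr()` is used only so the check runs without extra packages.

```r
# Pattern as an ordinary R string: every regex backslash is doubled
# and every double quote is escaped.
escaped <- "\\{s:K_01\\.01[^{}]*\\}=\"[^\"]+\""

# Same pattern as a raw string (assumes R >= 4.0.0): written as plain regex.
raw_pat <- r"(\{s:K_01\.01[^{}]*\}="[^"]+")"

# Both spellings produce the same character string.
same_pattern <- identical(escaped, raw_pat)

# Shortened, hypothetical input with two K_01.01 entries and one K_02.01 entry.
x <- '{s:K_01.01, y:01}="whatever" and {s:K_02.01, y:02}="whatever" and {s:K_01.01, y:01}="whatever2"'

# Extract every match; the K_02.01 entry is skipped.
hits <- regmatches(x, gregexpr(raw_pat, x, perl = TRUE))[[1]]
print(hits)
```

Because `[^{}]*` cannot cross a brace and `[^"]+` cannot cross a quote, each match stays inside a single `{...}="..."` entry.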

    See a regex demo | R demo

    library(stringr)
    
    list <- '{s:K_01.01, y:01}="whatever" and {s:K_02.01, y:02}="whatever" and {s:K_03.01, y:03}="whatever" and {s:K_01.01, y:01}="whatever2" and {s:K_01.01, y:01}="whatever3"'
    str_extract_all(list, "\\{s:K_01\\.01[^\\{}]*\\}=\"[^\"]+\"")
    

    Output

    [[1]]
    [1] "{s:K_01.01, y:01}=\"whatever\""  "{s:K_01.01, y:01}=\"whatever2\""
    [3] "{s:K_01.01, y:01}=\"whatever3\""
    

    【Comments】:

      【Solution 2】:

      A base R approach:

      list <- '{s:K_01.01, y:01}="whatever" and {s:K_02.01, y:02}="whatever" and {s:K_03.01, y:03}="whatever" and {s:K_01.01, y:01}="whatever2" and {s:K_01.01, y:01}="whatever3"'
      regmatches(list, gregexpr("\\{s:K_01\\.01.*?\\}=\".*?\"", list))[[1]]
      
      [1] "{s:K_01.01, y:01}=\"whatever\""  "{s:K_01.01, y:01}=\"whatever2\""
      [3] "{s:K_01.01, y:01}=\"whatever3\""
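The question also asks for the matches to end up as a column in a data frame, which neither answer shows. One possible way to do that in base R is sketched below. The names `input`, `pattern`, `result`, and the column name `match` are chosen for this illustration and do not come from the answers; the pattern is the negated-character-class one from Solution 1.

```r
# The sample string from the question.
input <- '{s:K_01.01, y:01}="whatever" and {s:K_02.01, y:02}="whatever" and {s:K_03.01, y:03}="whatever" and {s:K_01.01, y:01}="whatever2" and {s:K_01.01, y:01}="whatever3"'

# Regex from Solution 1, escaped for an R string.
pattern <- "\\{s:K_01\\.01[^{}]*\\}=\"[^\"]+\""

# gregexpr() finds all match positions; regmatches() returns a list with
# one character vector per input string, so [[1]] takes the only element.
matches <- regmatches(input, gregexpr(pattern, input, perl = TRUE))[[1]]

# One row per match, kept as character rather than factor.
result <- data.frame(match = matches, stringsAsFactors = FALSE)
print(result)
```

The same `data.frame()` call should work on the stringr result as well, since `str_extract_all()` also returns a list whose first element is the character vector of matches.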
      

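      【Comments】:

      • Thanks! But this grabs everything, whereas I am only interested in the ones that start with {s:K_01.01 (I need to exclude those starting with {s:K_02.01 or {s:K_03.01, for example).
      • Reload your page.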