【Question Title】: Extraction of substring from text using R [duplicate]
【Posted】: 2020-09-18 15:48:09
【Question】:

I have string data as follows:

a<-  "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"

I need to extract the string "Social Media Learning and behaviour" from it, and I used the following code:

gsub("        Uploaded on .* ", "", gsub("\n    Update Your Profile to Dissolve This Message\n", "",a)) 

This gives me the following output:

"Social Media Learning and behaviour\n\n"

I am unable to match the exact pattern. What pattern extracts "Social Media Learning and behaviour" without the trailing "\n\n"?

【Comments】:

  • You could also match the previous line in a capture group, followed by the line containing Uploaded: ^(.*)\r?\n Uploaded on regex101.com/r/bF5GKT/1

Tags: r regex gsub


【Solution 1】:

You can capture the previous line in a group and match the next line, which contains Uploaded:

(.*)\r?\n[^\S\r\n]+Uploaded on


a<-  "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"
# str_match() returns a matrix: column 1 is the full match, column 2 the first capture group
stringr::str_match(a, "(.*)\\r?\\n[^\\S\\r\\n]+Uploaded on")[, 2]
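
If adding stringr as a dependency is not desirable, the same pattern can probably be applied with base R alone. The following is a minimal sketch, assuming R 3.3.0 or later (where `regexec()` accepts `perl = TRUE`); the names `pattern`, `m`, and `title` are illustrative and not part of the original answer.

```r
a <- "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"

# Same regex as above; perl = TRUE is needed so that "." does not match
# a newline and the character class [^\S\r\n] is understood (PCRE syntax).
pattern <- "(.*)\\r?\\n[^\\S\\r\\n]+Uploaded on"

# regexec() records the full match and each capture group;
# regmatches() turns those positions into the matched substrings.
m <- regmatches(a, regexec(pattern, a, perl = TRUE))

# Element 1 is the full match, element 2 is the first capture group.
title <- m[[1]][2]
print(title)
```

The capture group holds only the line directly above the "Uploaded on" line, so no trailing newlines need to be stripped afterwards.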

【Comments】:

    【Solution 2】:

    You can extract the part between "Update Your Profile to Dissolve This Message" and "Uploaded on":

    sub(".*Update Your Profile to Dissolve This Message\n(.*)\n\\s+Uploaded on.*", "\\1", a)
    #[1] "Social Media Learning and behaviour"
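
As a side note, the two-step `gsub()` attempt from the question already isolates the right line and only leaves the trailing newlines behind. A minimal sketch of patching that attempt with base R's `trimws()` follows; the names `partial` and `title` are illustrative, and this is just one possible fix, not the approach recommended in this answer.

```r
a <- "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"

# The question's original two-step replacement, unchanged.
partial <- gsub("        Uploaded on .* ", "",
                gsub("\n    Update Your Profile to Dissolve This Message\n", "", a))

# trimws() strips leading and trailing spaces, tabs, and newlines.
title <- trimws(partial)
print(title)
```

This depends on the exact indentation and header text of the input, so the single `sub()` call above is likely the more robust choice.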
    

    You can also use str_match from stringr:

    stringr::str_match(a, "Update Your Profile to Dissolve This Message\n(.*)\n\\s+Uploaded on")[, 2]
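
Both patterns above anchor on literal text in the neighbouring lines. If the text is reliably line-structured, a regex-free alternative is to split on newlines and take the line just before the one that starts with "Uploaded on". This is a sketch under that assumption; it swaps the regex for plain line splitting, and the names `lines`, `idx`, and `title` are illustrative.

```r
a <- "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"

# Split into lines and strip the surrounding whitespace from each one.
lines <- trimws(strsplit(a, "\n", fixed = TRUE)[[1]])

# Locate the line that begins with "Uploaded on" (startsWith() needs R >= 3.3.0).
idx <- which(startsWith(lines, "Uploaded on"))

# The wanted text is assumed to sit on the line immediately before it.
title <- lines[idx - 1]
print(title)
```

If several "Uploaded on" lines are present, `idx` is a vector and `title` contains one entry per occurrence.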
    

    【Comments】:
