【Question title】: How to extract a substring from a string using regex
【Posted】: 2021-09-25 03:29:42
【Question】:

I have a string like the one below. If possible, I would like to extract the highlighted part of it using a regex or any other approach.

description = 'The National Weather Service in Milwaukee/Sullivan has issued a\n\n* Tornado Warning for...\nNorthwestern Columbia County in south central Wisconsin...\nSouthwestern Marquette County in south central Wisconsin...\n\n* Until 945 PM CDT.\n\n* At 911 PM CDT, a severe thunderstorm capable of producing a tornado\nwas located 8 miles east of Wisconsin Dells, moving northeast at 45\nmph.\n\nHAZARD...Tornado.\n\nSOURCE...Radar indicated rotation.\n\nIMPACT...Flying debris will be dangerous to those caught without\nshelter. Mobile homes will be damaged or destroyed.\nDamage to roofs, windows, and vehicles will occur.  Tree\ndamage is likely.\n\n* Locations impacted include...\nPackwaukee, Endeavor and Briggsville.'

# Now I want to match the substring between (Tornado Warning for... *** ...\n\n*)

# I tried this

re.search('Tornado Warning for...(.*)\n\n*', description)

# I am getting results like this

<re.Match object; span=(67, 90), match='Tornado Warning for...\n'>

#expected result 

<re.Match object; span=(any, any), match='Tornado Warning for...\nNorthwestern Columbia County in south central Wisconsin...\nSouthwestern Marquette County in south central Wisconsin...\n\n*'>

It does not match the complete substring; it only matches Tornado Warning for...\n

I want to match Tornado Warning for...\nNorthwestern Columbia County in south central Wisconsin...\nSouthwestern Marquette County in south central Wisconsin...\n\n*

The substring starts at Tornado Warning for... and ends at \n\n*

Thanks for your help, and sorry for my poor English.

【Question comments】:

    Tags: python python-3.x regex string substring


    【Solution 1】:

    By default, . does not match \n. Use [\W\w] instead of . (or pass the re.DOTALL flag).

    import re
    description = 'The National Weather Service in Milwaukee/Sullivan has issued a\n\n* Tornado Warning for...\nNorthwestern Columbia County in south central Wisconsin...\nSouthwestern Marquette County in south central Wisconsin...\n\n* Until 945 PM CDT.\n\n* At 911 PM CDT, a severe thunderstorm capable of producing a tornado\nwas located 8 miles east of Wisconsin Dells, moving northeast at 45\nmph.\n\nHAZARD...Tornado.\n\nSOURCE...Radar indicated rotation.\n\nIMPACT...Flying debris will be dangerous to those caught without\nshelter. Mobile homes will be damaged or destroyed.\nDamage to roofs, windows, and vehicles will occur.  Tree\ndamage is likely.\n\n* Locations impacted include...\nPackwaukee, Endeavor and Briggsville.'
    
    print(re.search(r'Tornado Warning for\.\.\.([\W\w]*?)\n\n\*', description).group())
    
    """
    Tornado Warning for...
    Northwestern Columbia County in south central Wisconsin...
    Southwestern Marquette County in south central Wisconsin...
    
    *
    """
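
As a hedged illustration of the point above, the sketch below first reproduces the short match from the question and then uses the re.DOTALL flag as an alternative to [\W\w]. The description value is a shortened stand-in for the string in the question, and the variable names are only illustrative.

```python
import re

# Shortened stand-in for the question's `description` string (same structure).
description = (
    "The National Weather Service in Milwaukee/Sullivan has issued a\n\n"
    "* Tornado Warning for...\n"
    "Northwestern Columbia County in south central Wisconsin...\n"
    "Southwestern Marquette County in south central Wisconsin...\n\n"
    "* Until 945 PM CDT.\n\n"
    "* At 911 PM CDT, a severe thunderstorm capable of producing a tornado\n"
    "was located 8 miles east of Wisconsin Dells."
)

# The attempt from the question: "." cannot cross a newline, and the trailing
# "\n*" means "zero or more newlines", not a literal asterisk.
original_attempt = re.search('Tornado Warning for...(.*)\n\n*', description)
print(repr(original_attempt.group()))

# re.DOTALL lets "." match newlines as well. ".*?" is lazy, so the match
# stops at the first blank line that is followed by a literal "*".
pattern = re.compile(r"Tornado Warning for\.\.\.(.*?)\n\n\*", re.DOTALL)

match = pattern.search(description)
if match:
    full_text = match.group(0)   # whole match, including the trailing "\n\n*"
    inner_text = match.group(1)  # only the text between the two delimiters
    print(repr(inner_text))
```

Here group(0) gives the full span the question asks for, while group(1) drops the heading and the closing \n\n*.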
    
    

    【Comments】:

      【Solution 2】:

      You can match

      \bTornado Warning for\.\.\.(?:\n.*)*?\n\n
      

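      The pattern matches:

      • \bTornado Warning for\.\.\. matches a word boundary followed by Tornado Warning for, then three escaped dots that match literal dots
      • (?:\n.*)*? matches a newline and the rest of that line, repeated as few times as possible
      • \n\n matches 2 newlines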


      For example

      import re
      
      description = 'The National Weather Service in Milwaukee/Sullivan has issued a\n\n* Tornado Warning for...\nNorthwestern Columbia County in south central Wisconsin...\nSouthwestern Marquette County in south central Wisconsin...\n\n* Until 945 PM CDT.\n\n* At 911 PM CDT, a severe thunderstorm capable of producing a tornado\nwas located 8 miles east of Wisconsin Dells, moving northeast at 45\nmph.\n\nHAZARD...Tornado.\n\nSOURCE...Radar indicated rotation.\n\nIMPACT...Flying debris will be dangerous to those caught without\nshelter. Mobile homes will be damaged or destroyed.\nDamage to roofs, windows, and vehicles will occur.  Tree\ndamage is likely.\n\n* Locations impacted include...\nPackwaukee, Endeavor and Briggsville.'
      
      m = re.search(r'\bTornado Warning for\.\.\.(?:\n.*)*?\n\n', description)
      if m:
          print(m.group())
      

      Output

      Tornado Warning for...
      Northwestern Columbia County in south central Wisconsin...
      Southwestern Marquette County in south central Wisconsin...
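
If only the affected areas are needed rather than the whole matched text, a capture group can collect the lines between the heading and the first blank line. This is a minimal sketch that varies the pattern above rather than reproducing it; the description value is a shortened stand-in for the question's string, and the name areas is hypothetical.

```python
import re

# Shortened stand-in for the question's `description` string (same structure).
description = (
    "The National Weather Service in Milwaukee/Sullivan has issued a\n\n"
    "* Tornado Warning for...\n"
    "Northwestern Columbia County in south central Wisconsin...\n"
    "Southwestern Marquette County in south central Wisconsin...\n\n"
    "* Until 945 PM CDT.\n\n"
    "* At 911 PM CDT, a severe thunderstorm capable of producing a tornado\n"
    "was located 8 miles east of Wisconsin Dells."
)

# (?:.+\n)+ consumes consecutive non-empty lines; the final \n then requires
# a blank line, so the capture ends right before it.
pattern = re.compile(r"\bTornado Warning for\.\.\.\n((?:.+\n)+)\n")

match = pattern.search(description)
# One list entry per captured line, or an empty list when nothing matched.
areas = match.group(1).splitlines() if match else []
print(areas)
```

Because .+ needs at least one non-newline character, the blank line ends the captured block without any lazy quantifier.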
      

      【Comments】:

        【Solution 3】:

        The regex could look like this:

        import re
        # Raw string avoids invalid-escape warnings; description is the string from the question
        matched_string = re.findall(r"Tornado[a-zA-Z\s\.\\\*]+\n\n\*", description)
        print(matched_string)
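
The question also allows approaches other than regex. The sketch below is not taken from any of the answers; it is a plain string-search alternative using str.find, with a hypothetical helper named extract_between and a shortened stand-in for the question's description string.

```python
def extract_between(text, start, end):
    """Return the text from `start` through the first `end` after it, or None."""
    begin = text.find(start)
    if begin == -1:
        return None
    # Look for the end marker only after the start marker.
    stop = text.find(end, begin + len(start))
    if stop == -1:
        return None
    return text[begin:stop + len(end)]


# Shortened stand-in for the question's `description` string (same structure).
description = (
    "The National Weather Service in Milwaukee/Sullivan has issued a\n\n"
    "* Tornado Warning for...\n"
    "Northwestern Columbia County in south central Wisconsin...\n"
    "Southwestern Marquette County in south central Wisconsin...\n\n"
    "* Until 945 PM CDT.\n\n"
    "* At 911 PM CDT, a severe thunderstorm capable of producing a tornado\n"
    "was located 8 miles east of Wisconsin Dells."
)

result = extract_between(description, "Tornado Warning for...", "\n\n*")
print(repr(result))
```

With fixed start and end markers this avoids escaping altogether; a regex remains the better fit when the markers themselves vary.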
        

        【Comments】:
