Extracting a complex substring from a string in Python using regular expressions: answers

【Question title】: Extracting a complex substring using regex with data from a string in python
【Posted】: 2021-03-30 17:54:24
【Question】:

I have a string, say:

text = "i have on 31-Dec-08 USD 5234765 which I gave it in the donation"

I tried:

import re

pattern = r"^[\d]{2}.*,[\d]{3}$"
data = re.findall(pattern, text)

for s in data:
    print(s)

The output I want:

[31-Dec-08, USD, 5234765]

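【Comments】:

Hi, please help us help you: explain in detail what the "complex substring" you want to extract is.
To extract all matches you would need data = re.findall(pattern, text).groups(). What exactly do you want to capture with the regex?
What are the rules here? Also, do you mean you need a list of three values? For now you could simply use \d.*\d. A somewhat more "complex" pattern would be \b[A-Z]{3}\b|\b\d{1,2}-[a-zA-Z]{3}-\d{2}\b|\b\d+\b
@MustafaAydın I want to extract exactly the output shown above from the given string. I have been trying to write the pattern all day, but it is very hard for me because I am a beginner.
@WiktorStribiżew I want to print that same output, comma-separated, from the given string. I tried using regex but could not get it to work.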
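
For readers following along, here is a minimal sketch of how the alternation proposed in the comments above behaves on the question's string, next to the original attempt. The variable names `attempt`, `suggested`, and `tokens` are illustrative and not from the thread. The original pattern finds nothing here because `^` and `$` tie it to the whole string and it also requires a comma, which the text does not contain.

```python
import re

text = "i have on 31-Dec-08 USD 5234765 which I gave it in the donation"

# The question's pattern: the string would have to start with two digits
# and end with a comma followed by three digits, so nothing matches here.
attempt = re.findall(r"^[\d]{2}.*,[\d]{3}$", text)

# The alternation from the comments. It contains no capture groups,
# so findall returns the matched substrings themselves.
suggested = r"\b[A-Z]{3}\b|\b\d{1,2}-[a-zA-Z]{3}-\d{2}\b|\b\d+\b"
tokens = re.findall(suggested, text)

print(attempt)  # []
print(tokens)   # ['31-Dec-08', 'USD', '5234765']
```

The `\b` word boundaries keep the three-letter alternative from matching inside longer words, and the date alternative is tried before the plain-number one at each position.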

Tags: python regex string

【Solution 1】:

You can do it like this:

import re

regex = r"(\w+-\w+-\w+)|([A-Z]{3})|(\d+)"

test_str = "i have on 31-Dec-08 USD 5234765 which I gave it in the donation"


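# The pattern has three capture groups, so findall returns one 3-tuple per
# match, with '' for the groups that did not take part in that match.
matches = re.findall(regex, test_str)

# Flatten the tuples and keep only the non-empty groups.
temp = [group for groups in matches for group in groups if group]

print(temp)  # ['31-Dec-08', 'USD', '5234765']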

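\w matches any word character ([a-zA-Z0-9_] for ASCII text; on Python 3 str patterns it also matches Unicode letters and digits unless re.ASCII is set)
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
- matches the character - literally
[A-Z]{3} matches an uppercase letter exactly 3 times
\d matches a digit ([0-9] for ASCII text)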
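
The pattern above is fairly permissive: `\w+-\w+-\w+` accepts any three hyphen-joined words, not only dates. As a hedged alternative, the sketch below assumes the date, currency code, and amount always appear in that order, separated by whitespace, and describes them in one pattern with named groups. The names `record`, `date`, `currency`, and `amount` are made up for this illustration.

```python
import re

test_str = "i have on 31-Dec-08 USD 5234765 which I gave it in the donation"

# One pattern for the whole "date currency amount" run (assumed layout).
record = re.compile(
    r"(?P<date>\d{1,2}-[A-Za-z]{3}-\d{2})\s+"  # e.g. 31-Dec-08
    r"(?P<currency>[A-Z]{3})\s+"               # e.g. USD
    r"(?P<amount>\d+)"                         # e.g. 5234765
)

match = record.search(test_str)

# search returns None when nothing matches, so guard before reading groups.
result = list(match.groups()) if match else []

print(", ".join(result))  # 31-Dec-08, USD, 5234765
```

Individual fields are also available by name, for example `match.group("currency")`. If the three values can appear in any order, the alternation-based approach with `re.findall` is the better fit.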

【Comments】: