【Posted】: 2021-10-14 06:10:01
【Question】:
I want to use a regular expression to match any text between two strings:
sample_string= "Message ID: SM9MatRNTnMAYaylR0QgOH///qUUveBCbw==
2021-07-10T20:48:23.997Z john s (X Y Bank) -
john.s@xy.com:
[EVENT] 347376954900491 (john.s@xy.com) created room
(roomName='CSTest' roomDescription='CS Test Chat Room' COPY_DISABLED=false
READ_ONLY=false DISCOVERABLE=false MEMBER_ADD_USER_ENABLED=false
roomType=PRIVATE conversationScope=internal owningCompany=X Y
Bank)
Message ID: nsabNaqeXfuEj9mBEhvS0n///qUUveAhbw==
2021-07-10T20:48:23.997Z john s (X Y Bank) -
john.s@xy.comsays
[EVENT] 347376954900491 (john.s@xy.com) invited 347376954900486
(kerren.n@xy.com) to room (CSTest|john s|16091907435583)
Message ID: Nu/EYTkTQ5qdbqzZ0Rig8n///qUUvQ42dA==
2021-07-10T20:48:23.997Z john s (X Y Bank) -
john.s@xy.comsays
Catchyou later
Message ID: dy2yaByqhm+n88Gd3VQOhH///qUUrz8odA==
2021-07-10T20:48:23.997Z kerren n (X Y Bank) -
nancy.n@xy.comsays
KeywordContent_ Cricket is a bat-and-ball game played between two teams of
eleven players on a field at the centre of which is a 20-metre (22-yard) pitch
with a wicket at each end, each comprising two bails balanced on three stumps.
The batting side scores runs by striking the ball bowled at the wicket with
the bat, while the bowling and fielding side tries to prevent this and dismiss
each player (so they are "out").
* * *
Generated by Content Export Service | Stream Type: SymphonyPost |
Stream ID: ZZo5pRRPFC18uzlonFjya3///qUUveBHdA== | Room Type: Private |
Conversation Scope: internal | Owning Company: X Y Bank | File
Generated Date: 2021-07-10T20:48:23.997Z | Content Start Date:
2021-07-10T20:48:23.997Z | Content Stop Date: 2021-07-10T20:48:23.997Z
* * *
*** (780787) Disclaimer:
(incorporated in paris with Ref. No. ZC18, is authorised by Prudential Regulation
Authority (PRA) and regulated by Financial Conduct Authority and PRA. oyp and
its affiliates (We) monitor this confidential message meant for your
information only. We make no recommendation or offer. You should get
independent advice. We accept no liability for loss caused hereby. See market
commentary disclaimers (
http://wholesalebanking.com/en/utility/Pages/d-mkt.aspx ),
Dodd-Frank and EMIR disclosures (
http://wholesalebanking.com/en/capabilities/financialmarkets/Pages/default.aspx
) "
In this example, I want to extract everything after the email ID and up to the keyword Message ID:
So the expected output is:
extracted_list =[': [EVENT] 347376954900491 (john.s@xy.com) created room (roomName='CSTest' roomDescription='CS Test Chat Room' COPY_DISABLED=false READ_ONLY=false DISCOVERABLE=false MEMBER_ADD_USER_ENABLED=false roomType=PRIVATE conversationScope=internal owningCompany=X Y Bank)','says [EVENT] 347376954900491 (john.s@xy.com) invited 347376954900486 (kerren.n@xy.com) to room (CSTest|john s|16091907435583)','says Catchyou later','says KeywordContent_ Cricket is a bat-and-ball game played between two teams of eleven players on a field at the centre of which is a 20-metre (22-yard) pitch with a wicket at each end, each comprising two bails balanced on three stumps. The batting side scores runs by striking the ball bowled at the wicket with the bat, while the bowling and fielding side tries to prevent this and dismiss each player (so they are "out").']
Note: everything after the last *** is not part of the text.
What I have tried so far:
text = re.findall(r'\S+@\S+\s+(.*)Message ID', sample_string)
print (text)
##output: []
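The empty list is consistent with how `.` behaves under the default flags: it does not match a newline, so `(.*)` can only succeed when `Message ID` sits on the same line as the text being captured. A minimal sketch of that behaviour, using a shortened, hypothetical snippet in the same layout as `sample_string` (the names `snippet`, `default_result` and `dotall_result` are made up for illustration):

```python
import re

# Hypothetical one-message snippet in the same layout as sample_string.
snippet = "john.s@xy.comsays\nCatchyou later\nMessage ID: abc==\n"

pattern = r'\S+@\S+\s+(.*)Message ID'

# Default flags: '.' stops at '\n', so 'Message ID' on the next line is never reached.
default_result = re.findall(pattern, snippet)

# With re.DOTALL, '.' also matches '\n', so the capture can span lines.
dotall_result = re.findall(pattern, snippet, re.DOTALL)

print(default_result)  # []
print(dotall_result)   # ['Catchyou later\n']
```

On the full text, a greedy `.*` combined with `re.DOTALL` would run on to the last `Message ID`, so a lazy `.*?` is usually wanted as well.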
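【Comments】:
- So, basically your question is: how do I extract part of a text (string), from emailID to Message ID? Always try to provide a minimal example rather than a wall of text.
- @MarkusWeninger Yes, sorry, I'm new to this platform.
- Is there supposed to be an emailID in there?
- @Jesper The emailID comes right after (X Y Bank) -
- I think you mean [^\s@]+@[^\s@]+\s(.*?)\bMessage ID\b (regex101.com/r/zd5w8v/1), but you have to add re.DOTALL as the last argument to re.findall.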
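A minimal sketch along the lines of the last comment, not a definitive answer. It departs from the comment's pattern in two ways, both of which are assumptions: the sender address is assumed to sit alone at the start of a line (as it does in the sample), and a lookahead is used so that the last message, which is followed by the `* * *` footer rather than another `Message ID`, is still captured. The sample below is a shortened, hypothetical version of the question's text, and the name `BODY` is made up for illustration.

```python
import re

# Shortened, hypothetical sample in the same layout as the question's export.
sample_string = """Message ID: AAA==
2021-07-10T20:48:23.997Z john s (X Y Bank) -
john.s@xy.com:
[EVENT] 347376954900491 (john.s@xy.com) created room
(roomName='CSTest' roomType=PRIVATE)
Message ID: BBB==
2021-07-10T20:48:23.997Z john s (X Y Bank) -
john.s@xy.comsays
Catchyou later
Message ID: CCC==
2021-07-10T20:48:23.997Z kerren n (X Y Bank) -
nancy.n@xy.comsays
KeywordContent_ Cricket is a bat-and-ball game
played between two teams.
* * *
Generated by Content Export Service
"""

BODY = re.compile(
    r"^\S+@\S+\n"                     # sender line: address plus the glued-on ':' or 'says'
    r"(.*?)"                          # message body, matched lazily across lines
    r"(?=^Message ID:|^\* \* \*|\Z)", # stop before the next message, the footer, or the end
    re.MULTILINE | re.DOTALL,
)

# Collapse the line breaks inside each body into single spaces.
extracted_list = [" ".join(body.split()) for body in BODY.findall(sample_string)]

for item in extracted_list:
    print(item)
```

Unlike the expected output in the question, this drops the `:` or `says` that is glued to the address, because separating it from the address would need a further assumption about how addresses end.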
Tags: python-3.x regex