Get a substring within two strings [closed]

【Question title】: Get a substring within two strings [closed]
【Posted】: 2016-08-03 08:59:48
【Question description】:

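I have a very large string that contains logs from some system.
I only want the part that starts with <status> and ends with </status>.
I have heard that a regular expression is a good way to do this, but I really don't know how to use one.
Any ideas?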

【Comments on the question】:

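What do you want to do with the text inside status?
@sarcoma I want to print a random line from it, for personal use.
You should update your question to reflect that.
Please see stackoverflow.com/questions/7911504/…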

Tags: python regex string python-2.7 truncate

【Solution 1】:

s = "Hello I am a very long string <status>I've got a lovely bunch of coconuts</status> here they are standing in a row"
# Take everything after the first <status>, then everything before the last </status>
excerpt = s.partition("<status>")[2].rpartition("</status>")[0]
print(excerpt)

Result:

I've got a lovely bunch of coconuts

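【Discussion】:

This will not work correctly if there are multiple <status>....</status> sections, because it returns everything from the first <status> to the last </status>. It is, however, definitely the most efficient way to accomplish this task (assuming the whole text is already loaded into memory).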
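
The limitation described in the comment above is easy to reproduce. The following is only a minimal sketch: `extract_blocks` is a hypothetical helper that is not part of the original answer, and `log` is made-up sample data. It walks the string with `str.find` so that each block is collected separately, without a regular expression.

```python
def extract_blocks(text, open_tag="<status>", close_tag="</status>"):
    """Return the text between each open_tag/close_tag pair, in order.

    Hypothetical helper for illustration; not from the original answer.
    """
    blocks = []
    pos = 0
    while True:
        start = text.find(open_tag, pos)
        if start == -1:
            break  # no further opening tag
        start += len(open_tag)
        end = text.find(close_tag, start)
        if end == -1:
            break  # unterminated block: ignore it
        blocks.append(text[start:end])
        pos = end + len(close_tag)
    return blocks


log = "a <status>first</status> b <status>second</status> c"

# partition/rpartition spans from the first opening tag to the last closing tag
greedy = log.partition("<status>")[2].rpartition("</status>")[0]
print(greedy)

# The loop-based helper returns each block on its own
print(extract_blocks(log))
```

With a single block, both approaches return the same text, so the simpler one-liner remains a reasonable choice when the tags are known to occur once.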

【Solution 2】:

If you want to try a regular expression, here is one way:

import re

# Non-greedy (.*?) stops at the nearest </status>; < and > need no escaping
regex = re.compile(r"<status>(.*?)</status>", re.IGNORECASE)
s = """This is some long random text <status>This is the first status block</status>
and some more text <status>and another block</status>
and yet more <status>This is the last status block</status>"""
print(regex.findall(s))

which yields

['This is the first status block', 'and another block', 'This is the last status block']


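The main advantage of this approach is that it extracts every <status>...</status> block, not just the first one. Note that in the triple-quoted string each <status> and its </status> must be on the same line, because . does not match a newline unless re.DOTALL is passed.

【Discussion】:

【Solution 3】: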
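
If a status block can span several lines, one option is to add the `re.DOTALL` flag so that `.` also matches newlines. The sketch below is an assumption about how that might look, not part of the original answer; `STATUS_RE`, `log`, and `lines` are illustrative names. Since the asker mentioned wanting a random line from inside the status text, it also picks one with `random.choice`.

```python
import random
import re

# re.DOTALL lets "." match newlines, so a block may span several lines
STATUS_RE = re.compile(r"<status>(.*?)</status>", re.IGNORECASE | re.DOTALL)

log = """noise <status>line one
line two</status> more noise
<status>single line</status>"""

blocks = STATUS_RE.findall(log)
print(blocks)

# Flatten the blocks into individual non-empty lines and pick one at random
lines = [line for block in blocks for line in block.splitlines() if line.strip()]
print(random.choice(lines))
```

The non-greedy `(.*?)` still matters here: with `re.DOTALL` a greedy `(.*)` would run from the first opening tag to the last closing tag.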

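If <status> and </status> each appear only once, you can use string_name[string_name.index("<status>") + 8: string_name.index("</status>")].

s = "test<status>test2</status>"
# 8 == len("<status>"), so the slice starts right after the opening tag
print(s[s.index("<status>") + 8: s.index("</status>")])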

Output:

test2
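
Note that `str.index` raises `ValueError` when a tag is missing. A hedged variant of the same slicing idea, using `str.find` and `len()` instead of the hard-coded 8, might look like the sketch below; `between` is a hypothetical helper name, not something from the original answer.

```python
def between(text, open_tag="<status>", close_tag="</status>"):
    """Return the text between the first open_tag and the next close_tag.

    Returns None if either tag is missing. Hypothetical helper for illustration.
    """
    start = text.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)  # skip past the opening tag itself
    end = text.find(close_tag, start)
    if end == -1:
        return None
    return text[start:end]


print(between("test<status>test2</status>"))
print(between("no tags here"))
```

Searching for the closing tag from `start` onward also avoids a wrong slice if a stray closing tag happens to appear before the opening one.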

【Discussion】: