How to split a Python list by a regular expression

【Question title】: How to split a Python list by a regex expression
【Posted】: 2018-07-15 23:12:56
【Question description】:

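I am reading a file from the web line by line, and each line comes back as a list. Each list holds three columns that are clearly separated by this pattern: +++$+++.

Here is my code: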

import codecs
import csv
from contextlib import closing
import requests

with closing(requests.get(url, stream=True)) as r:
    reader = csv.reader(codecs.iterdecode(r.iter_lines(), 'latin-1'))
    for i, row in enumerate(reader):
        if i < 5:
            t = row[0].split('(\s\+{3}\$\+{3}\s)+')
            print(t)

I tried to split the list with this statement in Python 3.6, but I could not get it to work. Any suggestions are much appreciated.

The lists:

['m0 +++$+++ 10 things i hate about you +++$+++ http://www.dailyscript.com/scripts/10Things.html']
['m1 +++$+++ 1492: conquest of paradise +++$+++ http://www.hundland.org/scripts/1492-ConquestOfParadise.txt']
['m2 +++$+++ 15 minutes +++$+++ http://www.dailyscript.com/scripts/15minutes.html']
['m3 +++$+++ 2001: a space odyssey +++$+++ http://www.scifiscripts.com/scripts/2001.txt']
['m4 +++$+++ 48 hrs. +++$+++ http://www.awesomefilm.com/script/48hours.txt']

This is my regular expression:

row[0].split('(\s\+{3}\$\+{3}\s)+')

Each row has only one component -> row[0]

When I print the result, the rows have not been split.

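【Comments】:

.split() on a string does not do regex matching at all; it is literally looking for the string (\s\+{3}\$\+{3}\s)+! You want re.split(r'(\s\+{3}\$\+{3}\s)+', row[0]).
Or use row[0].split(" +++$+++ "), since nothing you are doing here seems to benefit from the power of regular expressions.
Also remove the parentheses in the re.split pattern so that the +++$+++ separators are not returned (a capturing group makes re.split include the matched text in its result).
Thanks @jasonharper for the clarification. I have learned this now.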
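
The three points made in the comments can be checked side by side. This is a minimal sketch using one sample line from the question; the variable names (`literal`, `with_group`, `fields`) are chosen here for illustration only.

```python
import re

line = "m0 +++$+++ 10 things i hate about you +++$+++ http://www.dailyscript.com/scripts/10Things.html"

# str.split searches for its argument as literal text, so a regex
# pattern passed to it never matches and the line comes back whole.
literal = line.split(r"(\s\+{3}\$\+{3}\s)+")

# re.split interprets the pattern, but the capturing group makes the
# matched separators show up in the result as well.
with_group = re.split(r"(\s\+{3}\$\+{3}\s)+", line)

# Without the capturing parentheses only the three fields remain.
fields = re.split(r"\s\+{3}\$\+{3}\s", line)

print(literal)     # one element: the unsplit line
print(with_group)  # five elements: fields interleaved with separators
print(fields)      # three elements: the fields only
```

If grouping is still needed (for example to repeat the separator), a non-capturing group `(?:...)` keeps the separators out of the result.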

Tags: python regex list split

【Solution 1】:

Doing

row[0].split(' +++$+++ ')

should give you what you want without a regular expression.
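
As a rough illustration of this answer, the sketch below applies the plain string split to two of the sample rows and unpacks the three columns. The names `SEPARATOR`, `movie_id`, `title`, and `script_url` are assumptions made for this example; the question does not name the columns.

```python
# Literal separator, including the surrounding spaces.
SEPARATOR = " +++$+++ "

# Two of the single-element rows shown in the question.
rows = [
    ["m0 +++$+++ 10 things i hate about you +++$+++ http://www.dailyscript.com/scripts/10Things.html"],
    ["m4 +++$+++ 48 hrs. +++$+++ http://www.awesomefilm.com/script/48hours.txt"],
]

# Each row holds one string, so split row[0] on the literal separator.
parsed = [tuple(row[0].split(SEPARATOR)) for row in rows]

# Hypothetical column names, used only to show tuple unpacking.
for movie_id, title, script_url in parsed:
    print(movie_id, title, script_url, sep=" | ")
```

Unpacking into three names raises a ValueError if a line does not contain exactly two separators, which can be a useful early check on malformed input.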

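【Discussion】:

【Solution 2】:

Assuming you do not want to use split(), this may help if you would rather pull the fields apart with a regex and get them back as tuples.

Input: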

import re

# Named `text` rather than `input` to avoid shadowing the built-in.
text = '''['m0 +++$+++ 10 things i hate about you +++$+++ http://www.dailyscript.com/scripts/10Things.html']
['m1 +++$+++ 1492: conquest of paradise +++$+++ http://www.hundland.org/scripts/1492-ConquestOfParadise.txt']
['m2 +++$+++ 15 minutes +++$+++ http://www.dailyscript.com/scripts/15minutes.html']
['m3 +++$+++ 2001: a space odyssey +++$+++ http://www.scifiscripts.com/scripts/2001.txt']
['m4 +++$+++ 48 hrs. +++$+++ http://www.awesomefilm.com/script/48hours.txt']'''
# Raw string so the backslashes reach the regex engine unchanged.
pattern = r"\['([\S\s]+?)\s+\+{3}\$\+{3}\s+([\S\s]+?)\s\+{3}\$\+{3}\s+([\S\s]+?)'\]"
output = re.findall(pattern, text)
print(output)

Output:

[('m0', '10 things i hate about you', 'http://www.dailyscript.com/scripts/10Things.html'), ('m1', '1492: conquest of paradise', 'http://www.hundland.org/scripts/1492-ConquestOfParadise.txt'), ('m2', '15 minutes', 'http://www.dailyscript.com/scripts/15minutes.html'), ('m3', '2001: a space odyssey', 'http://www.scifiscripts.com/scripts/2001.txt'), ('m4', '48 hrs.', 'http://www.awesomefilm.com/script/48hours.txt')]

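I was also trying an alternation-based regex, but for the life of me I could not get the pattern to work in the end, haha. I will post it later, but I hope the above helps.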
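
The same capture-group idea can also be applied one line at a time, which fits the line-by-line loop in the question. The following is only a sketch: the group names and the `parse_line` helper are hypothetical, and it assumes the first and third columns contain no whitespace, as in the sample data.

```python
import re

# Hypothetical field names; the source only shows three unnamed columns.
LINE_PATTERN = re.compile(
    r"(?P<movie_id>\S+)\s\+{3}\$\+{3}\s(?P<title>.+?)\s\+{3}\$\+{3}\s(?P<script_url>\S+)"
)

def parse_line(line):
    """Return a (movie_id, title, script_url) tuple, or None if the line does not match."""
    match = LINE_PATTERN.fullmatch(line)
    return match.groups() if match else None

good = parse_line("m3 +++$+++ 2001: a space odyssey +++$+++ http://www.scifiscripts.com/scripts/2001.txt")
bad = parse_line("not a valid line")

print(good)
print(bad)
```

Compared with a plain split, this variant reports a non-matching line as None instead of silently returning the wrong number of fields.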

【Discussion】:

Thanks, @Inquisitor01. I got a good one from jasonharper. Appreciate it.