【Posted】: 2015-02-23 09:29:18
【Question】:
I am trying to split this text
"Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot " \
"for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this " \
"isn't true... Well, with a probability of .9 it isn't."
into the following list of sentences.
Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it.
Did he mind?
Adam Jones Jr. thinks he didn't.
In any case, this isn't true...
Well, with a probability of .9 it isn't.
Code:
import re
print(re.findall(r'([A-Z]+[^.].*?[a-z.][.?!] )[^a-z]', text))
Output:
['Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid
a lot for it. ', "Adam Jones Jr. thinks he didn't. "]
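That is good, but it misses some sentences. Since the final [^a-z] is not part of my group, is there a way to tell Python to continue searching from that character instead of skipping past it?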
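The skipped sentences can be explained by how re.findall advances: each search resumes where the previous match ended, and the trailing [^a-z] consumes the first letter of the next sentence even though it lies outside the group. A minimal sketch on a made-up string (the `sample` text and both simplified patterns are illustrative assumptions, not the patterns from the question):

```python
import re

# Hypothetical toy input: four one-letter "sentences".
sample = "A. B. C. D"

# The trailing [A-Z] is consumed, so the next search starts after it
# and every other sentence is skipped.
consuming = re.findall(r'([A-Z]\. )[A-Z]', sample)

# A lookahead only checks the next character without consuming it,
# so the next search can start right at that character.
lookahead = re.findall(r'([A-Z]\. )(?=[A-Z])', sample)

print(consuming)  # ['A. ', 'C. ']
print(lookahead)  # ['A. ', 'B. ', 'C. ']
```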
Edit:
This was solved with the lookahead regex suggested by @sputnick.
print(re.findall(r'([A-Z]+[^.].*?[a-z.][.?!] )(?=[^a-z])', text))
Output:
['Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid
a lot for it. ', 'Did he mind? ', "Adam Jones Jr. thinks he didn't. "
, "In any case, this isn't true... "]
But the last sentence is still missing. Any ideas?
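One possible way to pick up the final sentence, offered as a sketch rather than a definitive answer: the last sentence has no trailing space, so the space can be moved out of the group and into the lookahead, with end of string ($) allowed as an alternative. The function name `split_sentences` is hypothetical, and the pattern has only been checked against the sample text from the question.

```python
import re

text = ("Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot "
        "for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this "
        "isn't true... Well, with a probability of .9 it isn't.")

def split_sentences(s):
    # The group ends at the sentence punctuation. The lookahead then accepts
    # either a space followed by a non-lowercase character, or the end of
    # the string, so the final sentence is matched as well.
    return re.findall(r'([A-Z]+[^.].*?[a-z.][.?!])(?= [^a-z]|$)', s)

sentences = split_sentences(text)
for sentence in sentences:
    print(sentence)
```

On the sample text this produces the five sentences listed in the question, without trailing spaces. It is still a heuristic: an abbreviation followed by a capitalized word in the middle of a sentence (for example "I saw Mr. Smith.") would be split incorrectly.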
【Comments】: