【Question title】: Python regex, conditional searching
【Posted】: 2015-02-23 09:29:18
【Question description】:

I am trying to split this text

"Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot " \
"for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this " \
"isn't true... Well, with a probability of .9 it isn't."

into the following list:

Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it.
Did he mind?
Adam Jones Jr. thinks he didn't.
In any case, this isn't true...
Well, with a probability of .9 it isn't.

Code:

print(re.findall(r'([A-Z]+[^.].*?[a-z.][.?!] )[^a-z]', text))

Output:

['Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid
 a lot for it. ', "Adam Jones Jr. thinks he didn't. "]

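OK, good, but it misses some sentences. Is there a way to tell Python that, since the final [^a-z] is not part of my group, it should continue searching from there?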

Edit:

This was achieved with the look-ahead regex mentioned by @sputnick.

print(re.findall(r'([A-Z]+[^.].*?[a-z.][.?!] )(?=[^a-z])', text))

Output:

['Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid
 a lot for it. ', 'Did he mind? ', "Adam Jones Jr. thinks he didn't. "
, "In any case, this isn't true... "]

But we still need the last sentence. Any ideas?

【Comments】:

Tags: python regex


【Solution 1】:
(.+?)(?<=(?<![A-Z][a-z])(?<![a-z]\.[a-z])(?:\.|\?)(?=\s|$))

Try this. See the demo. Grab the captures (groups) using re.findall.

https://regex101.com/r/gQ3kS4/45
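
A minimal sketch of how this pattern might be applied with re.findall to the text from the question. The names `text`, `SENTENCE` and `sentences` are illustrative and not part of the answer. Every match after the first keeps the space that preceded it, so the sketch strips each result.

```python
import re

text = (
    "Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot "
    "for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this "
    "isn't true... Well, with a probability of .9 it isn't."
)

# Pattern from the answer, written as a raw string.
# (.+?)    lazily grows the candidate sentence one character at a time.
# (?<=...) requires that the character just consumed is '.' or '?' and that it
#   - is not preceded by an uppercase+lowercase pair (e.g. "Mr", "Jr"),
#   - is not preceded by letter-dot-letter (e.g. "i.e"),
#   - is followed by whitespace or the end of the string.
SENTENCE = re.compile(
    r"(.+?)(?<=(?<![A-Z][a-z])(?<![a-z]\.[a-z])(?:\.|\?)(?=\s|$))"
)

# findall returns the contents of the single capture group for each match.
sentences = [s.strip() for s in SENTENCE.findall(text)]
for s in sentences:
    print(s)
```

Note that the pattern only treats '.' and '?' as sentence terminators, and any two-letter word of the form uppercase+lowercase directly before a period (for example "No.") is treated like an abbreviation and will not end a sentence.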

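【Comments】:

    【Solution 2】:

    Finally:

    print(re.findall(r'[A-Z]+[^.].*?[a-z.][.?!] (?=[^a-z])|.*.$', text))
    

    The above works perfectly as required, including the last sentence. But I don't know why |.*.$ works; please help me understand.

    Output: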

    ['Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid
     a lot for it. ', 'Did he mind? ', "Adam Jones Jr. thinks he didn't. "
    , "In any case, this isn't true... ", "Well, with a probability of .9 
    it isn't."] 
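
One way to see why `|.*.$` picks up the last sentence, sketched here with illustrative variable names (`main`, `fallback`) that are not from the post: the `|` turns the pattern into an alternation. At each position the engine tries the left branch first. For the last sentence that branch fails, because no space follows the final period, so the engine falls back to the right branch `.*.$`, which matches whatever remains up to the end of the string.

```python
import re

text = (
    "Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot "
    "for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this "
    "isn't true... Well, with a probability of .9 it isn't."
)

# Left branch: a sentence that ends in punctuation plus a space and is
# followed by a character that is not a lowercase letter.
main = r"[A-Z]+[^.].*?[a-z.][.?!] (?=[^a-z])"
# Right branch: any run of characters up to the end of the string.
# The unescaped '.' before '$' matches any character, not only a period.
fallback = r".*.$"

without_fallback = re.findall(main, text)
with_fallback = re.findall(main + "|" + fallback, text)

print(len(without_fallback))   # the last sentence is missing here
print(with_fallback[-1])       # the fallback branch supplies it

# Used on its own, the fallback simply swallows the entire text.
print(re.findall(fallback, text) == [text])
```

Because the fallback accepts any characters, it would also return trailing text that is not a well-formed sentence; it works here only because the left branch has already consumed everything before the last sentence.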
    

    【Comments】:

    • There is no space at the end: re.findall('[A-Z]+[^.].*?[a-z.][.?!](?: (?=[^a-z])|$)', text)
    【Solution 3】:

    Try this:

    print(re.findall(r'([A-Z]+[^.].*?[a-z.][.?!] )(?=[^a-z])', text))
    

    This uses the positive look-ahead regex technique; see http://www.regular-expressions.info/lookaround.html
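
To illustrate the difference, here is a small sketch comparing the pattern from the question, where the trailing `[^a-z]` consumes the first letter of the next sentence, with the look-ahead version, which only inspects that character and leaves it available for the next match. The variable names are assumptions made for this example.

```python
import re

text = (
    "Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot "
    "for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this "
    "isn't true... Well, with a probability of .9 it isn't."
)

# Trailing [^a-z] is part of the match, so the capital letter that starts
# the next sentence is consumed and that sentence can no longer match.
consuming = r"([A-Z]+[^.].*?[a-z.][.?!] )[^a-z]"
# (?=[^a-z]) checks the same character without consuming it.
lookahead = r"([A-Z]+[^.].*?[a-z.][.?!] )(?=[^a-z])"

consumed_hits = re.findall(consuming, text)
lookahead_hits = re.findall(lookahead, text)

print(len(consumed_hits))    # every other sentence is skipped
print(len(lookahead_hits))   # all sentences that end with a space are found
```

Neither pattern finds the last sentence, since both require a space after the closing punctuation.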

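    【Comments】:

    • Wow, great regex, perfect. Thanks @sputnick. What exactly does ?= mean?
    • That is the syntax for a positive look-ahead; see the link added to my answer.
    • Nice tutorial at the link. Is there a way to also include the last sentence, i.e. to drop the requirement for the space after the period and the [^a-z] when it is the end of the file? Something like a word boundary.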