【Question title】: Use regex to get info from a specific text format
【Posted】: 2016-04-10 10:04:27
【Question】:

I have a text with content like this:

(some text)
libncursesw5-dev:amd64 depends on libc6-dev | libc-dev;(some text)
libx32ncursesw5 depends on libc6-x32 (>= 2.16);(some text)
libx32ncurses5-dev depends on libncurses5-dev (= 5.9+20150516-2ubuntu1);(some text)
libx32ncursesw5-dev depends on libc6-dev-x32;(some text)
lib32tinfo-dev depends on lib32c-dev;(some text)

Here is a complete example of the context one of these sentences appears in:

dpkg: error processing package lib32tinfo5 (--install):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of libncurses5-dev:amd64:
 libncurses5-dev:amd64 depends on libc6-dev | libc-dev; however:
    Package libc6-dev is not installed.
    Package libc-dev is not installed.

The whole text is split into paragraphs like the one above, and each paragraph contains one of these sentences.

I want a regular expression for Python's re library that, used with findall, gives me something like this:

('libc6-dev', '', 'libc-dev', '')
('libc6-x32','2.16')
('libncurses5-dev','5.9+20150516-2ubuntu1')
('libc6-dev-x32','')
('lib32c-dev','')

In other words, I would like help extracting, from this kind of text, tuples that contain each package and its version (if one is specified).

I wrote this regular expression:

(?<=depends on )([a-zA-Z0-9\-]*)(?: \([=> ]*([a-zA-Z0-9-+.]*)(?:\)))?|(?: \| )([a-zA-Z0-9\-]*)(?: \([=> ]*([a-zA-Z0-9-+.]*)(?:\)))?(?=;)

And I got this result:

('libc6-dev', '', '', '')
('', '', 'libc-dev', '')
('libc6-x32', '2.16', '', '')
('libncurses5-dev', '5.9+20150516-2ubuntu1', '', '')
('libc6-dev-x32', '', '', '')
('lib32c-dev', '', '', '')

As you can see, for the sentence:

libncursesw5-dev:amd64 depends on libc6-dev | libc-dev;

I got this:

('libc6-dev', '', '', '')
('', '', 'libc-dev', '')

instead of this:

('libc6-dev', '', 'libc-dev', '')

Thanks for your help.

【Comments】:

    Tags: python regex string python-3.x


    【Solution 1】:
    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-

    import re

    # Sample text from the question ("text" avoids shadowing the built-in input()).
    text = """(some text)
    libncursesw5-dev:amd64 depends on libc6-dev | libc-dev;(some text)
    libx32ncursesw5 depends on libc6-x32 (>= 2.16);(some text)
    libx32ncurses5-dev depends on libncurses5-dev (= 5.9+20150516-2ubuntu1);(some text)
    libx32ncursesw5-dev depends on libc6-dev-x32;(some text)
    lib32tinfo-dev depends on lib32c-dev;(some text)"""

    # Variant: run the three patterns one at a time and concatenate the results.
    #a = []
    #m = re.findall(r"depends on ([^\s;]+) \| ([^\s;]+)", text) # 1
    #a = a + m
    #m = re.findall(r"depends on ([^\s;]+) \([><=]{,2} ([^;]+)\)", text) # 2, 3
    #a = a + m
    #m = re.findall(r"depends on ([^\s;]+)", text) # 4, 5
    #a = a + m

    # The same three patterns joined with |, most specific alternative first.
    m = re.findall(r"depends on ([^\s;]+) \| ([^\s;]+)|depends on ([^\s;]+) \([><=]{,2} ([^;]+)\)|depends on ([^\s;]+)", text)

    print(m)
    

    Output:

    [
        ('libc6-dev', 'libc-dev', '', '', ''),
        ('', '', 'libc6-x32', '2.16', ''),
        ('', '', 'libncurses5-dev', '5.9+20150516-2ubuntu1', ''),
        ('', '', '', '', 'libc6-dev-x32'),
        ('', '', '', '', 'lib32c-dev')
    ]
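
The empty strings in the output above are padding: findall returns one slot per capturing group in the whole pattern, including the groups of the alternatives that did not match. As a minimal sketch, the padding could be dropped afterwards. The helper name `drop_padding` is made up for this illustration, and the rows are copied from the output above rather than recomputed.

```python
# Rows copied from the output above: one 5-tuple per match, padded with ''
# for the groups of the alternatives that did not take part in the match.
rows = [
    ('libc6-dev', 'libc-dev', '', '', ''),
    ('', '', 'libc6-x32', '2.16', ''),
    ('', '', 'libncurses5-dev', '5.9+20150516-2ubuntu1', ''),
    ('', '', '', '', 'libc6-dev-x32'),
    ('', '', '', '', 'lib32c-dev'),
]


def drop_padding(row):
    """Return the row without its empty-string entries (hypothetical helper)."""
    return tuple(field for field in row if field)


compact = [drop_padding(row) for row in rows]
for row in compact:
    print(row)
```

Note that this loses information: after compaction, a two-element tuple can mean either "two alternative packages" or "one package plus its version", so it only helps when the position of the groups no longer matters.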
    

    You can run the patterns one at a time, or combine them with | into a single pattern. I'm not sure whether this helps.
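
If the goal is exactly the shape asked for in the question (one tuple per sentence, with a name and a version slot for every alternative), a single findall pattern is awkward, because findall emits one tuple per match and a `|` at the top level of a pattern makes each alternative a separate match. A different technique, sketched here and not taken from the answer above, is to parse in two steps: first capture everything between "depends on " and the next `;`, then split that clause on `|` and parse each part. The names `CLAUSE`, `ALTERNATIVE` and `parse_dependencies` are invented for this sketch, and it assumes every dependency clause ends with `;` as in the sample text.

```python
import re

text = """(some text)
libncursesw5-dev:amd64 depends on libc6-dev | libc-dev;(some text)
libx32ncursesw5 depends on libc6-x32 (>= 2.16);(some text)
libx32ncurses5-dev depends on libncurses5-dev (= 5.9+20150516-2ubuntu1);(some text)
libx32ncursesw5-dev depends on libc6-dev-x32;(some text)
lib32tinfo-dev depends on lib32c-dev;(some text)"""

# Step 1: the whole clause between "depends on " and the next ';'.
CLAUSE = re.compile(r"depends on ([^;]+);")

# Step 2: one alternative, i.e. a package name followed by an optional
# "(operator version)" part; only the name and the version are captured.
ALTERNATIVE = re.compile(
    r"\s*([^\s()|;]+)(?:\s*\(\s*[<>=]+\s*([^\s)]+)\s*\))?\s*"
)


def parse_dependencies(text):
    """Return one flat (name, version, name, version, ...) tuple per clause."""
    results = []
    for clause in CLAUSE.findall(text):
        flat = []
        for part in clause.split("|"):
            match = ALTERNATIVE.fullmatch(part)
            if match is None:
                continue  # skip anything that does not look like a package
            # An unmatched optional group is None; use '' as in the question.
            flat.extend([match.group(1), match.group(2) or ""])
        if flat:
            results.append(tuple(flat))
    return results


for dependency in parse_dependencies(text):
    print(dependency)
```

On the sample text this should yield the tuples listed in the question, with `('libc6-dev', '', 'libc-dev', '')` as a single entry for the first sentence. Splitting on `|` also means a clause with three or more alternatives needs no extra regex groups.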

    【Comments】:
