【Question title】: Best way to parse a line in Python into a dictionary
【Posted】: 2009-10-29 15:04:29
【Question】:

I have a file with lines like this:

account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"

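There is no special delimiter. Every key has a value, which is wrapped in double quotes if it is a string and left bare if it is a number. There are no keys without values, although a value may be an empty string written as "". Quoted strings contain no escape characters, since none are needed.

I would like to know a good way to parse this kind of line in Python and store the values as key-value pairs in a dictionary.

【Comments】:

    Tags: python parsing delimiter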


    【Solution 1】:

    We'll need a regex for this.

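    import re, decimal

    line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'

    # key, '=', then either a double-quoted string or a bare (numeric) token
    r = re.compile(r'([^ =]+) *= *("[^"]*"|[^ ]*)')

    d = {}
    for k, v in r.findall(line):
        if v[:1] == '"':
            d[k] = v[1:-1]               # quoted string: strip the quotes
        else:
            d[k] = decimal.Decimal(v)    # bare token: treat as a number

    >>> d
    {'account': 'TEST1', 'Qty': Decimal('100'), 'price': Decimal('20.11'), 'subject': 'some value', 'values': '3=this, 4=that'}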
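
The pattern above packs several ideas into one line. As a reading aid, here is a minimal sketch of the same pattern written with re.VERBOSE and wrapped in a function. The names PAIR_RE and line_to_dict are introduced only for this illustration, and the extra note="" pair in the sample is an assumption added to show an empty string.

```python
import re
from decimal import Decimal

# Same pattern as above, spelled out. In VERBOSE mode whitespace outside a
# character class is ignored, so literal spaces are written as [ ].
PAIR_RE = re.compile(r'''
    ([^ =]+)      # key: one or more characters that are not space or equals
    [ ]*=[ ]*     # the equals sign, with optional spaces on either side
    ( "[^"]*"     # value: a double-quoted string (no escapes) ...
    | [^ ]*       # ... or a bare token that runs up to the next space
    )
''', re.VERBOSE)


def line_to_dict(line):
    """Return a dict of key -> str (quoted values) or Decimal (bare values)."""
    result = {}
    for key, value in PAIR_RE.findall(line):
        if value.startswith('"'):
            result[key] = value[1:-1]      # drop the surrounding quotes
        else:
            result[key] = Decimal(value)   # bare tokens are assumed numeric
    return result


sample = ('account = "TEST1" Qty=100 price = 20.11 subject="some value" '
          'values="3=this, 4=that" note=""')
parsed = line_to_dict(sample)
print(parsed['price'], repr(parsed['note']))
```

Because a quoted value is matched as a whole, the "3=this, 4=that" text is never rescanned for inner pairs.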
    

    You can use float instead of Decimal if you prefer, but that is probably a bad idea if money is involved.
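
To make the concern about money concrete: binary floats cannot represent most decimal fractions exactly, so sums can drift by tiny amounts, while a Decimal built from the original text keeps the digits as written. A small illustration using only the standard library (the amounts are made up for the example):

```python
from decimal import Decimal

# Three items at 0.10 each, summed as floats and as Decimals.
float_total = 0.10 + 0.10 + 0.10
decimal_total = Decimal('0.10') + Decimal('0.10') + Decimal('0.10')

print(float_total == 0.30)               # False
print(decimal_total == Decimal('0.30'))  # True
print(decimal_total)                     # 0.30
```

Note that the Decimal should be built from the text, as the answer does. Decimal(20.11) would start from the float and inherit its rounding error.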

    【Comments】:

      【Solution 2】:

      A pyparsing version may be easier to follow:

      from pyparsing import (Regex, Word, alphanums, quotedString, removeQuotes,
                             Group, Suppress, Dict, OneOrMore)
      
      # define basic elements - use re's for numerics, faster and easier than
      # composing from pyparsing objects
      integer = Regex(r'[+-]?\d+')
      real = Regex(r'[+-]?\d+\.\d*')
      ident = Word(alphanums)
      value = real | integer | quotedString.setParseAction(removeQuotes)
      
      # define a key-value pair, and a configline as one or more of these
      # wrap configline in a Dict so that results are accessible by given keys
      kvpair = Group(ident + Suppress('=') + value)
      configline = Dict(OneOrMore(kvpair))
      
      src = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" ' \
              'values="3=this, 4=that"'
      
      configitems = configline.parseString(src)
      

      Now you can access the pieces through the returned configitems ParseResults object:

      >>> print configitems.asList()
      [['account', 'TEST1'], ['Qty', '100'], ['price', '20.11'], 
       ['subject', 'some value'], ['values', '3=this, 4=that']]
      
      >>> print configitems.asDict()
      {'account': 'TEST1', 'Qty': '100', 'values': '3=this, 4=that', 
        'price': '20.11', 'subject': 'some value'}
      
      >>> print configitems.dump()
      [['account', 'TEST1'], ['Qty', '100'], ['price', '20.11'], 
       ['subject', 'some value'], ['values', '3=this, 4=that']]
      - Qty: 100
      - account: TEST1
      - price: 20.11
      - subject: some value
      - values: 3=this, 4=that
      
      >>> print configitems.keys()
      ['account', 'subject', 'values', 'price', 'Qty']
      
      >>> print configitems.subject
      some value
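
In the output above every value is still a string, including '100' and '20.11'. If numbers are wanted, one option is a small post-processing step over the plain dictionary. The sketch below uses only the standard library. coerce is a hypothetical helper, not part of pyparsing, and the dictionary literal simply restates the asDict() output shown above.

```python
from decimal import Decimal, InvalidOperation


def coerce(text):
    """Return an int or Decimal when text looks numeric, else text unchanged."""
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return Decimal(text)
    except InvalidOperation:
        return text


# Restated from the asDict() output above; every value starts as a string.
items = {'account': 'TEST1', 'Qty': '100', 'values': '3=this, 4=that',
         'price': '20.11', 'subject': 'some value'}
typed = {key: coerce(value) for key, value in items.items()}
print(typed['Qty'] + 1, typed['price'] * 2)
```

Because the quotes are already gone at this point, this sketch cannot tell a quoted "100" from a bare 100, so a string value that happens to look like a number is converted too.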
      

      【Comments】:

        【Solution 3】:

        A recursive variation of bobince's answer that parses values containing embedded equals signs as dictionaries:

        >>> import re
        >>> import pprint
        >>>
        >>> def parse_line(line):
        ...     d = {}
        ...     a = re.compile(r'\s*(\w+)\s*=\s*("[^"]*"|[^ ,]*),?')
        ...     float_re = re.compile(r'^\d+\.\d*$')
        ...     int_re = re.compile(r'^\d+$')
        ...     for k, v in a.findall(line):
        ...         if int_re.match(k):
        ...             k = int(k)
        ...         if v[:1] == '"':        # quoted value: strip the quotes
        ...             v = v[1:-1]
        ...         if '=' in v:            # embedded key=value pairs: recurse
        ...             d[k] = parse_line(v)
        ...         elif int_re.match(v):
        ...             d[k] = int(v)
        ...         elif float_re.match(v):
        ...             d[k] = float(v)
        ...         else:
        ...             d[k] = v
        ...     return d
        ...
        >>> line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'
        >>> pprint.pprint(parse_line(line))
        {'Qty': 100,
         'account': 'TEST1',
         'price': 20.11,
         'subject': 'some value',
         'values': {3: 'this', 4: 'that'}}
        

        【Comments】:

          【Solution 4】:

          If you don't want to use a regex, another option is to read the string one character at a time:

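          line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'
          
          inside_quotes = False
          key = None
          value = ''
          quoted = False   # True once the current value has opened a quote
          result = {}      # avoid shadowing the built-in dict
          
          for c in line:
              if c == '"':
                  inside_quotes = not inside_quotes
                  quoted = True
              elif c == '=' and not inside_quotes:
                  key = value
                  value = ''
              elif c == ' ' and not inside_quotes:
                  # a space ends a value, but only after a key and a value were seen;
                  # the quoted flag keeps empty strings written as ""
                  if key is not None and (value or quoted):
                      result[key] = value
                      key = None
                      value = ''
                      quoted = False
              else:
                  value += c
          
          if key is not None:   # flush the last pair
              result[key] = value
          print(result)         # all values are strings; convert numbers afterwards if needed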
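
The loop above handles one hard-coded string. Since the question is about a file of such lines, here is a sketch of the same character-by-character idea wrapped in a function and applied line by line. The names scan_line and parse_file are made up for this example, io.StringIO stands in for a real file, and the sample records are invented. This version tracks whether a value was quoted so that empty strings written as "" are kept.

```python
import io


def scan_line(line):
    """Parse key=value pairs from one line without regular expressions."""
    result = {}
    key, value = None, ''
    inside_quotes = False
    quoted = False            # True once the current value has opened a quote
    for c in line:
        if c == '"':
            inside_quotes = not inside_quotes
            quoted = True
        elif c == '=' and not inside_quotes:
            key, value = value, ''
        elif c.isspace() and not inside_quotes:
            # Whitespace ends a value, but only after a key and a value were seen.
            if key is not None and (value or quoted):
                result[key] = value
                key, value, quoted = None, '', False
        else:
            value += c
    if key is not None:       # flush the last pair on the line
        result[key] = value
    return result


def parse_file(lines):
    """Return one dict per non-blank line of an iterable of lines."""
    return [scan_line(line) for line in lines if line.strip()]


fake_file = io.StringIO(
    'account = "TEST1" Qty=100 price = 20.11\n'
    '\n'
    'account = "TEST2" note="" subject="some value"\n'
)
records = parse_file(fake_file)
for record in records:
    print(record)
```

All values come back as strings here. Converting the bare tokens to numbers would be a separate step.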
          

          【Comments】:
