【Question Title】: Create a dictionary from a colon-separated key-value string
【Posted】: 2022-01-07 17:23:47
【Question】:

I'm trying to create a dictionary from a given string in the format

key1:value1 key2:value2

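but sometimes extracting the value is a problem when there is:

  1. a space after the colon: key1: value1
  2. a quoted value: key1: "value has space"

A key is identified by the pattern something: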

I tried the following:

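def tokenize(msg):
    # Split on whitespace and keep only the tokens that contain a colon
    legit_args = [tok for tok in msg.split() if ":" in tok]
    print(legit_args)
    # Split each token at its first colon into a (key, value) pair
    dline = dict(item.split(":", 1) for item in legit_args)
    return dline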

The above only works for values without spaces.

Then I tried the following:

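import re
import shlex

def tokenize2(msg):
    try:
        #return {k: v for k, v in re.findall(r'(?=\S|^)(.+?): (\S+)', msg)}
        return dict(token.split(':') for token in shlex.split(msg))
    except ValueError:
        # raised for tokens that are not key:value pairs and for unbalanced quotes
        return {}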

This works for key:"something given like this", but it still needs some changes to work in general. Here is the problem:

>>> msg = 'key1: "this is value1 "   key2:this is value2 key3: this is value3'
>>> import shlex
>>> dict(token.split(':') for token in shlex.split(msg))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: dictionary update sequence element #1 has length 1; 2 is required
>>> shlex.split(msg)  # I think the problem is here
['key1:', 'this is value1 ', 'key2:this', 'is', 'value2', 'key3:', 'this', 'is', 'value3']
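If you want to keep the shlex approach, one option is to stop treating every shlex token as a complete pair and group the tokens instead: a token containing a colon starts a new key, and the colon-free tokens after it are joined onto that key's value. The helper below is a minimal sketch, not the asker's or answerer's code; `tokenize_shlex` is a hypothetical name. It assumes a quoted value never becomes a separate token that itself contains a colon (`key5:"a:b"` works, but `key5: "a:b"` would be misread as a new key), and runs of spaces in unquoted values collapse to single spaces.

```python
import shlex


def tokenize_shlex(msg):
    """Hypothetical helper: group shlex tokens into key/value pairs.

    A token containing ':' starts a new key; tokens without ':' are
    space-joined onto the value of the most recent key.
    """
    result = {}
    key = None
    parts = []
    for token in shlex.split(msg):
        if ":" in token:
            # Finish the previous pair before starting a new one
            if key is not None:
                result[key] = " ".join(parts)
            key, first = token.split(":", 1)
            parts = [first] if first else []
        elif key is not None:
            parts.append(token)
    if key is not None:
        result[key] = " ".join(parts)
    return result


msg = 'key1: "this is value1 "   key2:this is value2 key3: this is value3'
print(tokenize_shlex(msg))
# {'key1': 'this is value1 ', 'key2': 'this is value2', 'key3': 'this is value3'}
```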

【Discussion】:

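  • What does a complete example string look like?
  • What exactly is wrong with the second (regex) approach? Sample input and output would help.
  • Off-topic, but a bare except is bad practice. Instead, catch the specific exception you expect, such as except ValueError, or at least except Exception.
  • Added more details, @KeoniGarner. I think this is fine, since values containing spaces should only appear inside quotes.
  • Can a value contain a colon inside the quotes?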

Tags: python regex


【Answer 1】:

Please try the following:

import re

s = "key1: \"this is value1 \"   key2:this is value2 key3: this is value3"
d = {}
for m in re.findall(r'\w+:\s*(?:\w+(?:\s+\w+)*(?=\s|$)|"[^"]+")', s):
    key, val = re.split(r':\s*', m)
    d[key] = val.strip('"')
print(d)

Output:

{'key1': 'this is value1 ', 'key2': 'this is value2', 'key3': 'this is value3'}

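Explanation of the regex:

  • \w+:\s* matches a word followed by a colon and optional (zero or more) whitespace.
  • (?: ... ) forms a non-capturing group.
  • \w+(?:\s+\w+)*(?=\s|$) matches one or more whitespace-separated words followed by whitespace or the end of the string. Because the lookahead rejects a word that is directly followed by a colon, the match backs off and stops before the next key.
  • The vertical bar | separates the alternative patterns.
  • "[^"]+" matches a string enclosed in double quotes.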
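The pattern and its explanation can be kept side by side by compiling the regex with re.VERBOSE, which ignores whitespace inside the pattern and allows # comments. This is a sketch of the same pattern as the answer, only reformatted; the name PAIR is illustrative.

```python
import re

# The answer's pattern, rewritten with re.VERBOSE so each part can be annotated
PAIR = re.compile(r'''
    \w+:\s*                      # a key (word characters), a colon, optional whitespace
    (?:                          # non-capturing group for the value
        \w+(?:\s+\w+)*(?=\s|$)   # unquoted words, ending before whitespace or end of string
      |                          # or
        "[^"]+"                  # a double-quoted string
    )
''', re.VERBOSE)

s = 'key1: "this is value1 "   key2:this is value2 key3: this is value3'
print(PAIR.findall(s))
# ['key1: "this is value1 "', 'key2:this is value2', 'key3: this is value3']
```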

[Edit]
If you want to handle fancy quotes (a.k.a. curly quotes or smart quotes), try:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import re

s = "key1: \"this is value1 \"   key2:this is value2 key3: this is value3 title: “incorrect title” title2: “incorrect title2” key4:10.20.30.40"
d = {}
for m in re.findall(r'\w+:\s*(?:[\w.]+(?:\s+[\w.]+)*(?=\s|$)|"[^"]+"|“.+?”)', s):
    key, val = re.split(r':\s*', m)
    d[key] = val.replace('“', '"').replace('”', '"').strip('"')
print(d)

Output:

{'key1': 'this is value1 ', 'key2': 'this is value2', 'key3': 'this is value3', 'title': 'incorrect title', 'title2': 'incorrect title2', 'key4': '10.20.30.40'}
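An alternative to adding a separate “.+?” branch is to normalize curly double quotes to ASCII double quotes before matching, so the pattern only needs the "[^"]+" branch. This is a sketch of a different technique (str.translate with str.maketrans), not the answer's code; parse_pairs, PAIR and CURLY_TO_ASCII are hypothetical names, and it assumes U+201C and U+201D are the only quote characters that need mapping.

```python
import re

# Map curly double quotes (U+201C, U+201D) to the ASCII double quote
CURLY_TO_ASCII = str.maketrans({"\u201c": '"', "\u201d": '"'})

# The answer's pattern with the curly-quote branch dropped,
# since the input is normalized first
PAIR = re.compile(r'\w+:\s*(?:[\w.]+(?:\s+[\w.]+)*(?=\s|$)|"[^"]+")')


def parse_pairs(s):
    d = {}
    for m in PAIR.findall(s.translate(CURLY_TO_ASCII)):
        # Split only at the first colon so quoted values may contain colons
        key, val = re.split(r':\s*', m, maxsplit=1)
        d[key] = val.strip('"')
    return d


s = ('key1: "this is value1 "   key2:this is value2 key3: this is value3 '
     'title: \u201cincorrect title\u201d title2: \u201cincorrect title2\u201d '
     'key4:10.20.30.40')
print(parse_pairs(s))
```

Normalizing once keeps the regex shorter, at the cost of an extra pass over the string.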

[Edit 2]
The following code now also allows colons inside quoted values:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import re

s = "key1: \"this is value1 \"   key2:this is value2 key3: this is value3 title: “incorrect title” title2: “incorrect title2” key4:10.20.30.40 key5:\"value having:colon\""
d = {}
for m in re.findall(r'\w+:\s*(?:[\w.]+(?:\s+[\w.]+)*(?=\s|$)|"[^"]+"|“.+?”)', s):
    key, val = re.split(r':\s*', m, maxsplit=1)
    d[key] = val.replace('“', '"').replace('”', '"').strip('"')
print(d)

Output:

{'key1': 'this is value1 ', 'key2': 'this is value2', 'key3': 'this is value3', 'title': 'incorrect title', 'title2': 'incorrect title2', 'key4': '10.20.30.40', 'key5': 'value having:colon'}

The change is on this line:

key, val = re.split(r':\s*', m, maxsplit=1)

Passing maxsplit=1 limits re.split to a single split, so only the colon after the key acts as a separator and any colon inside the value is kept.
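To see what maxsplit changes, compare the two calls on the key5 match. str.partition, which always splits at the first occurrence only, is shown as a plain-string alternative; with it, the optional whitespace after the colon has to be stripped separately.

```python
import re

m = 'key5:"value having:colon"'

print(re.split(r':\s*', m))              # ['key5', '"value having', 'colon"']
print(re.split(r':\s*', m, maxsplit=1))  # ['key5', '"value having:colon"']

# str.partition splits at the first colon only;
# strip the whitespace after the colon and the quotes by hand
key, _, val = m.partition(':')
print(key, val.lstrip().strip('"'))      # key5 value having:colon
```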

【Discussion】:

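  • This doesn't work as expected when the quotes look like this: title: “incorrect title”
  • Of course. I've updated my answer to support fancy quotes.
  • key:10.20.30.40 doesn't get tokenized, but key:"10.20.30.40" works fine. Values can contain a dot (.).
  • OK, I've updated my answer, changing \w to [\w.] so that it also matches a dot.
  • Values can sometimes contain colons too, like key:"value having:colon". Can we handle this?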