【Question title】: Converting dictionary into a list of dictionaries
【Posted】: 2021-12-12 19:56:38
【Question】:

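So, my task is to convert a string into a dictionary (it has to be done with regex). I used findall to separate each element, but I'm not sure how to put them together.

I have the following code: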

import re

def edata():
  with open("employeedata.txt", "r") as file:
    employeedata = file.read()
    IP_field = re.findall(r"\d+[.]\d+[.]\d+[.]\d+", employeedata)
    username_field = re.findall (r"[a-z]+\d+|- -", employeedata)
    date_field = re.findall (r"\d+\/[A-Z][a-z][0-9]+\/\d\d\d\d:\d+:\d+:\d+ -\d+", employeedata)
    type_field = re.findall (r'"(.*)?"', employeedata)
    Fields = ["IP","username","date","type"]
    Fields2 = IP_field, username_field, date_field, type_field
    dictionary = dict(zip(Fields,Fields2))
    return dictionary

print(edata())

Current output:

{ "IP": ["190.912.120.151", "190.912.120.151"], "username": ["skynet10001", "skynet10001"] etc }

Expected output:

[{ "IP": "190.912.120.151", "username": "skynet10001" etc },
{ "IP": "190.912.120.151", "username": "skynet10001" etc }]

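【Comments】:

  • Could you share a sample of your input and the expected output?
  • Here is one line of the data: 190.912.120.151 - skynet10001 [29/Jan/2012] "Temp" and the desired output would be { "IP": "190.912.120.151", "username": "skynet10001" etc }
  • For the line of data you gave, what do you want your output dictionary to look like?
  • Just as I wrote: { "IP": "190.912.120.151", "username": "skynet10001"} etc.
  • Oops, I hadn't refreshed the page and didn't see the updated comment. For the given input, this is the output I get. Isn't that what you were expecting? {'IP': ['190.912.120.151'], 'username': ['skynet10001'], 'date': [], 'type': ['Temp']}

Tags: python regex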


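【Solution 1】:

Another solution, which uses the dictionary you have already built. This code uses a list comprehension and the zip function to generate a list of dictionaries from the existing dictionary variable.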

import re

def edata():
  with open("employeedata.txt", "r") as file:
    employeedata = file.read()
    IP_field = re.findall(r"\d+[.]\d+[.]\d+[.]\d+", employeedata)
    username_field = re.findall (r"[a-z]+\d+|- -", employeedata)

    date_field = re.findall (r"\[(.*?)\]", employeedata) ## changed your regex for the date field

    type_field = re.findall (r'"(.*)?"', employeedata)
    Fields = ["IP","username","date","type"]
    Fields2 = IP_field, username_field, date_field, type_field
    dictionary = dict(zip(Fields,Fields2))

    result_dictionary = [dict(zip(dictionary, i)) for i in zip(*dictionary.values())] ## convert to list of dictionaries
    return result_dictionary


print(edata())
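
To see what the list comprehension does without needing employeedata.txt, here is a minimal sketch on in-memory data. The columns dictionary and its values are made-up sample data shaped like the dictionary built in edata(); they are not taken from the real file.

```python
# Hypothetical sample data, shaped like the dictionary built in edata().
columns = {
    "IP": ["190.912.120.151", "221.143.119.260"],
    "username": ["skynet10001", "terminator002"],
    "date": ["19/Jan/2012", "16/Feb/2021"],
    "type": ["Temp", "Temp 2"],
}

# zip(*columns.values()) transposes the lists: it yields one tuple per record.
# Zipping the keys with each tuple then rebuilds a dictionary for that record.
rows = [dict(zip(columns, record)) for record in zip(*columns.values())]

print(rows)
```

Note that zip stops at the shortest list, so if one findall call returns fewer matches than the others, the extra values are silently dropped and the remaining fields may no longer line up by record. On Python 3.10 or later, passing strict=True to zip raises a ValueError in that case instead.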

【Comments】:

    【Solution 2】:

    You can use

    import re
    
    rx = re.compile(r'^(?P<IP>\d+(?:\.\d+){3})\s+\S+\s+(?P<Username>[a-z]+\d+)\s+\[(?P<Date>[^][]+)]\s+"(?P<Type>[^"]*)"')
    
    def edata():
        results = []
        with open("downloads/employeedata.txt", "r") as file:
            for line in file:
                match = rx.search(line)
                if match:
                    results.append(match.groupdict())
        return results
        
    print(edata())
    

    See the online Python demo. With file = ['190.912.120.151 - skynet10001 [19/Jan/2012] "Temp"', '221.143.119.260 - terminator002 [16/Feb/2021] "Temp 2"'] as input, the output will be:

    [{'IP': '190.912.120.151', 'Username': 'skynet10001', 'Date': '19/Jan/2012', 'Type': 'Temp'}, {'IP': '221.143.119.260', 'Username': 'terminator002', 'Date': '16/Feb/2021', 'Type': 'Temp 2'}]
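
A self-contained way to try this without the file might look like the sketch below. The parse_lines helper and the sample list are hypothetical names introduced for illustration; the pattern is the one from the answer. A list of strings stands in for the open file, and one deliberately malformed line shows that non-matching lines are skipped.

```python
import re

rx = re.compile(r'^(?P<IP>\d+(?:\.\d+){3})\s+\S+\s+(?P<Username>[a-z]+\d+)\s+\[(?P<Date>[^][]+)]\s+"(?P<Type>[^"]*)"')

def parse_lines(lines):
    """Return one dict per line that matches rx; other lines are skipped."""
    results = []
    for line in lines:
        match = rx.search(line)
        if match:
            # groupdict() maps each named group to the text it captured.
            results.append(match.groupdict())
    return results

# Hypothetical input: two well-formed lines and one that does not match.
sample = [
    '190.912.120.151 - skynet10001 [19/Jan/2012] "Temp"',
    'this line does not follow the log format',
    '221.143.119.260 - terminator002 [16/Feb/2021] "Temp 2"',
]
parsed = parse_lines(sample)
print(parsed)
```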
    

    The regex is

    ^(?P<IP>\d+(?:\.\d+){3})\s+\S+\s+(?P<Username>[a-z]+\d+)\s+\[(?P<Date>[^][]+)]\s+"(?P<Type>[^"]*)"
    

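    See the regex demo. Details:

    • ^ - start of string
    • (?P<IP>\d+(?:\.\d+){3}) - Group "IP": one or more digits, then three occurrences of a . followed by one or more digits
    • \s+\S+\s+ - one or more non-whitespace characters, surrounded on both sides by one or more whitespace characters
    • (?P<Username>[a-z]+\d+) - Group "Username": one or more lowercase ASCII letters, then one or more digits
    • \s+ - one or more whitespace characters
    • \[ - a [ character
    • (?P<Date>[^][]+) - Group "Date": one or more characters other than [ and ]
    • ]\s+" - a ] character, one or more whitespace characters, then a " character
    • (?P<Type>[^"]*) - Group "Type": zero or more characters other than "
    • " - a " character.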
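
The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and treats # as the start of a comment. This is only a sketch of an alternative way to write the same regex; the names rx_verbose and line are introduced here for illustration.

```python
import re

# Same pattern as above, split across lines with one comment per piece.
rx_verbose = re.compile(r'''
    ^(?P<IP>\d+(?:\.\d+){3})      # group "IP": digits, then three ".digits" parts
    \s+\S+\s+                     # a non-whitespace token surrounded by whitespace
    (?P<Username>[a-z]+\d+)       # group "Username": lowercase letters, then digits
    \s+\[                         # whitespace, then a literal [
    (?P<Date>[^][]+)              # group "Date": any characters except [ and ]
    ]\s+"                         # a literal ], whitespace, then an opening quote
    (?P<Type>[^"]*)               # group "Type": any characters except a quote
    "                             # the closing quote
''', re.VERBOSE)

line = '190.912.120.151 - skynet10001 [19/Jan/2012] "Temp"'
match = rx_verbose.search(line)
fields = match.groupdict() if match else None
print(fields)
```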

    【Comments】:

    • Best solution!