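【Question title】: CSV from list of dictionaries with differing length and keys
【Posted】: 2020-05-25 19:15:36
【Question】:

I have a list of dictionaries that I want to write to a csv file. The first dictionary has a different length and different keys from the dictionaries that follow.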

dict_list = [{"A": 1, "B": 2}, {"C": 3, "D": 4, "E": 5}, {"C": 6, "D": 7, "E": 8}, ...]

How can I write this to a csv file so that the file looks like this:

A B C D E
1 2 3 4 5
    6 7 8
    . . .

【Comments】:

  • Do you really want 1,2,3,4,5 on a single line when they come from 2 different dictionaries?

Tags: python python-3.x csv dictionary data-conversion


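【Solution 1】:

You can also do this using only the built-in functionality that ships with Python. My example below is similar to the one proposed by @Serge Ballesta. Here is the code: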

import csv

# sample data
data = [{'A': 1, 'B': 2}, {'C': 3, 'D': 4, 'E': 5}, {'C': 6, 'D': 7, 'E': 8}]
# Collect the field names from every dict in **data**; a **set** keeps each
# name only once
fields = set()
for item in data:
    fields.update(item.keys())   # set union, done in place

# Convert to a sorted list so that the columns come out in a predictable order :)
fields = sorted(fields)

# Now let's write a function that returns a cleaned copy of the original data,
# in which all data items have the same field names.

def clean_data(origdata, fieldnames):
    """Return a copy of the data in which every item has the same fields.

    Parameters
    ----------
    origdata: list of dict
         original data to be harmonized according to the field names
    fieldnames: list of strings
         field names that every item of the new data must have

    Returns
    -------
    A new list of dict in which all items have the same keys (i.e. fieldnames);
    fields missing from an original item are set to ' '.
    """
    newdata = []
    for dataitem in origdata:
        # copy the item so that the original data is left untouched
        newitem = dict(dataitem)
        for key in fieldnames:
            if key not in newitem:
                # missing field: add **key** with the placeholder value ' '
                newitem[key] = ' '
        newdata.append(newitem)

    return newdata


def main():
    """Test the above function and display the result"""
    newdata = clean_data(data, fields)

    # write the data to a csv file
    with open("data.csv", "w", newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fields)
        writer.writeheader()
        for row in newdata:
            writer.writerow(row)

    # Now let's load the newly written csv file and print its content
    # -- some fancy display formatting here: not needed but I like it. :)
    nfields = len(fields)
    fmt = " %s |" * nfields          # one " value |" cell per column
    headInfo = fmt % tuple(fields)
    line = '-' * (len(headInfo) + 1)
    print(line)
    print("|" + headInfo)
    print(line)
    with open("data.csv", "r", newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for item in reader:
            row = [item[field] for field in fields]
            print("|" + fmt % tuple(row))

    print(line)



main()

The script above produces the following output:

---------------------
| A | B | C | D | E |
---------------------
| 1 | 2 |   |   |   |
|   |   | 3 | 4 | 5 |
|   |   | 6 | 7 | 8 |
--------------------- 
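
As a side note, csv.DictWriter can fill in missing keys on its own through its restval parameter, so a separate cleaning pass is optional. The following is a minimal sketch, assuming the sample data from the question and an in-memory io.StringIO buffer standing in for a real file; the "NA" placeholder is an arbitrary choice made for illustration:

```python
import csv
import io

# Sample data mirroring the question (assumed here for illustration).
dict_list = [{"A": 1, "B": 2}, {"C": 3, "D": 4, "E": 5}, {"C": 6, "D": 7, "E": 8}]

# Union of all keys, sorted for a stable column order.
fieldnames = sorted(set().union(*dict_list))

# io.StringIO stands in for a file opened with newline=''.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="NA")
writer.writeheader()
writer.writerows(dict_list)  # keys missing from a row are written as restval

csv_text = buffer.getvalue()
print(csv_text)
```

With the default restval of '' the missing cells are simply left empty, which is usually what a csv consumer expects.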

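【Comments】:

    【Solution 2】:

    The problem is that you need the complete set of columns in order to write the header at the beginning of the file. Apart from that, csv.DictWriter is all you need: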

    import csv

    # optional: compute the fieldnames:
    fieldnames = set()
    for d in dict_list:
        fieldnames.update(d.keys())
    fieldnames = sorted(fieldnames)    # sort the fieldnames...
    
    # produce the csv file
    with open("file.csv", "w", newline='') as fd:
        wr = csv.DictWriter(fd, fieldnames)
        wr.writeheader()
        wr.writerows(dict_list)
    

    The resulting csv will look like this:

    A,B,C,D,E
    1,2,,,
    ,,3,4,5
    ,,6,7,8
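
If sorted column names are not what you want, the columns can instead follow the order in which the keys are first seen. This is a hedged sketch rather than part of the answer above: the sample data is reordered on purpose so that sorted order and first-seen order differ, and io.StringIO replaces the real file.

```python
import csv
import io

# Illustrative data: keys are listed so that first-seen order is not sorted order.
dict_list = [{"B": 2, "A": 1}, {"E": 5, "C": 3, "D": 4}, {"C": 6, "D": 7, "E": 8}]

# dict.fromkeys drops duplicates and, since Python 3.7, keeps insertion order,
# so the field names come out in the order in which they first appear.
fieldnames = list(dict.fromkeys(key for d in dict_list for key in d))

buffer = io.StringIO(newline="")
wr = csv.DictWriter(buffer, fieldnames)
wr.writeheader()
wr.writerows(dict_list)

csv_text = buffer.getvalue()
print(csv_text)
```

A plain set has no defined iteration order, which is why collecting the names in a set and not sorting them can scatter the columns.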
    

    If you really want to combine rows that have disjoint key sets, you can do this:

    # produce the csv file
    with open("file.csv", "w", newline='') as fd:
        wr = csv.DictWriter(fd, sorted(fieldnames))
        old = {k: k for k in wr.fieldnames}       # use old for the header line
        for row in dict_list:
            if set(old).intersection(row):
                wr.writerow(old)                  # common fields: write old and start a new row
                old = dict(row)                   # copy, so dict_list itself is not modified
            else:
                old.update(row)                   # disjoint fields: just combine
        wr.writerow(old)                          # do not forget last row
    

    You will get:

    A,B,C,D,E
    1,2,3,4,5
    ,,6,7,8
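
The combining step can also be kept separate from the file handling. The sketch below is one possible way to do that, not the answer's own code: merge_disjoint_rows is a hypothetical helper name, and io.StringIO again stands in for the output file.

```python
import csv
import io


def merge_disjoint_rows(rows):
    """Yield merged dicts; consecutive rows are combined while their keys do not overlap."""
    current = {}
    for row in rows:
        # A shared key means the pending row is complete: emit it and start over.
        if not current.keys().isdisjoint(row):
            yield current
            current = {}
        current.update(row)
    if current:
        yield current


dict_list = [{"A": 1, "B": 2}, {"C": 3, "D": 4, "E": 5}, {"C": 6, "D": 7, "E": 8}]
merged = list(merge_disjoint_rows(dict_list))

buffer = io.StringIO(newline="")
wr = csv.DictWriter(buffer, sorted(set().union(*dict_list)))
wr.writeheader()
wr.writerows(merged)

csv_text = buffer.getvalue()
print(csv_text)
```

Because the helper builds new dictionaries, the entries of dict_list are left unchanged, and the merged rows can be inspected before anything is written.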
    

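    【Comments】:

    • This is basically what I was looking for, and I appreciate its simplicity. However, the order of the columns is messed up: columns "A" and "B" appear between "C", "D" and "E". What exactly happens during fieldnames.update()?
    • @MaxJ.: You can sort the field names (see the edit in the first part of my answer). I also show how to combine rows with disjoint key sets.
    • I set the field names by adding the keys of the second dictionary to the keys of the first dictionary. Your first solution probably fits my problem better than what I asked for (the second part). Since this works for the minimum requirement, I am marking it as the accepted solution.
    【Solution 3】:

    Pandas can build a data frame from a list of dictionaries if you call pd.DataFrame() on the list. In the resulting data frame, each dictionary is a row and each key corresponds to a column. So the value of the 3rd key in the 7th dictionary (call it key3) ends up in the 7th row of the key3 column.

    What this means for your problem: you first have to modify your dict_list so that it contains the merged dictionary, like this: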

    dict_list.insert(2, dict(**dict_list[0], **dict_list[1]))
    print(dict_list)
    
    [{'A': 1, 'B': 2},
     {'C': 3, 'D': 4, 'E': 5},
     {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5},
     {'C': 6, 'D': 7, 'E': 8}]
    

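    This inserts the combination of the first two dictionaries into your list at index 2. Why index 2? Because it lets you conveniently slice the list when converting it to a data frame, which gives you the desired output: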

    import pandas as pd

    df = pd.DataFrame(dict_list[2:])
    print(df)
    
         A    B  C  D  E
    0  1.0  2.0  3  4  5
    1  NaN  NaN  6  7  8
    

    For comparison, calling pd.DataFrame directly on the unmodified list gives you

    df_unmodified = pd.DataFrame(dict_list)
    print(df_unmodified)
    
         A    B    C    D    E
    0  1.0  2.0  NaN  NaN  NaN
    1  NaN  NaN  3.0  4.0  5.0
    2  NaN  NaN  6.0  7.0  8.0
    

    After that, you can save the data frame to a csv file with df.to_csv().
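
A short sketch of that last step, assuming pandas is installed and starting from the already merged rows. Calling to_csv without a path returns the csv text as a string, which is convenient for checking the result; passing a file name such as "file.csv" (a placeholder here) would write to disk instead.

```python
import pandas as pd

# The merged row followed by the remaining rows, as in dict_list[2:] above.
rows = [
    {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5},
    {"C": 6, "D": 7, "E": 8},
]
df = pd.DataFrame(rows)

# index=False leaves out the row labels (0, 1, ...) that pandas would
# otherwise write as an extra first column.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Note that columns A and B are written as 1.0 and 2.0: the missing values in the second row are stored as NaN, which turns those columns into floats.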

    【Comments】:

    • This leads to the expected result. Thanks!