How to create a nested dictionary from a csv file with N rows in Python: answers

【Question title】: How to create a nested dictionary from a csv file with N rows in Python
【Posted】: 2018-03-15 00:45:12
【Question description】:

I am looking for a way to read a csv file with an unknown number of columns into a nested dictionary. That is, for input of the form

file.csv:
1,  2,  3,  4
1,  6,  7,  8
9, 10, 11, 12

I want a dictionary of the following form:

{1:{2:{3:4}, 6:{7:8}}, 9:{10:{11:12}}}

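This is to allow O(1) lookup of values from the csv file. Building the dictionary is allowed to take a relatively long time, because in my application I build it only once but search it millions of times.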
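
To illustrate the kind of lookup meant here, a minimal sketch; `lookup` is a hypothetical helper name, not part of any library. Each level is one average-case O(1) hash lookup, so the cost of a search depends on the number of columns, not on the number of rows.

```python
from functools import reduce

# The nested dictionary from the example above.
d = {1: {2: {3: 4}, 6: {7: 8}}, 9: {10: {11: 12}}}

def lookup(nested, keys):
    """Follow the keys one level at a time; raises KeyError if a key is missing."""
    return reduce(lambda level, key: level[key], keys, nested)

print(lookup(d, [1, 6, 7]))  # 8
print(lookup(d, [9, 10]))    # {11: 12}
```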

I would also like an option to name the relevant columns, so that I can ignore the unnecessary ones.

【Question discussion】:

Tags: python csv dictionary hashmap nested

【Solution 1】:

Here is a simple but fragile approach:

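>>> import csv
>>> import io
>>> s = "1,  2,  3,  4\n1,  6,  7,  8\n9, 10, 11, 12\n"  # contents of file.csv
>>> d = {}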
>>> with io.StringIO(s) as f: # fake a file
...     reader = csv.reader(f)
...     for row in reader:
...         nested = d
...         for val in map(int, row[:-2]):
...             nested = nested.setdefault(val, {})
...         k, v = map(int, row[-2:]) # this will fail if you don't have enough columns
...         nested[k] = v
...
>>> d
{1: {2: {3: 4}, 6: {7: 8}}, 9: {10: {11: 12}}}

However, this assumes that every row has at least 2 columns.
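
One way to make that assumption explicit is to wrap the loop in a function that checks the row length first. This is only a sketch: `rows_to_nested` and its `convert` parameter are hypothetical names introduced for illustration, and it still assumes that no row ends at a key where another row continues deeper.

```python
import csv
import io

def rows_to_nested(lines, convert=int):
    """Build a nested dict from csv rows; the last column is the leaf value."""
    result = {}
    for line_no, row in enumerate(csv.reader(lines), start=1):
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        if len(row) < 2:
            raise ValueError(
                "line {}: need at least 2 columns, got {}".format(line_no, len(row))
            )
        # All but the last two columns form the path; the last two are key and value.
        *path, key, value = map(convert, row)
        nested = result
        for val in path:
            nested = nested.setdefault(val, {})
        nested[key] = value
    return result

s = "1,  2,  3,  4\n1,  6,  7,  8\n9, 10, 11, 12\n"
print(rows_to_nested(io.StringIO(s)))
# {1: {2: {3: 4}, 6: {7: 8}}, 9: {10: {11: 12}}}
```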

【Discussion】:

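Any chance of a bit more explanation of how this works? It looks like something interesting happens on the "nested = d" line, which I guess exploits a feature of [pointers?] that I don't fully understand. When I step through the code one line at a time, d and nested have different values, and I never see d being updated explicitly (yet it is still updated)?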
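
A small sketch of what the comment above is asking about, using made-up keys rather than the csv data. In Python, assignment binds a name to an object and does not copy it, so "nested = d" makes both names refer to the same dictionary. `setdefault` inserts an empty inner dictionary into that object and returns it, and `nested` is then rebound to the inner dictionary while `d` keeps pointing at the outer one.

```python
d = {}
nested = d                         # two names, one dict object
print(nested is d)                 # True

nested = nested.setdefault(1, {})  # inserts d[1] = {} and rebinds nested to it
print(d)                           # {1: {}}
print(nested is d[1])              # True

nested[2] = 3                      # mutates the inner dict that d still contains
print(d)                           # {1: {2: 3}}
```

That is why d and nested show different values while stepping through the loop, and why d grows even though it is never assigned to directly.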

【Solution 2】:

Here is what I came up with. Feel free to comment and suggest improvements.

import csv
import itertools

def list_to_dict(lst):
    # Takes a list, and recursively turns it into a nested dictionary, where
    # the first element is a key, whose value is the dictionary created from the 
    # rest of the list. the last element in the list will be the value of the
    # innermost dictionary
    # INPUTS:
    #   lst - a list (e.g. of strings or floats)
    # OUTPUT:
    #   A nested dictionary
    # EXAMPLE RUN:
    #   >>> lst = [1, 2, 3, 4]
    #   >>> list_to_dict(lst)
    #   {1:{2:{3:4}}}
    if len(lst) == 1:
        return lst[0]
    else:
        data_dict = {lst[-2]: lst[-1]}
        lst.pop()
        lst[-1] = data_dict
        return list_to_dict(lst)


def dict_combine(d1, d2):
    # Combines two nested dictionaries into one.
    # INPUTS:
    #   d1, d2: Two nested dictionaries. The function might change d1 and d2, 
    #           therefore if the input dictionaries are not to be mutated, 
    #           you should pass copies of d1 and d2.
    #           Note that the function works more efficiently if d1 is the 
    #           bigger dictionary.
    # OUTPUT:
    #   The combined dictionary
    # EXAMPLE RUN:
    #   >>> d1 = {1: {2: {3: 4, 5: 6}}}
    #   >>> d2 = {1: {2: {7: 8}, 9: {10: 11}}}
    #   >>> dict_combine(d1, d2)
    #   {1: {2: {3: 4, 5: 6, 7: 8}, 9: {10: 11}}}

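    for key in d2:
        if key in d1 and isinstance(d1[key], dict) and isinstance(d2[key], dict):
            # Both sides are sub-dictionaries: merge them recursively
            d1[key] = dict_combine(d1[key], d2[key])
        else:
            # New key, or a leaf value on either side: the value from d2 wins
            d1[key] = d2[key]
    return d1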


def csv_to_dict(csv_file_path, params=None, n_row_max=None):
    # NAME: csv_to_dict
    #
    # DESCRIPTION: Reads a csv file and turns relevant columns into a nested 
    #              dictionary.
    #
    # INPUTS:
    #   csv_file_path: The full path to the data file
    #   params:        A list of relevant column names. The resulting dictionary
    #                  will be nested in the same order as parameters in 'params'.
    #                  Default is None (read all columns)
    #   n_row_max:     The maximum number of rows to read. Default is None
    #                  (read all rows)
    #
    # OUTPUT:
    #   A nested dictionary containing all the relevant csv data

    csv_dictionary = {}

    with open(csv_file_path, 'r') as csv_file:
        csv_data = csv.reader(csv_file, delimiter=',')
        names  = next(csv_data)          # Read title line
        if not params:
            # A list of column indices to read from csv
            relevant_param_indices = list(range(len(names)))  # all columns, including the last
        else:
            # A list of column indices to read from csv
            relevant_param_indices = []  
            for name in params:
                if name not in names:
                    # Parameter name is not found in title line
                    raise ValueError('Could not find {} in csv file'.format(name))
                else:
                    # Get indices of the relevant columns
                    relevant_param_indices.append(names.index(name))
        # The title line was already consumed by next(), so start at the first data row
        for row in itertools.islice(csv_data, n_row_max):
            # Get a list containing relevant columns only
            relevant_cols = [row[i] for i in relevant_param_indices] 
            # Turn the string to numbers. Not necessary  
            float_row = [float(element) for element in relevant_cols]  
            # Build nested dictionary
            csv_dictionary = dict_combine(csv_dictionary, list_to_dict(float_row))  

        return csv_dictionary
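
For comparison, a more compact sketch that combines the `setdefault` loop from Solution 1 with column selection by name. It is not the code above rewritten line for line: `csv_to_nested`, its `columns` and `convert` parameters, and the header names `a` to `d` are all hypothetical, and it takes an open file object instead of a path so that it can be tried with `io.StringIO`.

```python
import csv
import io

def csv_to_nested(csv_file, columns=None, convert=float):
    """Nest the chosen columns in the given order; the last one is the leaf value."""
    # skipinitialspace drops the blanks after each comma, including in the header
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    fieldnames = reader.fieldnames or []
    if columns is None:
        columns = fieldnames  # default: use every column
    missing = [name for name in columns if name not in fieldnames]
    if missing:
        raise ValueError("Could not find {} in csv file".format(missing))
    if len(columns) < 2:
        raise ValueError("need at least 2 columns to build a key/value pair")
    result = {}
    for row in reader:
        *path, key, value = (convert(row[name]) for name in columns)
        nested = result
        for val in path:
            nested = nested.setdefault(val, {})
        nested[key] = value
    return result

# Hypothetical sample data with a title line
sample = io.StringIO("a, b, c, d\n1, 2, 3, 4\n1, 6, 7, 8\n9, 10, 11, 12\n")
print(csv_to_nested(sample, columns=["a", "c", "d"], convert=int))
# {1: {3: 4, 7: 8}, 9: {11: 12}}
```

Because each row is merged directly into the result, no separate per-row dictionary or recursive combine step is needed.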

【Discussion】: