【Question title】: How to create a nested dictionary from a csv file with N rows in Python
【Posted】: 2018-03-15 00:45:12
【Question】:

I'm looking for a way to read a csv file with an unknown number of columns into a nested dictionary, i.e. for input of the form

file.csv:
1,  2,  3,  4
1,  6,  7,  8
9, 10, 11, 12

I would like a dictionary of the following form:

{1:{2:{3:4}, 6:{7:8}}, 9:{10:{11:12}}}

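This is to allow O(1) lookups of values from the csv file. It is fine if building the dictionary takes relatively long, because in my application I build it only once but search it millions of times.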
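Each level of such a nested dictionary is an ordinary hash table, so a full lookup costs one average-case O(1) dictionary access per level, i.e. O(k) for k key columns, which is constant for a file with a fixed number of columns. A minimal sketch of walking the structure (`lookup` is a hypothetical helper, not a library function):

```python
def lookup(nested, keys):
    """Follow `keys` one level at a time; raises KeyError if any key is absent."""
    node = nested
    for key in keys:
        node = node[key]  # one average-case O(1) hash lookup per level
    return node


d = {1: {2: {3: 4}, 6: {7: 8}}, 9: {10: {11: 12}}}
print(lookup(d, [1, 6, 7]))    # 8
print(lookup(d, [9, 10, 11]))  # 12
print(lookup(d, [1, 2]))       # {3: 4}  (a key prefix returns the sub-dictionary)
```

If only complete key paths are ever queried, a flat dictionary keyed by tuples, e.g. {(1, 6, 7): 8}, needs a single hash lookup; the nested form additionally gives direct access to prefixes such as d[1].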

I would also like an option to name the relevant columns, so that I can ignore the unnecessary ones.

【Question discussion】:

    Tags: python csv dictionary hashmap nested


    【Solution 1】:

    Here's a simple but brittle approach:

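    >>> import csv, io
    >>> s = "1, 2, 3, 4\n1, 6, 7, 8\n9, 10, 11, 12\n"  # the question's data
    >>> d = {}
    >>> with io.StringIO(s) as f:  # fake a file
    ...     reader = csv.reader(f)
    ...     for row in reader:
    ...         nested = d  # start every row at the top level
    ...         for val in map(int, row[:-2]):
    ...             # descend one level, creating the sub-dict if it is missing
    ...             nested = nested.setdefault(val, {})
    ...         k, v = map(int, row[-2:])  # this will fail if you don't have enough columns
    ...         nested[k] = v
    ...
    >>> d
    {1: {2: {3: 4}, 6: {7: 8}}, 9: {10: {11: 12}}}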
    

    However, this assumes there are at least 2 columns.
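The same setdefault loop can be extended to cover the question's request for named columns. A minimal sketch, assuming the file starts with a header row (the question's sample has none, so the header "a, b, c, d" below is made up) and that all selected values are integers; `nested_from_columns` is a hypothetical name. csv.DictReader maps each row to the header names, so unlisted columns are simply ignored, and the function checks the at-least-two-columns assumption up front instead of failing during unpacking:

```python
import csv
import io


def nested_from_columns(f, columns, convert=int):
    """Nest the named csv columns in the given order.

    Every column except the last two becomes one dictionary level; the last
    two form the innermost key: value pair. Unlisted columns are ignored.
    """
    if len(columns) < 2:
        raise ValueError("need at least two columns (a key and a value)")
    result = {}
    # skipinitialspace drops the blank after each comma, as in the question
    for row in csv.DictReader(f, skipinitialspace=True):
        node = result
        for col in columns[:-2]:
            node = node.setdefault(convert(row[col]), {})
        node[convert(row[columns[-2]])] = convert(row[columns[-1]])
    return result


# Hypothetical sample: the question's data plus a made-up header row.
SAMPLE = "a, b, c, d\n1, 2, 3, 4\n1, 6, 7, 8\n9, 10, 11, 12\n"

print(nested_from_columns(io.StringIO(SAMPLE), ["a", "b", "c", "d"]))
# {1: {2: {3: 4}, 6: {7: 8}}, 9: {10: {11: 12}}}
print(nested_from_columns(io.StringIO(SAMPLE), ["a", "b", "d"]))
# {1: {2: 4, 6: 8}, 9: {10: 12}}
```

The order of names in `columns` decides the nesting order, so the same file can be indexed differently without rewriting it.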

    【Discussion】:

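    • Any chance of getting more explanation of how this works? Something interesting seems to happen on the line "nested = d"; I guess it exploits a feature of [pointers?] that I don't fully understand. When I step through this line, d and nested have different values, and I don't see d being updated explicitly (but it still gets updated)?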
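What the comment observes is Python's reference semantics rather than pointers in the C sense: every name refers to an object. "nested = d" makes nested a second name for the same dictionary, and "nested = nested.setdefault(val, {})" rebinds nested to the child dictionary that setdefault has just stored inside the current one. Writes through nested therefore land inside d, even though, once nested has moved down a level, the two names show different values in a debugger. A small demonstration:

```python
d = {}
nested = d
print(nested is d)     # True: one dict object, two names

# setdefault stores {} under key 1 (since it is missing) and returns that
# same inner dict; rebinding `nested` to it moves one level down, no copy made.
nested = nested.setdefault(1, {})
print(nested is d[1])  # True
nested[2] = 3          # this writes into d[1]
print(d)               # {1: {2: 3}}
print(nested is d)     # False: `nested` now names the inner dict
```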

    【Solution 2】:

    Here's what I came up with. Feel free to comment and suggest improvements.

    import csv
    import itertools


    def list_to_dict(lst):
        """Recursively turn a list into a nested dictionary.

        The first element is a key whose value is the dictionary built from the
        rest of the list; the last element in the list becomes the value of the
        innermost dictionary. Note that lst is modified in place.

        INPUTS:
          lst - a non-empty list (e.g. of strings or floats)
        OUTPUT:
          A nested dictionary (or the only element, if len(lst) == 1)
        EXAMPLE RUN:
          >>> lst = [1, 2, 3, 4]
          >>> list_to_dict(lst)
          {1: {2: {3: 4}}}
        """
        if len(lst) == 1:
            return lst[0]
        # Fold the last two elements into {lst[-2]: lst[-1]} and recurse
        data_dict = {lst[-2]: lst[-1]}
        lst.pop()
        lst[-1] = data_dict
        return list_to_dict(lst)


    def dict_combine(d1, d2):
        """Merge the nested dictionary d2 into d1 and return d1.

        INPUTS:
          d1, d2: Two nested dictionaries. d1 is mutated (and may end up sharing
                  sub-dictionaries with d2), so if the inputs must not change,
                  pass copies of d1 and d2.
                  The function is more efficient if d1 is the bigger dictionary,
                  since it only iterates over the keys of d2.
                  If both sides hold a plain (non-dict) value for the same key,
                  the value from d2 wins.
        OUTPUT:
          The combined dictionary
        EXAMPLE RUN:
          >>> d1 = {1: {2: {3: 4, 5: 6}}}
          >>> d2 = {1: {2: {7: 8}, 9: {10: 11}}}
          >>> dict_combine(d1, d2)
          {1: {2: {3: 4, 5: 6, 7: 8}, 9: {10: 11}}}
        """
        for key, value in d2.items():
            if key in d1 and isinstance(d1[key], dict) and isinstance(value, dict):
                # Both sides are sub-dictionaries: merge them recursively
                d1[key] = dict_combine(d1[key], value)
            else:
                # New key, or a leaf value: take the entry from d2
                d1[key] = value
        return d1


    def csv_to_dict(csv_file_path, params=None, n_row_max=None):
        """Read a csv file and turn the relevant columns into a nested dictionary.

        The first line of the file must be a title (header) line with the
        column names.

        INPUTS:
          csv_file_path: The full path to the data file
          params:        A list of relevant column names. The resulting dictionary
                         will be nested in the same order as the names in 'params'.
                         Default is None (read all columns)
          n_row_max:     The maximum number of data rows to read. Default is None
                         (read all rows)
        OUTPUT:
          A nested dictionary containing all the relevant csv data
        """
        csv_dictionary = {}

        # newline='' is what the csv module documentation recommends for open()
        with open(csv_file_path, 'r', newline='') as csv_file:
            # skipinitialspace ignores the blanks after each comma ("1,  2, ...")
            csv_data = csv.reader(csv_file, delimiter=',', skipinitialspace=True)
            names = next(csv_data)  # Read the title line
            if not params:
                # Indices of all columns, including the last one
                relevant_param_indices = list(range(len(names)))
            else:
                # Indices of the requested columns, in the order given by params
                relevant_param_indices = []
                for name in params:
                    if name not in names:
                        # Parameter name is not found in the title line
                        raise ValueError('Could not find {} in csv file'.format(name))
                    relevant_param_indices.append(names.index(name))
            # next() already consumed the title line, so start at the first data
            # row; islice(csv_data, None) reads every remaining row
            for row in itertools.islice(csv_data, n_row_max):
                # Keep only the relevant columns
                relevant_cols = [row[i] for i in relevant_param_indices]
                # Turn the strings into numbers (optional)
                float_row = [float(element) for element in relevant_cols]
                # Merge this row's nested chain into the result
                csv_dictionary = dict_combine(csv_dictionary, list_to_dict(float_row))

        return csv_dictionary
    

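When the title line has already been consumed with next(), the start argument of itertools.islice decides whether the first data row is kept: islice(reader, 1, n) skips it, while islice(reader, n) does not. Likewise, range(len(names)) covers every column, whereas range(len(names) - 1) drops the last one. A small sketch on an in-memory file (the data and the helper name `data_rows` are made up for illustration):

```python
import csv
import io
import itertools

# Hypothetical file: a title line followed by three data rows.
text = "x,y\n1,2\n3,4\n5,6\n"


def data_rows(text, *islice_args):
    """Return the data rows of `text` after skipping its title line."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # consumes the title line ['x', 'y']
    return list(itertools.islice(reader, *islice_args))


skipped = data_rows(text, 1, None)  # start=1: the first data row is lost
kept = data_rows(text, None)        # stop=None: every data row
limited = data_rows(text, 2)        # stop=2: at most two data rows

print(skipped)  # [['3', '4'], ['5', '6']]
print(kept)     # [['1', '2'], ['3', '4'], ['5', '6']]
print(limited)  # [['1', '2'], ['3', '4']]

names = ["x", "y"]
print(list(range(len(names) - 1)))  # [0]     (last column missing)
print(list(range(len(names))))      # [0, 1]  (all columns)
```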
    【Discussion】:
