Creating a dictionary by for loop (?): answers

【Question title】: Creating a dictionary by for loop (?)
【Posted】: 2020-01-04 02:30:40
【Question】:

I have to create a large dictionary for my measurement data. So far, my (simplified) code looks like this:

i = 0  

for i in range(len(station_data_files_pandas)):  # range(0, 299)
    station_data_f_pandas = station_data_files_pandas[i]
    station_id = str(int(station_data_f_pandas["STATIONS_ID"][0]))
    Y_RR = station_data_f_pandas["MO_RR"].resample("A").apply(very_sum)

    # creating the dictionary layer for the annual data in this dictionary
    anual_data = {
            "Y_RR" : Y_RR
            }
    # creating the dictionary layer for the monthly data in this dictionary
    montly_data = {
            "MO_RR"    
            }
    # creating the dictionary layer for every station. Every station has monthly and annual data
    station = {
            "montly_data" : montly_data,
            "anual_data" : anual_data
            }
    # creating the dictionary layer where the station data can be looked up by station id
    station_data_dic = {
            station_id : station
            }
    # creating the final layer of the dictionary
    station_data_dictionary = {
            "station_data": station_data_dic
            }

This is the output:

station_data_dictionary
Out[387]: 
{'station_data': {'4706': {'montly_data': {'MO_RR'},   # "4706" is the id from the last element in station_data_files_pandas
   'anual_data': {'Y_RR': YearMonth
           # YearMonth is the index...
           # I actually wanted the Index just to show yyyy-mm ...
    1981-12-31    1164.3
    1982-12-31     852.4
    1983-12-31     826.5
    1984-12-31     798.8
    1985-12-31       NaN
    1986-12-31       NaN
    1987-12-31       NaN
    1988-12-31       NaN
    1989-12-31       NaN
    1990-12-31    1101.1
    1991-12-31     892.4
    1992-12-31     802.1
    1993-12-31     873.5
    1994-12-31     842.7
    1995-12-31     962.0
    1996-12-31       NaN
    1997-12-31     927.9
    1998-12-31       NaN
    1999-12-31       NaN
    2000-12-31     997.8
    2001-12-31     986.3
    2002-12-31    1117.6
    2003-12-31     690.8
    2004-12-31       NaN
    2005-12-31       NaN
    2006-12-31       NaN
    2007-12-31       NaN
    2008-12-31       NaN
    2009-12-31       NaN
    2010-12-31       NaN
    Freq: A-DEC, Name: MO_RR, dtype: float64}}}}

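As you can see, my output contains only one "sheet". I expected 300 of them.

I assume my code overwrites the data on every pass of the loop, so at the end my output consists of only one sheet, built from the last element in station_data_files_pandas. How can I fix this? Or is my approach completely wrong?...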
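The suspicion is right. Each pass of the loop binds station_data_dictionary to a brand-new dictionary, and the one from the previous pass is discarded. Here is a minimal pure-Python sketch of that effect. The station IDs are hypothetical stand-ins for the real DataFrames.

```python
# Hypothetical station IDs standing in for station_data_files_pandas
station_ids = ["403", "573", "96", "4706"]

for station_id in station_ids:
    # A new dict is created and bound to the same name on every pass,
    # so only the one from the last iteration survives the loop
    result = {"station_data": {station_id: {"anual_data": {}}}}

print(list(result["station_data"]))  # ['4706']
```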

When it is finished, it has to be accessible like this:

station_data_dictionary["station_data"]["403"]["anual_data"]["Y_RR"]
station_data_dictionary["station_data"]["573"]["anual_data"]["Y_RR"]
station_data_dictionary["station_data"]["96"]["anual_data"]["Y_RR"]

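...and so on.

As you can see, the only part that is allowed to change is the station_id, because that is the key I use to look up the different stations in the dictionary.

Note: there is a question with exactly the same title, but it didn't help me at all...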

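【Comments】:

Create station_data_dictionary outside the for loop, then add entries inside the loop.
@Seb Thanks for your answer. I already tried this, but without success. Could you show me how to add entries inside the loop?
Have you tried station_data_files_pandas.to_dict() or station_data_files_pandas.to_json()? I don't know what the raw data looks like, so these functions may not be relevant.
@MarkMoretto Thanks for your answer. station_data_files_pandas is a list of pandas DataFrames: station_data_files_pandas[0] returns a pandas DataFrame. Thanks for the suggestion, but I don't think I can use a shortcut like that.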
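The first comment's suggestion is to create the dictionary once before the loop and then add one entry per station inside it. This is a minimal sketch of that pattern. Hypothetical (station_id, annual total) pairs replace the real DataFrames, so it runs without pandas.

```python
# Hypothetical (station_id, annual_total) pairs in place of the real DataFrames
records = [("403", 1164.3), ("573", 852.4), ("96", 826.5)]

# Create the outer dictionary once, before the loop
station_data_dictionary = {"station_data": {}}

for station_id, y_rr in records:
    # Add one entry per station instead of replacing the whole dictionary
    station_data_dictionary["station_data"][station_id] = {
        "anual_data": {"Y_RR": y_rr}
    }

print(station_data_dictionary["station_data"]["573"]["anual_data"]["Y_RR"])  # 852.4
```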

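Tags: python pandas loops for-loop

【Solution 1】:

I haven't tested this because I don't have your data, but it should produce the dictionary you want. The only changes are at the top and the bottom: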

station_data_dictionary = {
    "station_data": {}
}

for i in range(len(station_data_files_pandas)):  # range(0, 299)

    station_data_f_pandas = station_data_files_pandas[i]

    station_id = str(int(station_data_f_pandas["STATIONS_ID"][0]))

    Y_RR = station_data_f_pandas["MO_RR"].resample("A").apply(very_sum)

    # creating the dictionary layer for the annual data in this dictionary
    anual_data = {
            "Y_RR" : Y_RR
            }

    # creating the dictionary layer for the monthly data in this dictionary
    # (note: {"MO_RR"} is a set, not a dict; give the key a value, e.g. {"MO_RR": ...})
    montly_data = {
            "MO_RR"    
            }

    # creating the dictionary layer for every station. Every station has monthly and annual data
    station = {
            "montly_data" : montly_data,
            "anual_data" : anual_data
            }

    station_data_dictionary["station_data"][station_id] = station

Note that you don't need a statement like i = 0 before the for loop, because the for statement assigns the loop variable for you.
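Here is a small sketch of that point. The for statement itself binds the loop variable to each value produced by range(). When both the index and the element are needed, enumerate() avoids indexing into the list by hand.

```python
# No "i = 0" needed: the for statement binds i to each value from range()
seen = []
for i in range(3):
    seen.append(i)
print(seen)  # [0, 1, 2]

# enumerate() yields (index, element) pairs directly
items = ["a", "b", "c"]
pairs = [(i, item) for i, item in enumerate(items)]
print(pairs)  # [(0, 'a'), (1, 'b'), (2, 'c')]
```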

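Also, the "station_data" layer of the dictionary seems redundant, since it is the only key at that level. It is in your desired output, though, so I left it in.

【Comments】: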

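Thanks a lot! Looks like this is the answer! My problem was that I always tried to build the whole dictionary in one go. I hadn't thought of splitting it up. And yes, the "station_data" layer seems redundant right now, but I have to add another "key" layer on the same level. Thanks for leaving it in XD. And thanks for the tip about the loop. Does it also apply to "j" in a loop?
You're welcome. Sure, any variable name works, so you can pick whichever makes the most sense in context. i is just the usual choice in general.
OK, thanks. I once built a loop inside a loop and called the variable of the second loop j. That's why I was curious. I didn't know about the line station_data_dictionary["station_data"][station_id] = station. Do I have to do this after the variable key in the dictionary?
Yes, when the outer loop uses i, j is the natural choice for the inner loop. I'm not sure what "this" refers to in "do this after the variable key". The syntax d[station_id] = val adds a new entry to the dictionary d, with the key taken from the value of the variable station_id.
OK, thanks for the explanation. I don't think I understand dictionaries well enough yet to build them without peeking at this code. It's not in my blood yet ;). But that's fine. I like reusing code anyway, and I'll probably look for other examples online too. Are dictionaries important?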
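The explanation of d[station_id] = val is easier to see in a tiny example. The key is whatever value the variable holds at the moment of the assignment. A new key adds an entry, and an existing key replaces its value. The station IDs and values here are only illustrative.

```python
d = {}

station_id = "403"              # the key comes from the variable's current value...
d[station_id] = "data for 403"
station_id = "573"              # ...so rebinding the variable gives a new key
d[station_id] = "data for 573"
print(d)  # {'403': 'data for 403', '573': 'data for 573'}

# Assigning to an existing key replaces its value instead of adding an entry
d["403"] = "updated"
print(len(d))  # 2
```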

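【Solution 2】:

Try the code below. Also, if the dictionary has to keep its entries in the order they were added: on Python 3.7 and later a plain dict already does this, while on older versions you need OrderedDict from the collections module.

Then, when you print the dictionary or iterate over its data, you get the entries in the order in which the code below adds them.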
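Here is a quick check of the ordering claim. On Python 3.7+, a plain dict and an OrderedDict iterate in the same insertion order. The station IDs are illustrative.

```python
from collections import OrderedDict

plain = {}
ordered = OrderedDict()
for station_id in ["573", "96", "403"]:
    plain[station_id] = None
    ordered[station_id] = None

# Both iterate in insertion order (guaranteed for dict since Python 3.7)
print(list(plain))    # ['573', '96', '403']
print(list(ordered))  # ['573', '96', '403']
```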

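Note: I assume station_data_files_pandas is a list, not a dictionary. That's why I changed the for loop "signature" to iterate over the elements directly (an "enhanced" for loop). If I'm wrong, this variable is actually a dictionary whose keys are the integers your for loop used. In that case you can iterate over its items like this: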

for k, v in station_data_files_pandas.items():
    # now k carries the integer you were using before.
    # and v carries station_data_f_pandas
    station_data_f_pandas = v
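Both iteration styles yield the same (position/key, element) pairs. This sketch uses small hypothetical dicts in place of the DataFrames: .items() on a dict keyed by integers, and enumerate() on a list.

```python
# Hypothetical stand-in: a dict mapping integer keys to per-station data
station_data_files = {0: {"STATIONS_ID": 403}, 1: {"STATIONS_ID": 573}}

# Iterating over a dict: .items() yields (key, value) pairs
ids_from_dict = []
for k, v in station_data_files.items():
    ids_from_dict.append((k, str(v["STATIONS_ID"])))

# Iterating over a list: enumerate() yields (index, element) pairs
station_data_list = [{"STATIONS_ID": 403}, {"STATIONS_ID": 573}]
ids_from_list = []
for i, v in enumerate(station_data_list):
    ids_from_list.append((i, str(v["STATIONS_ID"])))

print(ids_from_dict)                   # [(0, '403'), (1, '573')]
print(ids_from_dict == ids_from_list)  # True
```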

Corrected code:

import collections

# On Python 3.7+ a plain dict ({}) would keep insertion order as well
station_data_dictionary = collections.OrderedDict()

# for i in range(len(station_data_files_pandas)):  # range(0, 299)
# Using the "enhanced" for loop: iterate over the DataFrames directly
for station_data_f_pandas in station_data_files_pandas:

    # This is not needed anymore
    # station_data_f_pandas = station_data_files_pandas[i]

    # int() first, so that a float ID such as 4706.0 becomes "4706"
    station_id = str(int(station_data_f_pandas["STATIONS_ID"][0]))

    Y_RR = station_data_f_pandas["MO_RR"].resample("A").apply(very_sum)
    # Placeholder (assumption): store the raw monthly series; replace as needed
    MO_RR = station_data_f_pandas["MO_RR"]

    # creating the dictionary layer for the annual data in this dictionary
    anual_data = {
            "Y_RR" : Y_RR
            }

    # creating the dictionary layer for the monthly data in this dictionary
    montly_data = {
            # {"MO_RR"} on its own is a set, not a dict:
            # every dictionary key needs a value assigned to it.
            "MO_RR": MO_RR
            }

    # creating the dictionary layer for every station. Every station has monthly and annual data
    station = {
            "montly_data" : montly_data,
            "anual_data" : anual_data
            }

    # The extra layers are no longer built here:
    # station_data_dic = {station_id: station}
    # station_data_dictionary = {"station_data": station_data_dic}

    # Why use {"apparently_useless_id_layer": {"actual_id_info": "data"}}
    # instead of {"actual_info_id": "data"} ?
    station_data_dictionary[station_id] = station
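If the per-station work fits into one expression, a dictionary comprehension can build the same nested structure. It can also keep the "station_data" layer the asker needs. This is a sketch under a loud assumption: each hypothetical record already holds its station ID and an annual series (a plain year-to-sum dict here, instead of the pandas resample result). That keeps it runnable without pandas or the asker's very_sum.

```python
# Hypothetical stand-ins for the per-station DataFrames: each record holds an
# ID (a float, as read from the data) and an already-aggregated annual series
stations = [
    {"STATIONS_ID": 403.0, "Y_RR": {1981: 1164.3, 1982: 852.4}},
    {"STATIONS_ID": 4706.0, "Y_RR": {1981: 826.5, 1982: 798.8}},
]

station_data_dictionary = {
    "station_data": {
        # str(int(...)) turns a float ID such as 403.0 into "403"
        str(int(s["STATIONS_ID"])): {
            "montly_data": {},                  # placeholder for monthly data
            "anual_data": {"Y_RR": s["Y_RR"]},
        }
        for s in stations
    }
}

print(station_data_dictionary["station_data"]["4706"]["anual_data"]["Y_RR"][1982])  # 798.8
```

The explicit loop in the answers is easier to extend when each station needs several steps. The comprehension suits the case where each entry is a single expression.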

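【Comments】:

Thanks! This works too! I think the crucial part was that I hadn't built my main dictionary outside the loop, and I didn't have this line: station_data_dictionary[f"station_data_{i}"] = station_data_dic. Thanks also for the tips in the comments. They were useful!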