【Question title】: Save pandas dataframes in loop after manipulating
【Posted】: 2020-04-10 14:03:08
【Question】:

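I have a loop that takes a series of existing dataframes and manipulates their format and values. I need to know how to create new dataframes containing the modified content at the end of the loop.

An example follows: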

import pandas as pd

# Create datasets
First = {'GDP':[200,175,150,100]}
Second = {'GDP':[550,200,235,50]}

# Create old_dataframes
old_df_1 = pd.DataFrame(First)
old_df_2 = pd.DataFrame(Second)

# Define references and dictionary
old_dfs = [old_df_1, old_df_2]
new_dfs = ['new_df_1','new_df_2']
dictionary = {}

# Begin Loop
for df, name in zip(old_dfs, new_dfs):

    # Multiply all GDP values by 1.5 in both dataframes
    df = df * 1.5    

    # ISSUE HERE - Supposed to Create new data frames 'new_df_1' & 'new_df_2' containing df*1.5 values: Only appends to dictionary. Does not create new_df_1 & new_df_2
    dictionary[name] = df

# Check for the existence of 'new_df_1' & 'new_df_2' (they will not appear)
%who_ls DataFrame

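Question: I have marked the issue above. My code does not create the 'new_df_1' and 'new_df_2' dataframes; it only adds them to the dictionary. I need to be able to create new_df_1 and new_df_2 as separate dataframes.

【Comments】: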

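  • Can you give a sample of your input and expected output? That would make it clearer what you need to see in the dfs once the loop is done.
  • Thank you for persisting with my question. I have just created an example for you to look at. I hope it is clear. If it runs correctly, the final command %who_ls DataFrame should return ['df', 'old_df_1', 'old_df_2','new_df_1','new_df_2']
  • What is wrong with the loop code you posted?
  • The problem with my code is that it does not create 'new_df_1' and 'new_df_2' as dataframes. The line dictionary[name] = df just adds them to the dictionary. I don't know how to create the dfs inside the loop. If you run my code and check for the new dataframes with %who_ls DataFrame, you will not find 'new_df_1' & 'new_df_2'.
  • What is wrong with using a dictionary of dataframes? In fact, that is the preferred approach: rather than flooding your global environment with many dataframes, use one indexed container of dataframes. You lose no functionality if the dataframes are stored in a dict, list, tuple, etc.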
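
To illustrate the last comment, here is a minimal sketch of the dictionary-of-dataframes approach using the question's sample data. The keys 'df_1' and 'df_2' and the variable names old_dfs, new_dfs and totals are made up for this illustration; they are not from the original post.

```python
import pandas as pd

# Sample data from the question, stored under hypothetical keys
old_dfs = {
    'df_1': pd.DataFrame({'GDP': [200, 175, 150, 100]}),
    'df_2': pd.DataFrame({'GDP': [550, 200, 235, 50]}),
}

# df * 1.5 returns a new DataFrame, so the originals are left untouched
new_dfs = {name: df * 1.5 for name, df in old_dfs.items()}

# Each entry is an ordinary DataFrame with the usual methods
print(new_dfs['df_1'].head())
print(new_dfs['df_2'].describe())

# One operation can be applied to every frame in the container
totals = {name: df['GDP'].sum() for name, df in new_dfs.items()}
print(totals)
```

Adding a third dataframe only means adding a third key; no new variable names are needed.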

Tags: python pandas dataframe


【Answer 1】:
import pandas as pd
from copy import deepcopy   # to copy the old dataframes

# First and Second are the data dicts defined in the question
# create 2 lists: the first holds the old dataframes, the second the modified ones
old_dfs_list, new_dfs_list = [pd.DataFrame(First), pd.DataFrame(Second)], []

# process old dfs one by one by iterating over old_dfs_list, 
# copy, modify each and append it to list of new_dfs_list with same index as 
# old df ... so old_dfs_list[1] is mapped to new_dfs_list[1]

for i in range(len(old_dfs_list)):
    # a deep copy prevents changing the old dfs by reference
    df_deep_copy = deepcopy(old_dfs_list[i])
    df_deep_copy['GDP'] *= 1.5
    new_dfs_list.append(df_deep_copy)

print(old_dfs_list[0])   # to check that old dfs are not changed
print(new_dfs_list[0])

You can also try using a dictionary instead of a list, with whatever names you like:

import pandas as pd
datadicts_dict = { 
                    'first' :{'GDP':[200,175,150,100]}, 
                    'second':{'GDP':[550,200,235,50]}, 
                    'third' :{'GDP':[600,400,520,100, 800]}
                    }

# Create datasets and store it in a python dictionary
old_dfs_dict, new_dfs_dict = {}, {}    # initialize 2 dicts to hold original and modified dataframes

# process datasets one by one by iterating over datadicts_dict, 
# convert to df save it in old_dfs_dict with same name as the key
# copy, modify each and put it in new_dfs_dict with same key 
# so dataset of key 'first' in datadicts_dict is saved as old_dfs_dict['first'] 
# modified and mapped to new_dfs_dict['first']

for dataset_name, data_dict in datadicts_dict.items():
    old_dfs_dict[dataset_name] = pd.DataFrame({'GDP':data_dict['GDP']})
    new_dfs_dict[dataset_name] = pd.DataFrame({'GDP':data_dict['GDP']}) * 1.5

print(old_dfs_dict['third'])   # to check that old dfs are not changed
print(new_dfs_dict['third'])
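
As a side note on the list version earlier in this answer: pandas has its own DataFrame.copy() method, which makes a deep copy by default, so importing deepcopy is probably not required. The sketch below is a variation under that assumption, not the answerer's code; the loop variables old_df and new_df are names chosen for this illustration.

```python
import pandas as pd

old_dfs_list = [
    pd.DataFrame({'GDP': [200, 175, 150, 100]}),
    pd.DataFrame({'GDP': [550, 200, 235, 50]}),
]

new_dfs_list = []
for old_df in old_dfs_list:
    # DataFrame.copy() defaults to deep=True, so the data is copied too
    new_df = old_df.copy()
    new_df['GDP'] = new_df['GDP'] * 1.5
    new_dfs_list.append(new_df)

print(old_dfs_list[0])   # original values are unchanged
print(new_dfs_list[0])   # scaled values
```

In the dictionary version no copy is needed at all, because the multiplication already returns a new DataFrame.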

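【Comments】:

  • The problem with this approach is that it appends the dataframes together. I need to create separate dataframes 'new_df_1' & 'new_df_2'. Can you show me how to adjust or rewrite my command dictionary[name] = df? I only used that approach to reference the dataframes' names. I don't really want to store the data in a dictionary.
  • You have new_dfs_list[0], new_dfs_list[1], and more if you want. The dataframes are not appended together; they are kept in a list of dataframes, new_dfs_list. For example, if you initialize old_dfs_list with 4 dataframes, old_dfs_list[0], old_dfs_list[1], old_dfs_list[2], old_dfs_list[3], then new_dfs_list will also contain 4 new, modified dataframes. Imagine you have 100 old dataframes: creating 100 variables named new_df_1, new_df_2, ..., new_df_100 is inefficient compared with creating one list new_dframes and indexing it: new_dframes[0], ..., new_dframes[99]

【Answer 2】: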

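Thinking over the answer above, I eventually stumbled on a workable solution. The problem I faced was extracting the added data from the dictionary. It had not really occurred to me that I could pull the data out of the dictionary outside the loop and then form the dataframes.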

.
.
.
# Begin Loop
for df, name in zip(old_dfs, new_dfs):
    # Multiply all GDP values by 1.5 in both dataframes
    df = df * 1.5    

    # ISSUE HERE - Supposed to Create new data frames 'new_df_1' & 'new_df_2' containing df*1.5 values: Only appends to dictionary. Does not create new_df_1 & new_df_2
    dictionary[name] = df

# Solution - Extract from Dictionary and form Dataframe
new_df_1 = pd.DataFrame.from_dict(dictionary['new_df_1'])
new_df_2 = pd.DataFrame.from_dict(dictionary['new_df_2'])

# Check for the existence of 'new_df_1' & 'new_df_2'
%who_ls DataFrame

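【Comments】:

  • You don't need from_dict; you can simply assign: new_df_1 = dictionary['new_df_1']. But as noted above, a dictionary of dataframes is the preferred way to store many similarly structured dataframes, and you lose no functionality: dictionary['new_df_1'].head(), dictionary['new_df_1'].tail(), dictionary['new_df_1'].describe() ...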
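
A short sketch of the plain assignment suggested in this comment. The dictionary is rebuilt here from the question's sample data so that the snippet runs on its own; in the original code it is filled inside the loop.

```python
import pandas as pd

# Stand-in for the dictionary that the loop in the question fills
dictionary = {
    'new_df_1': pd.DataFrame({'GDP': [200, 175, 150, 100]}) * 1.5,
    'new_df_2': pd.DataFrame({'GDP': [550, 200, 235, 50]}) * 1.5,
}

# The stored values are already DataFrames, so assignment is enough
new_df_1 = dictionary['new_df_1']
new_df_2 = dictionary['new_df_2']

# Assignment binds a name to the same object; nothing is copied
print(new_df_1 is dictionary['new_df_1'])   # True
print(type(new_df_2).__name__)              # DataFrame
```

Because both names refer to the same object, an in-place change made through new_df_1 is also visible through dictionary['new_df_1']; call .copy() if an independent dataframe is needed.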