【Question title】: updating dict value does not update global var it references
【Posted】: 2025-12-26 05:45:17
【Question description】:

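I am trying to iterate over a dictionary of dataframes, modify each one with a function, and assign the returned dfs back to their global variables. I expected each value in the dictionary's key-value pairs to be a pointer to the variable that was passed in. Instead, only the values in the data dictionary seem to be updated. This was unexpected. What am I misunderstanding about identifiers? I found this question, whose second half asks the same thing, but I don't understand the accepted answer.

Please see my demonstration below: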

import pandas as pd

bids = pd.read_csv('data/as_bid_aggregated_data.csv')
plans = pd.read_csv('data/as_plan.csv')
energy_prices = pd.read_csv('data/as_bid_aggregated_data.csv')
price_vol = pd.read_csv('data/as_price_vol.csv')
generation = pd.read_csv('data/generation.csv')

data = {'bids':bids,
        'plans':plans,
        'energy_prices':energy_prices,
        'price_vol':price_vol,
        'generation':generation,
       }

I evaluate bids to show what it looks like right after import:

bids.head().to_clipboard()
# OUTPUT:
# note the index, date, and hr_beg cols. These should be modified for all dfs in data after processing.
# V  V          V
    date          hr_beg    OFFNS_Unweighted Average Price  OFFNS_Max Price OFFNS_Min Price OFFNS_Total Quantity    OFFNS_Number of Bids    OFFNS_Weighted Avg Price    ONNS_Unweighted Average Price   ONNS_Max Price  ONNS_Min Price  ONNS_Total Quantity ONNS_Number of Bids ONNS_Weighted Avg Price REGDN_Unweighted Average Price  REGDN_Max Price REGDN_Min Price REGDN_Total Quantity    REGDN_Number of Bids    REGDN_Weighted Avg Price    REGUP_Unweighted Average Price  REGUP_Max Price REGUP_Min Price REGUP_Total Quantity    REGUP_Number of Bids    REGUP_Weighted Avg Price    RRSGN_Unweighted Average Price  RRSGN_Max Price RRSGN_Min Price RRSGN_Total Quantity    RRSGN_Number of Bids    RRSGN_Weighted Avg Price    RRSNC_Unweighted Average Price  RRSNC_Max Price RRSNC_Min Price RRSNC_Total Quantity    RRSNC_Number of Bids    RRSNC_Weighted Avg Price
# 0 2014-01-01  0   43.3190909090909    300.01  0.01    38144.7 22  59.51279016481975   22.016969696969696  250.0   1.0 32531.499999999985  33  36.74238980680264   20.669076923076922  500.0   0.92    71971.59999999992   65  26.577483215601717  19.744255319148944  500.0   0.01    56916.80000000003   47  27.33264099527731   20.85708333333334   500.0   0.01    107723.6    48  30.19552034094665   1.5 3.0 0.0 2236.8  2   1.5996512875536482
# 1 2014-01-01  1   43.342727272727274  300.01  0.01    38216.4 22  59.505340220428934  20.93514285714285   250.0   1.0 34781.19999999998   35  34.95683860821363   21.764761904761905  500.0   0.8 70412.39999999994   63  27.92263442234607   18.834375000000012  500.0   0.01    50201.80000000002   48  28.87979570453649   19.6692 500.0   0.01    107145.0    50  30.00068717158991   1.5 3.0 0.0 2235.8  2   1.599695858305752
# 2 2014-01-01  2   43.34818181818181   300.01  0.01    38336.9 22  59.49848289767822   20.97   250.0   1.0 34741.39999999999   35  35.091575987150776  21.836461538461545  500.0   0.58    72212.29999999992   65  28.27043938498013   18.856041666666666  500.0   0.01    50769.90000000001   48  28.61359006025224   19.5252 500.0   0.01    105503.8    50  30.27549695840339   1.5 3.0 0.0 2236.2  2   1.5996780252213578
# 3 2014-01-01  3   43.35000000000001   300.01  0.01    38374.5 22  59.492013316134425  21.00257142857142   250.0   1.0 34761.399999999994  35  35.11167079001421   22.38730158730159   500.0   0.53    70801.39999999994   63  28.66950969896075   18.854583333333334  500.0   0.01    50313.10000000001   48  28.865852233314985  19.5298 500.0   0.01    105024.0    50  30.41884454981718   1.5 3.0 0.0 2238.2  2   1.5995889554105982
# 4 2014-01-01  4   46.431  300.01  0.01    33460.8 20  64.00475684980633   20.75628571428571   250.0   1.0 34829.29999999999   35  34.791386648597594  21.684531250000006  500.0   0.7 71841.29999999992   64  27.846364904309922  19.238510638297864  500.0   0.01    50767.90000000001   47  28.70213516808849   19.801836734693875  500.0   0.01    104199.79999999996  49  30.477332029428077  1.5 3.0 0.0 2242.4  2   1.5994024259721726

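Then I create a function that modifies a given dataframe. It combines columns to create a single datetime index and replaces the existing index with it. I have edited out the logic for simplicity.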

def create_dt(input_df):
    '''create a dataframe with a datetime index from multiple cols
    '''
    df = input_df.copy()
    #modify the df
    df = df.set_index(dt_index)
    df = df.drop(columns=[date_col,hr_col])
    return df

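Then I try to unpack data, pass each dataframe to create_dt(), and assign the result. I expected this to update each df's global variable through the pointer in the dictionary.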

for key, df in data.items():
    data[key] = create_dt(data[key],'date','hr_beg')

I evaluate the global bids after the function call. It is unchanged.

# OUTPUT:
bids.head().to_clipboard()
# note the index, date, and hr_beg cols. Same as initial value
# V  V          V
#       date    hr_beg  OFFNS_Unweighted Average Price  OFFNS_Max Price OFFNS_Min Price OFFNS_Total Quantity    OFFNS_Number of Bids    OFFNS_Weighted Avg Price    ONNS_Unweighted Average Price   ONNS_Max Price  ONNS_Min Price  ONNS_Total Quantity ONNS_Number of Bids ONNS_Weighted Avg Price REGDN_Unweighted Average Price  REGDN_Max Price REGDN_Min Price REGDN_Total Quantity    REGDN_Number of Bids    REGDN_Weighted Avg Price    REGUP_Unweighted Average Price  REGUP_Max Price REGUP_Min Price REGUP_Total Quantity    REGUP_Number of Bids    REGUP_Weighted Avg Price    RRSGN_Unweighted Average Price  RRSGN_Max Price RRSGN_Min Price RRSGN_Total Quantity    RRSGN_Number of Bids    RRSGN_Weighted Avg Price    RRSNC_Unweighted Average Price  RRSNC_Max Price RRSNC_Min Price RRSNC_Total Quantity    RRSNC_Number of Bids    RRSNC_Weighted Avg Price
# 0 2014-01-01  0   43.3190909090909    300.01  0.01    38144.7 22  59.51279016481975   22.016969696969696  250.0   1.0 32531.499999999985  33  36.74238980680264   20.669076923076922  500.0   0.92    71971.59999999992   65  26.577483215601717  19.744255319148944  500.0   0.01    56916.80000000003   47  27.33264099527731   20.85708333333334   500.0   0.01    107723.6    48  30.19552034094665   1.5 3.0 0.0 2236.8  2   1.5996512875536482
# 1 2014-01-01  1   43.342727272727274  300.01  0.01    38216.4 22  59.505340220428934  20.93514285714285   250.0   1.0 34781.19999999998   35  34.95683860821363   21.764761904761905  500.0   0.8 70412.39999999994   63  27.92263442234607   18.834375000000012  500.0   0.01    50201.80000000002   48  28.87979570453649   19.6692 500.0   0.01    107145.0    50  30.00068717158991   1.5 3.0 0.0 2235.8  2   1.599695858305752
# 2 2014-01-01  2   43.34818181818181   300.01  0.01    38336.9 22  59.49848289767822   20.97   250.0   1.0 34741.39999999999   35  35.091575987150776  21.836461538461545  500.0   0.58    72212.29999999992   65  28.27043938498013   18.856041666666666  500.0   0.01    50769.90000000001   48  28.61359006025224   19.5252 500.0   0.01    105503.8    50  30.27549695840339   1.5 3.0 0.0 2236.2  2   1.5996780252213578
# 3 2014-01-01  3   43.35000000000001   300.01  0.01    38374.5 22  59.492013316134425  21.00257142857142   250.0   1.0 34761.399999999994  35  35.11167079001421   22.38730158730159   500.0   0.53    70801.39999999994   63  28.66950969896075   18.854583333333334  500.0   0.01    50313.10000000001   48  28.865852233314985  19.5298 500.0   0.01    105024.0    50  30.41884454981718   1.5 3.0 0.0 2238.2  2   1.5995889554105982
# 4 2014-01-01  4   46.431  300.01  0.01    33460.8 20  64.00475684980633   20.75628571428571   250.0   1.0 34829.29999999999   35  34.791386648597594  21.684531250000006  500.0   0.7 71841.29999999992   64  27.846364904309922  19.238510638297864  500.0   0.01    50767.90000000001   47  28.70213516808849   19.801836734693875  500.0   0.01    104199.79999999996  49  30.477332029428077  1.5 3.0 0.0 2242.4  2   1.5994024259721726

Then I evaluate the bids dataframe key-value pair in data. The modification succeeded.

data['bids'].head().to_clipboard()
#OUTPUT
# note datetime index, no date or hr_beg cols, see .columns output one cell below.
# V
#   OFFNS_Unweighted Average Price  OFFNS_Max Price OFFNS_Min Price OFFNS_Total Quantity    OFFNS_Number of Bids    OFFNS_Weighted Avg Price    ONNS_Unweighted Average Price   ONNS_Max Price  ONNS_Min Price  ONNS_Total Quantity ONNS_Number of Bids ONNS_Weighted Avg Price REGDN_Unweighted Average Price  REGDN_Max Price REGDN_Min Price REGDN_Total Quantity    REGDN_Number of Bids    REGDN_Weighted Avg Price    REGUP_Unweighted Average Price  REGUP_Max Price REGUP_Min Price REGUP_Total Quantity    REGUP_Number of Bids    REGUP_Weighted Avg Price    RRSGN_Unweighted Average Price  RRSGN_Max Price RRSGN_Min Price RRSGN_Total Quantity    RRSGN_Number of Bids    RRSGN_Weighted Avg Price    RRSNC_Unweighted Average Price  RRSNC_Max Price RRSNC_Min Price RRSNC_Total Quantity    RRSNC_Number of Bids    RRSNC_Weighted Avg Price
# 2014-01-01 00:00:00   43.3190909090909    300.01  0.01    38144.7 22  59.51279016481975   22.016969696969696  250.0   1.0 32531.499999999985  33  36.74238980680264   20.669076923076922  500.0   0.92    71971.59999999992   65  26.577483215601717  19.744255319148944  500.0   0.01    56916.80000000003   47  27.33264099527731   20.85708333333334   500.0   0.01    107723.6    48  30.19552034094665   1.5 3.0 0.0 2236.8  2   1.5996512875536482
# 2014-01-01 01:00:00   43.342727272727274  300.01  0.01    38216.4 22  59.505340220428934  20.93514285714285   250.0   1.0 34781.19999999998   35  34.95683860821363   21.764761904761905  500.0   0.8 70412.39999999994   63  27.92263442234607   18.834375000000012  500.0   0.01    50201.80000000002   48  28.87979570453649   19.6692 500.0   0.01    107145.0    50  30.00068717158991   1.5 3.0 0.0 2235.8  2   1.599695858305752
# 2014-01-01 02:00:00   43.34818181818181   300.01  0.01    38336.9 22  59.49848289767822   20.97   250.0   1.0 34741.39999999999   35  35.091575987150776  21.836461538461545  500.0   0.58    72212.29999999992   65  28.27043938498013   18.856041666666666  500.0   0.01    50769.90000000001   48  28.61359006025224   19.5252 500.0   0.01    105503.8    50  30.27549695840339   1.5 3.0 0.0 2236.2  2   1.5996780252213578
# 2014-01-01 03:00:00   43.35000000000001   300.01  0.01    38374.5 22  59.492013316134425  21.00257142857142   250.0   1.0 34761.399999999994  35  35.11167079001421   22.38730158730159   500.0   0.53    70801.39999999994   63  28.66950969896075   18.854583333333334  500.0   0.01    50313.10000000001   48  28.865852233314985  19.5298 500.0   0.01    105024.0    50  30.41884454981718   1.5 3.0 0.0 2238.2  2   1.5995889554105982
# 2014-01-01 04:00:00   46.431  300.01  0.01    33460.8 20  64.00475684980633   20.75628571428571   250.0   1.0 34829.29999999999   35  34.791386648597594  21.684531250000006  500.0   0.7 71841.29999999992   64  27.846364904309922  19.238510638297864  500.0   0.01    50767.90000000001   47  28.70213516808849   19.801836734693875  500.0   0.01    104199.79999999996  49  30.477332029428077  1.5 3.0 0.0 2242.4  2   1.5994024259721726

data['bids'].columns

#OUTPUT:
# Index(['OFFNS_Unweighted Average Price', 'OFFNS_Max Price', 'OFFNS_Min Price',
#        'OFFNS_Total Quantity', 'OFFNS_Number of Bids',
#        'OFFNS_Weighted Avg Price', 'ONNS_Unweighted Average Price',
#        'ONNS_Max Price', 'ONNS_Min Price', 'ONNS_Total Quantity',
#        'ONNS_Number of Bids', 'ONNS_Weighted Avg Price',
#        'REGDN_Unweighted Average Price', 'REGDN_Max Price', 'REGDN_Min Price',
#        'REGDN_Total Quantity', 'REGDN_Number of Bids',
#        'REGDN_Weighted Avg Price', 'REGUP_Unweighted Average Price',
#        'REGUP_Max Price', 'REGUP_Min Price', 'REGUP_Total Quantity',
#        'REGUP_Number of Bids', 'REGUP_Weighted Avg Price',
#        'RRSGN_Unweighted Average Price', 'RRSGN_Max Price', 'RRSGN_Min Price',
#        'RRSGN_Total Quantity', 'RRSGN_Number of Bids',
#        'RRSGN_Weighted Avg Price', 'RRSNC_Unweighted Average Price',
#        'RRSNC_Max Price', 'RRSNC_Min Price', 'RRSNC_Total Quantity',
#        'RRSNC_Number of Bids', 'RRSNC_Weighted Avg Price'],
#       dtype='object')

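【Discussion】:

  • Hmm, I stripped it down a lot while trying to provide enough context. I'll strip it down further.
  • "I expected this to update each df's global variable through the pointer in the dictionary." You expect the values of bids, plans, etc. to change when you modify the dictionary? Am I understanding that correctly?
  • Yes, I expect the global variables (bids, plans, etc.) to be updated when I modify the values in data[key].
  • Ah, they shouldn't do that. I'm trying to think of some decent resources that explain things better...
  • Unrelated, but in your code example you pass three arguments when calling create_dt, while the function definition takes only one parameter?

Tags: python pandas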


【Solution 1】:

You define several dataframes in the global scope:

bids = pd.read_csv('data/as_bid_aggregated_data.csv')
plans = pd.read_csv('data/as_plan.csv')
energy_prices = pd.read_csv('data/as_bid_aggregated_data.csv')
price_vol = pd.read_csv('data/as_price_vol.csv')
generation = pd.read_csv('data/generation.csv')

You then create a dictionary with the following keys and assign the dataframes above as its values:


data = {'bids':bids,
        'plans':plans,
        'energy_prices':energy_prices,
        'price_vol':price_vol,
        'generation':generation,
       }

At this point, the values under your keys point to the same dataframes as the names in the outer scope.
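To make the "names and dict values point to the same object" idea concrete, here is a minimal sketch that uses plain lists as stand-ins for the dataframes (the behavior is not specific to pandas). The variable names `same_before` and `same_after` are made up for the illustration.

```python
# Stand-ins for the dataframes; any Python object behaves the same way.
bids = ["raw bids"]
plans = ["raw plans"]

data = {"bids": bids, "plans": plans}

# The dict value and the global name refer to the same object.
same_before = data["bids"] is bids

# Mutating the shared object is visible through both references.
data["bids"].append("mutated")

# Rebinding the dict entry points it at a new object;
# the global name `bids` still refers to the old one.
data["bids"] = ["processed bids"]
same_after = data["bids"] is bids

print(same_before, bids, same_after)
```

Assigning to `data[key]` only changes what the dictionary slot refers to. It never reaches back to a variable that happened to refer to the old object.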

You then call a function that creates a COPY of the source dataframe, modifies it, and returns it.

def create_dt(input_df):
    '''create a dataframe with a datetime index from multiple cols
    '''
    df = input_df.copy()
    #modify the df
    df = df.set_index(dt_index)
    df = df.drop(columns=[date_col,hr_col])
    return df

for key, df in data.items():
    data[key] = create_dt(data[key],'date','hr_beg')

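At this point, the df returned from create_dt() is not the same object as the dataframe you passed in (you created a copy), and you have changed the reference held in the dictionary data. So there is no reason for the outer-scope dataframe to be modified. (If you remove the input_df.copy() line, it might work as you expected, although set_index and drop also return new objects by default rather than modifying the original.)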
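A runnable sketch of that situation follows. The question elided the datetime logic, so the `dt_index` line (date plus hour offset) and the three-parameter signature are assumptions, and the two-row frame is a hypothetical stand-in for the CSV data.

```python
import pandas as pd

def create_dt(input_df, date_col, hr_col):
    """Return a new frame indexed by a datetime built from date_col and hr_col."""
    df = input_df.copy()
    # Assumed logic: the date plus the hour offset gives the timestamp.
    dt_index = pd.to_datetime(df[date_col]) + pd.to_timedelta(df[hr_col], unit="h")
    df = df.set_index(dt_index)
    df = df.drop(columns=[date_col, hr_col])
    return df

# Hypothetical stand-in for the data read from CSV.
bids = pd.DataFrame({"date": ["2014-01-01", "2014-01-01"],
                     "hr_beg": [0, 1],
                     "price": [43.32, 43.34]})
data = {"bids": bids}

for key in data:
    data[key] = create_dt(data[key], "date", "hr_beg")

print(data["bids"] is bids)        # the dict entry now refers to a different object
print(list(bids.columns))          # the global frame still has its original columns
print(list(data["bids"].columns))  # only the new frame was reshaped
```

Comparing with `is` (or comparing `id()` values while both objects are alive) shows that the dictionary entry and the global name have gone their separate ways.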

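In any case, if that is all you are doing in the function, and you *want* the original to be modified, there is no reason not to do the column drop in the loop from the outer scope.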
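One way to read that suggestion is to mutate each frame in place inside the loop, so the global names and the dictionary keep sharing the same objects. This is only a sketch: the datetime construction is an assumed stand-in for the logic elided in the question, and the small frames are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from CSV.
bids = pd.DataFrame({"date": ["2014-01-01", "2014-01-01"],
                     "hr_beg": [0, 1],
                     "price": [43.32, 43.34]})
plans = pd.DataFrame({"date": ["2014-01-01", "2014-01-01"],
                      "hr_beg": [0, 1],
                      "qty": [10, 20]})
data = {"bids": bids, "plans": plans}

for df in data.values():
    # Assumed logic: the date plus the hour offset gives the timestamp.
    dt_index = pd.to_datetime(df["date"]) + pd.to_timedelta(df["hr_beg"], unit="h")
    # inplace=True modifies the existing object instead of returning a new one.
    df.set_index(dt_index, inplace=True)
    df.drop(columns=["date", "hr_beg"], inplace=True)

print(data["bids"] is bids)
print(list(bids.columns))
```

Because nothing is rebound here, `bids` and `data["bids"]` remain the same object and both show the new index. The trade-off is that the raw data is gone once the loop has run.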

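【Discussion】:

  • Ah. I would have expected data[key] = create_dt(data[key],'date','hr_beg') to still update to whatever was returned from create_dt, even if it is not the same dataframe. I actually created that copy explicitly to avoid any side effects, doing everything within the context of the function and explicitly assigning the result in the global scope from the return value. Edit: oh, and I do more in the function; I just stripped it out for clarity.
  • Because of the copy, the reference in the dictionary changes, which leaves the outer-scope dataframe untouched.
  • In this case the dictionary keys act like pointers to the data structures they reference in memory. You can use the id() function to see that the referenced object has changed.
  • Is there a way to do this in the "functional" style I described above while still updating the globals?
  • It really comes down to preference. I would probably keep the dataframes in the data dictionary and use that as the source of truth. You could consider passing the whole data dictionary to your function and overwriting the dictionary with the modified version. It's up to you.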
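A sketch of the "dictionary as the source of truth" approach from the last comment, kept in a side-effect-free style. The helpers `with_dt_index` and `process_all` are hypothetical names, the datetime logic is assumed, and the small frames stand in for the CSV data.

```python
import pandas as pd

def with_dt_index(input_df, date_col, hr_col):
    # Hypothetical stand-in for create_dt: returns a new frame, leaves input_df alone.
    dt_index = pd.to_datetime(input_df[date_col]) + pd.to_timedelta(input_df[hr_col], unit="h")
    return input_df.set_index(dt_index).drop(columns=[date_col, hr_col])

def process_all(frames, func, *args):
    # Build a new dict; neither the input dict nor its frames are modified.
    return {key: func(df, *args) for key, df in frames.items()}

raw = {
    "bids": pd.DataFrame({"date": ["2014-01-01"], "hr_beg": [0], "price": [43.32]}),
    "plans": pd.DataFrame({"date": ["2014-01-01"], "hr_beg": [1], "qty": [10]}),
}

data = process_all(raw, with_dt_index, "date", "hr_beg")

# If module-level names are still wanted, rebind them explicitly from the dict.
bids, plans = data["bids"], data["plans"]

print(list(raw["bids"].columns), list(bids.columns))
```

The explicit rebinding on the last assignment is the only step that touches the global names, which keeps the update visible in the code instead of relying on shared references.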
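【Solution 2】:

If you want to change a pandas.DataFrame object and have the change show up through every variable that points to it, you need to pass inplace=True to every df method call you use. Step through these examples in Python Tutor to see more clearly which objects the variables point to: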

import pandas as pd

# Basically what you were doing: the function works on a copy
def create_dt(input_df):
    df = input_df.copy()
    df = df.set_index(pd.Series(['i','j']))
    return df


x = pd.DataFrame({'a':[1,2],'b':[3,4]})
datax = {'x':x,}
for key, df in datax.items():
    datax[key] = create_dt(datax[key])
print(x)  # x still has its original index


# Basically what was recommended: no copy, but set_index still returns a new object
def create_dt2(input_df):
    input_df = input_df.set_index(pd.Series(['i','j']))
    return input_df


y = pd.DataFrame({'a':[5,6],'b':[7,8]})
datay = {'y':y,}
for key, df in datay.items():
    datay[key] = create_dt2(datay[key])

print(y)  # y still has its original index

# Using inplace=True changes the object itself
def modify_df(input_df):
    input_df.set_index(pd.Series(['i','j']), inplace=True)

z = pd.DataFrame({'a':[9,10],'b':[11,12]})
dataz = {'z':z,}
for key, df in dataz.items():
    modify_df(dataz[key])

print(z)  # z now has the index ['i', 'j']
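As a side note, inplace=True is not the only kind of operation that mutates a DataFrame: assigning to an attribute such as `.index`, or assigning a column with `df[...] = ...`, also changes the object that was passed in. A small sketch in the same style as the examples above (the names `w`, `dataw`, and `modify_df_by_assignment` are made up for illustration):

```python
import pandas as pd

def modify_df_by_assignment(input_df):
    # Attribute and item assignment mutate the object that was passed in.
    input_df.index = pd.Index(['i', 'j'])
    input_df['c'] = input_df['a'] + input_df['b']

w = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
dataw = {'w': w}
for df in dataw.values():
    modify_df_by_assignment(df)

print(w)
```

In every case the distinction is the same: operations that return a new object leave the original alone, while operations that change the existing object are visible through every name that refers to it.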


【Discussion】: