[Question title]: Calculations using two pandas dataframes
[Posted]: 2020-06-01 18:44:12
[Question description]:

I have the following two (simplified) dataframes:

df1=
   origin destination  val1  val2
0       1           A   0.8   0.9
1       1           B   0.3   0.5
2       1           c   0.4   0.2
3       2           A   0.4   0.7
4       2           B   0.2   0.1
5       2           c   0.5   0.1
df2=
   org  price
0    1     50
1    2     45
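
For anyone who wants to reproduce the example, the two frames could be built roughly like this. It is only a sketch: the column names and values are copied from the tables above, and the lowercase `c` is kept exactly as it appears in the data.

```python
import pandas as pd

# df1: one row per (origin, destination) pair with two values.
df1 = pd.DataFrame({
    "origin": [1, 1, 1, 2, 2, 2],
    "destination": ["A", "B", "c", "A", "B", "c"],
    "val1": [0.8, 0.3, 0.4, 0.4, 0.2, 0.5],
    "val2": [0.9, 0.5, 0.2, 0.7, 0.1, 0.1],
})

# df2: one price per origin (the column is called "org" here).
df2 = pd.DataFrame({"org": [1, 2], "price": [50, 45]})
```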

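What I need to do is take the price for each origin from df2, multiply it by val1 + val2 for the matching rows in df1, sum the products per destination, and write the result to a CSV file.

The calculation for A is as follows:

A => (0.8+0.9)*50 + (0.4+0.7)*45 = 134.5

Here, the values 0.8, 0.9, 0.4, and 0.7 come from df1 and are the val1 and val2 entries for A, while 50 and 45 come from df2 and are the prices for origins 1 and 2, respectively. For B, the calculation is:

B => (0.3+0.5)*50 + (0.2+0.1)*45 = 53.5

For C, the calculation is:

C => (0.4+0.2)*50 + (0.5+0.1)*45 = 57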
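
The three hand calculations can be double-checked with plain Python before involving pandas. This is just a sanity-check sketch: the names `price`, `vals`, and `totals` are made up for illustration, and the numbers are copied from the tables above.

```python
import math

# Price per origin (from df2).
price = {1: 50, 2: 45}

# (val1, val2) per destination and origin (from df1).
vals = {
    "A": {1: (0.8, 0.9), 2: (0.4, 0.7)},
    "B": {1: (0.3, 0.5), 2: (0.2, 0.1)},
    "C": {1: (0.4, 0.2), 2: (0.5, 0.1)},
}

# For each destination, sum (val1 + val2) * price over its origins.
totals = {
    dest: sum((v1 + v2) * price[origin] for origin, (v1, v2) in by_origin.items())
    for dest, by_origin in vals.items()
}

# Compare with a tolerance, since floating-point sums are rarely exact.
assert math.isclose(totals["A"], 134.5)
assert math.isclose(totals["B"], 53.5)
assert math.isclose(totals["C"], 57.0)
```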

The final CSV file should look like this:

A,134.5
B,53.5
C,57

I wrote the following Python code for this:

# first convert the second table into a python dictionary so that I can refer price value at each origin
df2_dictionary = {}
for ind in df2.index:
    df2_dictionary[df2['org'][ind]] = float(df2['price'][ind])    

# now go through df1, add up val1 and val2 and add the result to the result dictionary. 
result = {}
for ind in df1.index:
    origin = df1['origin'][ind] 
    price = df2_dictionary[origin] # figure out the price from the dictionary.
    r = (df1['val1'][ind] + df1['val2'][ind])*price # this is the needed calculation 
    destination = df1['destination'][ind] # store the result in destination
    if(destination in result.keys()):
        result[destination] = result[destination]+r
    else:
        result[destination] = r
f = open("result.csv", "w")
for key in result:
    f.write(key+","+str(result[key])+"\n")
f.close() 

This is a lot of work and doesn't use any of pandas' built-in functions. How can I simplify it? I'm not concerned about efficiency.

[Question comments]:

    Tags: python pandas dataframe


    [Solution 1]:

    Your problem can be solved with map and groupby:

    df1['total'] = (df1[['val1','val2']].sum(1)
                       .mul(df1['origin']
                                .map(df2.set_index('org').price)
                           )
                   )
    
    summary = df1.groupby('destination')['total'].sum()
    
    # save to csv
    summary.to_csv('/path/to/file.csv')
    

    Output (summary):

    destination
    A    134.5
    B     53.5
    c     57.0
    Name: total, dtype: float64
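
An alternative to `map` would be to join the two frames with `merge` and then group. The sketch below is a different route to the same numbers, not the method used above. It rebuilds the sample data so that it runs on its own, and `merged` and `csv_text` are names chosen only for illustration. Passing `header=False` drops the header row, and `float_format="%g"` is one way to keep floating-point noise out of the file. Both are optional choices aimed at the `A,134.5` layout asked for in the question.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "origin": [1, 1, 1, 2, 2, 2],
    "destination": ["A", "B", "c", "A", "B", "c"],
    "val1": [0.8, 0.3, 0.4, 0.4, 0.2, 0.5],
    "val2": [0.9, 0.5, 0.2, 0.7, 0.1, 0.1],
})
df2 = pd.DataFrame({"org": [1, 2], "price": [50, 45]})

# Attach the price to every df1 row by matching origin against org.
merged = df1.merge(df2, left_on="origin", right_on="org", how="left")

# Row-wise (val1 + val2) * price, then the sum per destination.
merged["total"] = (merged["val1"] + merged["val2"]) * merged["price"]
summary = merged.groupby("destination")["total"].sum()

# Without a path, to_csv returns the CSV text instead of writing a file.
csv_text = summary.to_csv(header=False, float_format="%g")
print(csv_text)
```

To write a file instead, pass a path as the first argument, for example `summary.to_csv("result.csv", header=False, float_format="%g")`.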
    

    [Comments]:
