【Question title】: Is there a better way to join two dataframes in Pandas?
【Posted】: 2021-12-03 23:48:20
【Question】:

I have to load data from a SQL table into pandas and display the output. The data is a sales table:

      cust     prod  day  month  year state  quant
0    Bloom    Pepsi    2     12  2017    NY   4232
1    Knuth    Bread   23      5  2017    NJ   4167
2    Emily    Pepsi   22      1  201    CT   4404
3    Emily   Fruits   11      1  2010    NJ   4369
4    Helen     Milk    7     11  2016    CT    210

I have to convert it into the average sales per customer for each state in 2017:

CUST   AVG_NY  AVG_CT  AVG_NJ
Bloom  28923   3241    1873
Sam    4239    872     142

Here is my code:

import pandas as pd
import psycopg2 as pg
engine = pg.connect("dbname='postgres' user='postgres' host='127.0.0.1' port='8800' password='sh'")
df = pd.read_sql('select * from sales', con=engine) 

df.drop("prod", axis=1, inplace=True)
df.drop("day", axis=1, inplace=True)
df.drop("month", axis=1, inplace=True)
df_main = df.loc[df.year == 2017]

#df.drop(df[df['state'] != 'NY'].index, inplace=True) 
df2 = df_main.loc[df_main.state == 'NY']
df2.drop("year",axis=1,inplace=True)

NY = df2.groupby(['cust']).mean()




df3 = df_main.loc[df_main.state == 'CT']
df3.drop("year",axis=1,inplace=True)

CT = df3.groupby(['cust']).mean()


df4 = df_main.loc[df_main.state == 'NJ']
df4.drop("year",axis=1,inplace=True)

NJ = df4.groupby(['cust']).mean()
 

NY = NY.join(CT,how='left',lsuffix = 'NY', rsuffix = '_right')
NY = NY.join(NJ,how='left',lsuffix = 'NY', rsuffix = '_right')

print(NY)

This gives me output like this:

           quantNY  quant_right        quant
cust
Bloom  3201.500000       3261.0  2277.000000
Emily  2698.666667       1432.0  1826.666667
Helen  4909.000000       2485.5  2352.166667

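I found out how to rename the columns to match the output I need, but I am not sure whether the following two lines are the right way to join the dataframes: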

NY = NY.join(CT,how='left',lsuffix = 'NY', rsuffix = '_right')
NY = NY.join(NJ,how='left',lsuffix = 'NY', rsuffix = '_right')

Is there a better way to do this with Pandas?

【Comments】:

    Tags: python pandas postgresql dataframe


    【Solution 1】:

    Use pivot_table:

    df.pivot_table(index=['year', 'cust'], columns='state',
                   values='quant', aggfunc='mean').add_prefix('AVG_')
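
To make the pivot_table call above concrete, here is a minimal sketch on a few made-up rows. The sample data is hypothetical and only stands in for the `sales` table. It also assumes that only 2017 is wanted, so the rows are filtered first and `cust` alone is used as the index, which differs slightly from the call above.

```python
import pandas as pd

# Hypothetical sample rows standing in for the `sales` table (not the real data).
df = pd.DataFrame({
    "cust":  ["Bloom", "Bloom", "Bloom", "Emily", "Emily", "Helen"],
    "prod":  ["Pepsi", "Bread", "Milk", "Pepsi", "Fruits", "Milk"],
    "year":  [2017, 2017, 2017, 2017, 2010, 2016],
    "state": ["NY", "NY", "NJ", "CT", "NJ", "CT"],
    "quant": [4000, 2000, 1500, 4404, 4369, 210],
})

# Keep only the 2017 rows, then let pivot_table compute the
# mean quantity per customer (rows) and per state (columns).
result = (
    df[df["year"] == 2017]
    .pivot_table(index="cust", columns="state", values="quant", aggfunc="mean")
    .add_prefix("AVG_")
    .reset_index()
)
result.columns.name = None  # drop the leftover "state" label on the column axis

print(result)
```

The state columns come out in alphabetical order (AVG_CT, AVG_NJ, AVG_NY), and a customer with no sales in a state gets NaN there. If the NY, CT, NJ order from the question matters, selecting the columns explicitly, for example `result[["cust", "AVG_NY", "AVG_CT", "AVG_NJ"]]`, should be enough.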
    

    【Discussion】:

    • Where would I add this? Do I remove the NY statements?
    • Delete everything after df = pd.read_sql(...).
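
If the per-state approach from the question is kept instead of being deleted, the two chained join calls could probably be replaced by a single pd.concat along the columns. This is only a sketch under assumptions: the rows below are hypothetical and stand in for `df_main`, and each per-state result is renamed before combining so that no suffixes are needed.

```python
import pandas as pd

# Hypothetical 2017 rows standing in for df_main (not the real data).
df_main = pd.DataFrame({
    "cust":  ["Bloom", "Bloom", "Bloom", "Emily", "Emily", "Helen", "Sam"],
    "state": ["NY", "NY", "CT", "NY", "NJ", "NY", "CT"],
    "quant": [3000, 3403, 3261, 2500, 1800, 4909, 872],
})

states = ["NY", "CT", "NJ"]

# One mean-per-customer Series per state, each named after its target column.
per_state = [
    df_main.loc[df_main["state"] == s]
    .groupby("cust")["quant"]
    .mean()
    .rename(f"AVG_{s}")
    for s in states
]

# Align all Series on the cust index in one step.
# concat uses an outer join by default, so unlike the chained left joins
# it also keeps customers who have no NY sales (Sam in this sample).
combined = pd.concat(per_state, axis=1)

print(combined)
```

Note the difference in behavior: the original left joins start from the NY frame and silently drop any customer without NY sales, while the outer alignment here keeps them with NaN in the missing columns.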