【Question title】: Pandas converting Rows to Columns
【Posted】: 2018-05-22 22:44:05
【Question】:

I have a CSV that produces a DataFrame in the following format:

--------------------------------------------------------------
|Date       | Fund | TradeGroup | LongShort | Alpha | Details|
--------------------------------------------------------------
|2018-05-22 |A     | TGG-A      | Long      | 3.99  | Misc   |
|2018-05-22 |A     | TGG-B      | Long      | 4.99  | Misc   |
|2018-05-22 |B     | TGG-A      | Long      | 5.99  | Misc   |
|2018-05-22 |B     | TGG-B      | Short     | 6.99  | Misc   |
|2018-05-22 |C     | TGG-A      | Long      | 1.99  | Misc   |
|2018-05-22 |C     | TGG-B      | Long      | 5.29  | Misc   |
--------------------------------------------------------------

What I want to do is group the rows by TradeGroup and turn the Fund values into columns, so the final DataFrame should look like this:

  --------------------------------------------------------
  |TradeGroup| Date      | A         | B         | C     |
  --------------------------------------------------------
  | TGG-A    |2018-05-22 | 3.99      | 5.99      | 1.99  |
  | TGG-B    |2018-05-22 | 4.99      | 6.99      | 5.29  | 
  --------------------------------------------------------

Also, I don't really care about the LongShort and Details columns, so it's fine if they are dropped. Thanks! I tried df.pivot(), but it did not produce the desired format.

【Comments】:

  • Try df.set_index(['Date','TradeGroup','Fund']).unstack(level=2)['Alpha']

Tags: python pandas dataframe pandas-groupby


【Solution 1】:

Use DataFrame.pivot_table (the method form of pd.pivot_table):

res = df.pivot_table(index=['Date', 'TradeGroup'], columns='Fund',
                     values='Alpha', aggfunc='first').reset_index()

print(res)

Fund        Date TradeGroup     A     B     C
0     2018-05-22      TGG-A  3.99  5.99  1.99
1     2018-05-22      TGG-B  4.99  6.99  5.29
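
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is typed in by hand from the table in the question (with LongShort and Details left out), and the extra duplicate row is a hypothetical addition, not part of the original data. It is only there to illustrate what aggfunc='first' does when a Date/TradeGroup/Fund combination occurs more than once, and how that differs from a plain unstack.

```python
import pandas as pd

# Sample data copied by hand from the question; LongShort and Details
# are omitted because they are not needed in the result.
df = pd.DataFrame({
    'Date': ['2018-05-22'] * 6,
    'Fund': ['A', 'A', 'B', 'B', 'C', 'C'],
    'TradeGroup': ['TGG-A', 'TGG-B'] * 3,
    'Alpha': [3.99, 4.99, 5.99, 6.99, 1.99, 5.29],
})

res = df.pivot_table(index=['Date', 'TradeGroup'], columns='Fund',
                     values='Alpha', aggfunc='first').reset_index()
print(res)

# Hypothetical duplicate (not in the question): a second row for
# Date=2018-05-22, TradeGroup=TGG-A, Fund=A with a different Alpha.
dup = pd.concat([df, df.iloc[[0]].assign(Alpha=9.99)], ignore_index=True)

# aggfunc='first' keeps the first value seen for each cell, so the
# duplicate is silently ignored rather than averaged in.
res_dup = dup.pivot_table(index=['Date', 'TradeGroup'], columns='Fund',
                          values='Alpha', aggfunc='first').reset_index()

# set_index(...).unstack() cannot reshape when the index has duplicates.
try:
    dup.set_index(['Date', 'TradeGroup', 'Fund'])['Alpha'].unstack('Fund')
    unstack_failed = False
except ValueError:
    unstack_failed = True
print(unstack_failed)
```

If duplicates should be combined rather than dropped, a different aggfunc such as 'sum' or 'mean' may be a better fit; which one is right depends on what Alpha represents in your data.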


    【Solution 2】:

    It looks like you are trying to unstack one level of a MultiIndex into columns.

    Try this:

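    from io import StringIO

    import pandas as pd

    # Sample data from the question, as whitespace-separated text
    data = '''\
    Date        Fund  TradeGroup  LongShort  Alpha  Details
    2018-05-22 A      TGG-A       Long       3.99   Misc   
    2018-05-22 A      TGG-B       Long       4.99   Misc   
    2018-05-22 B      TGG-A       Long       5.99   Misc   
    2018-05-22 B      TGG-B       Short      6.99   Misc   
    2018-05-22 C      TGG-A       Long       1.99   Misc   
    2018-05-22 C      TGG-B       Long       5.29   Misc'''

    # pd.compat.StringIO was removed from pandas; use io.StringIO instead
    fileobj = StringIO(data)

    # Raw string so that \s is not treated as a string escape sequence
    df = pd.read_csv(fileobj, sep=r'\s+')

    # Move Fund (the last index level) into the columns, then keep only Alpha
    dfout = df.set_index(['TradeGroup','Date','Fund']).unstack()['Alpha']
    print(dfout)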
    

    This returns:

    Fund                      A     B     C
    TradeGroup Date                        
    TGG-A      2018-05-22  3.99  5.99  1.99
    TGG-B      2018-05-22  4.99  6.99  5.29
    

    If you like, you can also apply .reset_index() afterwards, which gives:

    Fund TradeGroup        Date     A     B     C
    0         TGG-A  2018-05-22  3.99  5.99  1.99
    1         TGG-B  2018-05-22  4.99  6.99  5.29
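
The leftover "Fund" label in the header row above is the name of the columns axis, not a real column. If you would rather not have it, one option is to clear it with rename_axis. The sketch below is a variation on the code above rather than a replacement for it: it builds the DataFrame by hand from the table in the question and selects Alpha before unstacking, so LongShort and Details never get reshaped.

```python
import pandas as pd

# Sample data copied by hand from the question
df = pd.DataFrame({
    'Date': ['2018-05-22'] * 6,
    'Fund': ['A', 'A', 'B', 'B', 'C', 'C'],
    'TradeGroup': ['TGG-A', 'TGG-B'] * 3,
    'LongShort': ['Long', 'Long', 'Long', 'Short', 'Long', 'Long'],
    'Alpha': [3.99, 4.99, 5.99, 6.99, 1.99, 5.29],
    'Details': ['Misc'] * 6,
})

dfout = (
    df.set_index(['TradeGroup', 'Date', 'Fund'])['Alpha']  # keep only Alpha
      .unstack('Fund')            # Fund values become the columns
      .reset_index()              # TradeGroup and Date back to columns
      .rename_axis(columns=None)  # drop the "Fund" label on the columns axis
)
print(dfout)
```

Setting dfout.columns.name = None afterwards should have the same effect if you prefer not to chain the call.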
    
