【Question title】: pandas groupby object index turns into FUBAR
【Posted】: 2016-10-01 18:51:48
【Question description】:

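I have a pandas DataFrame, and my groupby operation turns the index into mush. I need the date as the index, sorted within each ticker group.

To illustrate, set up the DataFrame like this: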

import pandas as pd
from io import StringIO

text = """Date   Ticker        Open        High         Low   Adj_Close   Volume
    2015-04-09  vws.co  315.000000  316.100000  312.500000  311.520000  1686800
    2015-04-10  vws.co  317.000000  319.700000  316.400000  312.700000  1396500
    2015-04-13  vws.co  317.900000  321.500000  315.200000  315.850000  1564500
    2015-04-14  vws.co  320.000000  322.400000  318.700000  314.870000  1370600
    2015-04-15  vws.co  320.000000  321.500000  319.200000  316.150000   945000
    2015-04-16  vws.co  319.000000  320.200000  310.400000  307.870000  2236100
    2015-04-17  vws.co  309.900000  310.000000  302.500000  299.100000  2711900
    2015-04-20  vws.co  303.000000  312.000000  303.000000  306.490000  1629700
    2016-03-31     mmm  166.750000  167.500000  166.500000  166.630005  1762800
    2016-04-01     mmm  165.630005  167.740005  164.789993  167.529999  1993700
    2016-04-04     mmm  167.110001  167.490005  165.919998  166.399994  2022800
    2016-04-05     mmm  165.179993  166.550003  164.649994  165.809998  1610300
    2016-04-06     mmm  165.339996  167.080002  164.839996  166.809998  2092200
    2016-04-07     mmm  165.880005  167.229996  165.250000  167.160004  2721900"""

df = pd.read_csv(StringIO(text), sep=r'\s+', parse_dates=[0], index_col=0)

And the code:

import pandas as pd
from pandas.io.data import DataReader
import numpy as np
import time
import os

stocklist = ['vws.co','nflx','mmm']


print ('df.tail (Input df)\n',df.tail(6),'\n')


def Screener(group):

    def diff_calc(group):

        df['Difference'] = df['Adj_Close'].diff()
        return df['Difference']

    df['Difference'] = diff_calc(group)
    return df

if __name__ == '__main__':

    df = GetStock(stocklist, start, end)
    df['Adj_Close'] = df['Adj Close']

    for ticker in stocklist:
        ### groupby screeener (filtering to only rel ticker group)
        df = df.groupby('Ticker', as_index=False).Adj_Close.apply(Screener)

    df.reset_index().sort(['Ticker', 'Date'], ascending=[1,1]).set_index('Ticker')
    print ('(Output df)\n',df,'\n')

# Test the first 7 rows of each group for rolling_mean transgress groups...
df_test = df.groupby('Ticker').head(7).reset_index().set_index('Date')
print ('df_test (summary from df) (Output)\n',df_test,'\n')

Clearly my index is now messed up, and I don't know how that happened.

(Output df)
                   Ticker    Open    High     Low  Adj Close  Adj_Close        Date                                                               
0 0 0 2016-05-20  vws.co  443.00  446.30  441.40     442.90     442.90   
      2016-05-23  vws.co  442.00  446.70  439.90     439.90     439.90   
      2016-05-24  vws.co  439.10  450.00  438.10     450.00     450.00   
      2016-05-25  vws.co  455.50  466.10  454.30     464.90     464.90   
      2016-05-26  vws.co  465.00  470.80  464.60     464.60     464.60   
      2016-05-27  vws.co  464.00  480.70  461.20     476.00     476.00   
      2016-05-30  vws.co  477.00  481.80  473.10     475.00     475.00   
      2016-05-31  vws.co  474.00  479.30  472.20     479.00     479.00   
      2016-06-01  vws.co  477.40  480.20  472.90     474.40     474.40   
      2016-05-20    nflx   90.08   93.28   89.98      92.49      92.49   
      2016-05-23    nflx   92.98   95.29   92.85      94.89      94.89   
      2016-05-24    nflx   95.98   99.14   95.75      97.89      97.89   
      2016-05-25    nflx   99.00  100.31   98.30     100.20     100.20   

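I need the date as the index, sorted within each ticker group.

Can anyone help?

【Comments】:

  • It's unclear what you are asking. Is this what you are after? df['Difference'] = df.groupby('Ticker')['Adj_Close'].diff()
  • I need the date as the index, sorted within each ticker group.
  • Are the dates in the input sorted?
  • Yes, the dates are sorted in the input to the groupby.

Tags: python pandas indexing group-by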


【Solution 1】:

OK, I finally found the solution. This line is the key:

df = df.reset_index(level=0, drop=True)

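This question helped me get the index back into the state I wanted: How to get rows in pandas data frame, with maximal values in a column and keep the original index?

The code below gets rid of the unwanted column that each iteration added to the index. Thanks, everyone!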

import pandas as pd
# pandas.io.data was removed from pandas (DataReader moved to the separate pandas-datareader package); it is not used below
import numpy as np
import time
import os
from io import StringIO

text = """Date   Ticker        Open        High         Low   Adj_Close   Volume
    2015-04-09  vws.co  315.000000  316.100000  312.500000  311.520000  1686800
    2015-04-10  vws.co  317.000000  319.700000  316.400000  312.700000  1396500
    2015-04-13  vws.co  317.900000  321.500000  315.200000  315.850000  1564500
    2015-04-14  vws.co  320.000000  322.400000  318.700000  314.870000  1370600
    2015-04-15  vws.co  320.000000  321.500000  319.200000  316.150000   945000
    2015-04-16  vws.co  319.000000  320.200000  310.400000  307.870000  2236100
    2015-04-17  vws.co  309.900000  310.000000  302.500000  299.100000  2711900
    2015-04-20  vws.co  303.000000  312.000000  303.000000  306.490000  1629700
    2016-03-31     mmm  166.750000  167.500000  166.500000  166.630005  1762800
    2016-04-01     mmm  165.630005  167.740005  164.789993  167.529999  1993700
    2016-04-04     mmm  167.110001  167.490005  165.919998  166.399994  2022800
    2016-04-05     mmm  165.179993  166.550003  164.649994  165.809998  1610300
    2016-04-06     mmm  165.339996  167.080002  164.839996  166.809998  2092200
    2016-04-07     mmm  165.880005  167.229996  165.250000  167.160004  2721900"""

df = pd.read_csv(StringIO(text), sep=r'\s+', parse_dates=[0], index_col=0)
runstart = time.time()     # Start script timer

stocklist = ['vws.co','nflx','mmm']#,'msft','tsla']
tickers =   []

def Screener(group):

    def diff_calc(group):

        df['Difference'] = df['Adj_Close'].diff()
        return df['Difference']

    df['Difference'] = diff_calc(group)
    return df

if __name__ == '__main__':

    for ticker in stocklist:
        ### groupby screeener (filtering to only rel ticker group)
        df = df.groupby('Ticker', as_index=False).Adj_Close.apply(Screener) #.reset_index()

        df = df.reset_index(level=0, drop=True)

    print ('(Output df)\n',df,'\n')

    # Test the first 7 rows of each group for rolling_mean transgress groups...
    df_test = df.groupby('Ticker').head(7).reset_index().set_index('Date')
    print ('df_test (summary from df) (Output)\n',df_test,'\n')
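
For readers wondering where the extra index levels come from, here is a minimal sketch, assuming a recent pandas. `groupby(...).apply(...)` stacks the per-group results and, with the default `group_keys=True`, can prepend the group key as a new outer index level (the exact rules have varied between pandas versions), so repeating the call in a loop keeps adding levels. The sample data and the `first_two` helper are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Small illustrative frame with Date as the index (values are made up)
df = pd.DataFrame(
    {
        "Ticker": ["vws.co", "vws.co", "vws.co", "mmm", "mmm", "mmm"],
        "Adj_Close": [311.52, 312.70, 315.85, 166.63, 167.53, 166.40],
    },
    index=pd.to_datetime(
        ["2015-04-09", "2015-04-10", "2015-04-13",
         "2016-03-31", "2016-04-01", "2016-04-04"]
    ),
)
df.index.name = "Date"


def first_two(group):
    # Hypothetical helper: return only the first two rows of each group
    return group.head(2)


# apply() concatenates the per-group frames and adds the group key
# as an outer index level, so the result has a two-level index
stacked = df.groupby("Ticker")[["Adj_Close"]].apply(first_two)
print(stacked.index.nlevels)

# Dropping level 0 leaves only the original Date index
flat = stacked.reset_index(level=0, drop=True)
print(flat.index.nlevels)
```

This is the same `reset_index(level=0, drop=True)` call as in the answer above, applied once to a single extra level.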

【Comments】:

    【Solution 2】:

    The `diff` method of a GroupBy object keeps the dates as the index:

    # sort if needed (the stable 'mergesort' algorithm preserves the date order within each ticker)
    df = df.sort_index().sort_values('Ticker', kind='mergesort')
    
    df['Difference'] = df.groupby('Ticker')['Adj_Close'].diff()
    

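    Note: if the input is not sorted, you need to sort it first. The code above first sorts by the index (i.e. the date), then sorts by ticker using mergesort, which preserves the date order.

    Output: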

    In [21]: df[['Ticker', 'Adj_Close', 'Difference']]
    Out[21]: 
                Ticker   Adj_Close  Difference
    Date                                      
    2015-04-09  vws.co  311.520000         NaN
    2015-04-10  vws.co  312.700000    1.180000
    2015-04-13  vws.co  315.850000    3.150000
    2015-04-14  vws.co  314.870000   -0.980000
    2015-04-15  vws.co  316.150000    1.280000
    2015-04-16  vws.co  307.870000   -8.280000
    2015-04-17  vws.co  299.100000   -8.770000
    2015-04-20  vws.co  306.490000    7.390000
    2016-03-31     mmm  166.630005         NaN
    2016-04-01     mmm  167.529999    0.899994
    2016-04-04     mmm  166.399994   -1.130005
    2016-04-05     mmm  165.809998   -0.589996
    2016-04-06     mmm  166.809998    1.000000
    2016-04-07     mmm  167.160004    0.350006
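
As a self-contained check of the note about sorting, the sketch below starts from deliberately shuffled rows. The sample values are a made-up subset for illustration; the calls (`sort_index`, `sort_values` with `kind='mergesort'`, and `groupby(...).diff()`) are the ones discussed in this answer, assuming a pandas version that has `sort_values`.

```python
import pandas as pd

# Rows are intentionally out of order, and the tickers are interleaved
df = pd.DataFrame(
    {
        "Ticker": ["vws.co", "mmm", "vws.co", "mmm", "vws.co", "mmm"],
        "Adj_Close": [312.70, 167.53, 311.52, 166.63, 315.85, 166.40],
    },
    index=pd.to_datetime(
        ["2015-04-10", "2016-04-01", "2015-04-09",
         "2016-03-31", "2015-04-13", "2016-04-04"]
    ),
)
df.index.name = "Date"

# Sort by date first, then by ticker with a stable sort, so that
# dates stay in ascending order inside each ticker block
df = df.sort_index().sort_values("Ticker", kind="mergesort")

# diff() is computed per ticker, so the first row of each group is NaN
df["Difference"] = df.groupby("Ticker")["Adj_Close"].diff()
print(df)
```

The index is still the plain `Date` index afterwards, because `diff` is a transformation and returns a result aligned with the original rows.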
    

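    【Comments】:

    • Thanks IanS, but I need the dates sorted under each ticker group. I don't think your reply does that. Right?
    • Unless I misunderstand you, I think it does. Or do you want a MultiIndex, with Ticker as level 0 and Date as level 1?
    • @Excaliburst You may not have been notified of my previous comment. Why don't you just run my code and see whether it works for you? It works on the example you provided.
    • Thanks IanS, but I did, and I get an error: AttributeError: 'DataFrame' object has no attribute 'sort_values'. And no, I don't need a MultiIndex; just what your example shows, with the 'Difference' column added to the df.
    • Hmm, you must be using an old version of pandas. If you can't upgrade, try df = df.sort().sort('Ticker', kind='mergesort'). See the (deprecated) documentation.
    【Solution 3】:

    The groupby function automatically sets the column you group by as the index. To avoid this, pass as_index = False in your groupby: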

    df_test = df.groupby('Ticker', as_index = False).head(7).reset_index().set_index('Date')
    

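    【Comments】:

    • As you can see from my first groupby (the one in the loop), I already use as_index=False, but it doesn't work as expected. I get an empty index column per ticker iteration...
    • ysearka, any ideas? I still get a new index column per ticker iteration, even with 'as_index = False' in the groupby!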
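
A possible explanation for the behaviour described in these comments, offered as a sketch rather than a diagnosis of the original script: in current pandas, `as_index` only changes the shape of aggregated output and has no effect on filtrations such as `head()`. The sample data below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Ticker": ["vws.co", "vws.co", "vws.co", "mmm", "mmm", "mmm"],
        "Adj_Close": [311.52, 312.70, 315.85, 166.63, 167.53, 166.40],
    },
    index=pd.to_datetime(
        ["2015-04-09", "2015-04-10", "2015-04-13",
         "2016-03-31", "2016-04-01", "2016-04-04"]
    ),
)
df.index.name = "Date"

# head() is a filtration: it returns original rows with the original
# Date index, whether or not as_index=False is passed
with_default = df.groupby("Ticker").head(2)
with_flag = df.groupby("Ticker", as_index=False).head(2)
same = with_default.equals(with_flag)
print(same)

# For an aggregation the flag does matter:
agg_indexed = df.groupby("Ticker")["Adj_Close"].max()                 # Ticker becomes the index
agg_flat = df.groupby("Ticker", as_index=False)["Adj_Close"].max()    # Ticker stays a column
print(agg_flat)
```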