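【Question Title】: How can I do computations on DataFrames or Series that have different indexes in pandas?
【Posted】: 2016-06-17 00:53:12
【Question】:

I have two Series of the same length and dtype; both are float64. The only difference is the index: both are indexed by date, but one uses the first day of each month and the other uses the last day. How can I compute things like correlation or covariance on Series or DataFrames whose indexes differ?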

import numpy as np
from pandas import Series, DataFrame
import pandas as pd
import Quandl

IPO=Quandl.get("RITTER/US_IPO_STATS", authtoken="api key")
ir=Quandl.get("FRBC/REALRT", authtoken="api key")

ipo_splice=IPO[264:662]
new_ipo=ipo_splice['Gross Number of IPOs'];
new_ipo=new_ipo.T


ir_splice=ir[0:398]
new_ir=ir_splice['RR 1 Month']
new_ir=new_ir.T

new_ipo.corr(new_ir)
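
One likely reason `new_ipo.corr(new_ir)` is unhelpful here: `Series.corr` aligns the two Series on their index labels before computing anything, so month-start and month-end dates never pair up. The sketch below reproduces that with made-up random data in place of the Quandl series; the names `s_bom` and `s_eom` are illustrative and not from the question.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the two Quandl series (assumption: 12 monthly values each).
rng = np.random.default_rng(0)
s_bom = pd.Series(
    rng.random(12),
    index=pd.date_range("2015-01-01", periods=12, freq="MS"),  # month start
)
s_eom = pd.Series(
    rng.random(12),
    index=pd.date_range("2015-01-31", periods=12, freq=pd.offsets.MonthEnd()),  # month end
)

# The two indexes have no dates in common.
shared = s_bom.index.intersection(s_eom.index)
print(len(shared))

# Series.corr keeps only labels present in both Series; nothing is left,
# so the result is NaN rather than a correlation.
label_aligned = s_bom.corr(s_eom)
print(label_aligned)
```

The answers below all work by giving the two objects a common set of labels (or none at all) before calling `corr`.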

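【Comments】:

    Tags: python pandas dataframe series quandl


    【Solution 1】:

    Call reset_index(drop=True) on the objects you want to correlate, then concatenate them. Dropping the indexes makes pandas pair the rows by position instead of by label.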

    s1 = pd.DataFrame(np.random.rand(10), list('abcdefghij'), columns=['s1'])
    s2 = pd.DataFrame(np.random.rand(10), list('ABCDEFGHIJ'), columns=['s2'])
    
    print(pd.concat([s.reset_index(drop=True) for s in [s1, s2]], axis=1).corr())
    
    
              s1        s2
    s1  1.000000 -0.437945
    s2 -0.437945  1.000000
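
The same idea can be applied directly to two date-indexed Series, as in the question. This is a minimal sketch with made-up data; it assumes both Series are sorted the same way and cover the same months, because positional pairing does not check that. The names `s_bom` and `s_eom` are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
s_bom = pd.Series(
    rng.random(10),
    index=pd.date_range("2015-01-01", periods=10, freq="MS"),
)
s_eom = pd.Series(
    rng.random(10),
    index=pd.date_range("2015-01-31", periods=10, freq=pd.offsets.MonthEnd()),
)

# Replace both date indexes with 0..n-1 so values pair up by position.
a = s_bom.reset_index(drop=True)
b = s_eom.reset_index(drop=True)

corr = a.corr(b)  # Pearson correlation
cov = a.cov(b)    # sample covariance (ddof=1)
print(corr, cov)
```

The trade-off is that the dates are discarded, so a missing or extra month in one Series would silently shift every later pair.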
    

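    【Comments】:

      【Solution 2】:

      You can use the resample() function to resample one of your indexes, so that both end up as either BoM (beginning of month) or EoM (end of month):

      Data: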

      In [63]: df_bom
      Out[63]:
                  val
      2015-01-01   76
      2015-02-01   27
      2015-03-01   65
      2015-04-01   71
      2015-05-01    9
      2015-06-01   23
      2015-07-01   52
      2015-08-01   10
      2015-09-01   62
      2015-10-01   25
      
      In [64]: df_eom
      Out[64]:
                  val
      2015-01-31   87
      2015-02-28   16
      2015-03-31   85
      2015-04-30    4
      2015-05-31   37
      2015-06-30   63
      2015-07-31    3
      2015-08-31   73
      2015-09-30   81
      2015-10-31   69
      

      Solution:

      In [61]: df_eom.resample('MS').mean() + df_bom
      Out[61]:
                    val
      2015-01-01  163.0
      2015-02-01   43.0
      2015-03-01  150.0
      2015-04-01   75.0
      2015-05-01   46.0
      2015-06-01   86.0
      2015-07-01   55.0
      2015-08-01   83.0
      2015-09-01  143.0
      2015-10-01   94.0
      
      In [62]: df_eom.resample('MS').mean().join(df_bom, lsuffix='_lft')
      Out[62]:
                  val_lft  val
      2015-01-01     87.0   76
      2015-02-01     16.0   27
      2015-03-01     85.0   65
      2015-04-01      4.0   71
      2015-05-01     37.0    9
      2015-06-01     63.0   23
      2015-07-01      3.0   52
      2015-08-01     73.0   10
      2015-09-01     81.0   62
      2015-10-01     69.0   25
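
Once the month-end data carries month-start labels, the correlation and covariance the question asks about can be computed with the usual label-based alignment. A minimal sketch, using freshly generated random data rather than the values printed above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df_bom = pd.DataFrame(
    {"val": rng.integers(0, 100, 10)},
    index=pd.date_range("2015-01-01", periods=10, freq="MS"),
)
df_eom = pd.DataFrame(
    {"val": rng.integers(0, 100, 10)},
    index=pd.date_range("2015-01-31", periods=10, freq=pd.offsets.MonthEnd()),
)

# Each monthly bin holds exactly one observation, so mean() keeps the value
# and only moves the label from month end to month start.
eom_as_bom = df_eom["val"].resample("MS").mean()

# Both Series now share the same index, so corr/cov pair the right months.
corr = eom_as_bom.corr(df_bom["val"])
cov = eom_as_bom.cov(df_bom["val"])
print(corr, cov)
```

Unlike dropping the index, this keeps the dates, so a month missing from one frame is left out of the calculation instead of shifting the pairs.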
      

      Alternative approach: merge the DataFrames on the year and month parts of their indexes:

      In [69]: %paste
      (pd.merge(df_bom, df_eom,
                left_on=[df_bom.index.year, df_bom.index.month],
                right_on=[df_eom.index.year, df_eom.index.month],
                suffixes=('_bom','_eom')))
      ## -- End pasted text --
      Out[69]:
         key_0  key_1  val_bom  val_eom
      0   2015      1       76       87
      1   2015      2       27       16
      2   2015      3       65       85
      3   2015      4       71        4
      4   2015      5        9       37
      5   2015      6       23       63
      6   2015      7       52        3
      7   2015      8       10       73
      8   2015      9       62       81
      9   2015     10       25       69
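
With the merged frame in hand, the statistics follow directly from its two value columns. A minimal sketch with freshly generated random data (not the values printed above); the variable names `merged`, `corr`, and `cov_matrix` are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df_bom = pd.DataFrame(
    {"val": rng.integers(0, 100, 10)},
    index=pd.date_range("2015-01-01", periods=10, freq="MS"),
)
df_eom = pd.DataFrame(
    {"val": rng.integers(0, 100, 10)},
    index=pd.date_range("2015-01-31", periods=10, freq=pd.offsets.MonthEnd()),
)

# Pair rows that fall in the same calendar year and month.
merged = pd.merge(
    df_bom, df_eom,
    left_on=[df_bom.index.year, df_bom.index.month],
    right_on=[df_eom.index.year, df_eom.index.month],
    suffixes=("_bom", "_eom"),
)

corr = merged["val_bom"].corr(merged["val_eom"])  # single correlation value
cov_matrix = merged[["val_bom", "val_eom"]].cov()  # 2x2 covariance matrix
print(corr)
print(cov_matrix)
```

Because the default merge is an inner join, months present in only one frame are dropped before the statistics are computed.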
      

      Setup:

      In [59]: df_bom = pd.DataFrame({'val':np.random.randint(0,100, 10)}, index=pd.date_range('2015-01-01', periods=10, freq='MS'))
      
      In [60]: df_eom = pd.DataFrame({'val':np.random.randint(0,100, 10)}, index=pd.date_range('2015-01-01', periods=10, freq='M'))
      

      【Comments】:
