【Question title】: Time series with appending issue in pandas data frame
【Posted】: 2016-06-01 21:49:09
【Question】:

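I am working with time series and have run into some very peculiar behavior in pandas data frames.

The following code works when the index is not a time series: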

import pandas as pd
df = pd.DataFrame({"a":[1,2,3], "b":[31,41,51],"c":[31,52,23]}, index=["z", "y", "x"])
df1 = pd.DataFrame({"a":[41,55,16]}, index=["w", "v", "u"])
df2 = pd.DataFrame({"b":[24,3,57]}, index=["w", "v", "u"])
df3 = pd.DataFrame({"c":[111,153,123]}, index=["w", "v", "u"]) 
df = df.append(df1)
df.ix[df2.index, "b"] = df2

Output of df:

    a   b   c
z   1  31  31
y   2  41  52
x   3  51  23
w  41  24 NaN
v  55   3 NaN
u  16  57 NaN

However, this does not work when the index is datetime64[ns] or when the data frame is too large.

With a datetime64[ns] index, only the following works instead:

df = df.append(df1)
df["b"][df2.index] = df2

Why does this happen?

【Question comments】:

    Tags: python python-2.7 pandas dataframe time-series


    【Solution 1】:

    You can try fillna:

    df = df.append(df1)
    print df.fillna(df2)
        a   b   c
    z   1  31  31
    y   2  41  52
    x   3  51  23
    w  41  24 NaN
    v  55   3 NaN
    u  16  57 NaN
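As a small illustration of why `fillna` fits here: when it is given a DataFrame, it fills only the missing cells and matches them by index and column label, so existing values are left alone. The sketch below uses made-up frames (`df` and `filler` are hypothetical names and values, not the ones from the question).

```python
import numpy as np
import pandas as pd

# Hypothetical data: row "w" is missing its "b" value
df = pd.DataFrame({"a": [1, 41], "b": [31.0, np.nan]}, index=["z", "w"])

# The filler frame has a value for every row, including one that is not missing
filler = pd.DataFrame({"b": [999.0, 24.0]}, index=["z", "w"])

# Only the NaN cell is filled; the existing 31.0 in row "z" is kept
filled = df.fillna(filler)
print(filled)
```

Column `a` is not present in `filler`, so it passes through unchanged.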
    

    I tested it with a DatetimeIndex and it works fine:

    import pandas as pd
    
    df = pd.DataFrame({"a":[1,2,3], "b":[31,41,51],"c":[31,52,23]}, index=["z", "y", "x"])
    df.index = pd.date_range('20160101',periods=3,freq='T')
    
    df1 = pd.DataFrame({"a":[41,55,16]}, index=["w", "v", "u"])
    df1.index = pd.date_range('20160104',periods=3,freq='T')
    
    df2 = pd.DataFrame({"b":[24,3,57]}, index=["w", "v", "u"])
    df2.index = pd.date_range('20160104',periods=3,freq='T')
    
    df3 = pd.DataFrame({"c":[111,153,123]}, index=["w", "v", "u"])
    df3.index = pd.date_range('20160104',periods=3,freq='T')
    
    df = df.append(df1)
    print df
                          a   b   c
    2016-01-01 00:00:00   1  31  31
    2016-01-01 00:01:00   2  41  52
    2016-01-01 00:02:00   3  51  23
    2016-01-04 00:00:00  41 NaN NaN
    2016-01-04 00:01:00  55 NaN NaN
    2016-01-04 00:02:00  16 NaN NaN
    
    print df.fillna(df2)
                          a   b   c
    2016-01-01 00:00:00   1  31  31
    2016-01-01 00:01:00   2  41  52
    2016-01-01 00:02:00   3  51  23
    2016-01-04 00:00:00  41  24 NaN
    2016-01-04 00:01:00  55   3 NaN
    2016-01-04 00:02:00  16  57 NaN
    
    df.ix[df2.index, "b"] = df2
    print df
                          a   b   c
    2016-01-01 00:00:00   1  31  31
    2016-01-01 00:01:00   2  41  52
    2016-01-01 00:02:00   3  51  23
    2016-01-04 00:00:00  41  24 NaN
    2016-01-04 00:01:00  55   3 NaN
    2016-01-04 00:02:00  16  57 NaN
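For readers on a newer pandas, note that `DataFrame.append` and `.ix` were both removed in later releases. The following is a minimal sketch of the same steps using `pd.concat` and `.loc`, assuming a recent pandas version. The helper name `new_idx` is introduced here for illustration only, and `freq="min"` is used as the minute alias.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [31, 41, 51], "c": [31, 52, 23]},
    index=pd.date_range("20160101", periods=3, freq="min"),
)

# Shared DatetimeIndex for the rows being added (illustrative name)
new_idx = pd.date_range("20160104", periods=3, freq="min")
df1 = pd.DataFrame({"a": [41, 55, 16]}, index=new_idx)
df2 = pd.DataFrame({"b": [24, 3, 57]}, index=new_idx)

# pd.concat takes the place of df.append(df1);
# columns that df1 lacks ("b" and "c") become NaN in the new rows
df = pd.concat([df, df1])

# One label-based .loc call instead of .ix or chained indexing;
# passing the Series df2["b"] lets pandas align the values on the index
df.loc[df2.index, "b"] = df2["b"]

print(df)
```

Doing the assignment in a single `.loc` call avoids the chained form `df["b"][df2.index] = ...`, which may end up writing to a temporary copy rather than to `df` itself.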
    

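    【Comments】:

    • It works fine with pandas 0.17.1. Which version of pandas are you using? Check it with print pd.show_versions(). Another possible cause is that you have too little memory and df is too large.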
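The version check suggested in the comment can be sketched as follows. `pd.show_versions()` prints its report itself and returns `None`, so it does not need to be wrapped in `print`. For just the pandas version, `pd.__version__` is enough.

```python
import pandas as pd

# Short form: only the pandas version string
print(pd.__version__)

# Full report: pandas, Python, OS and dependency versions (prints directly)
pd.show_versions()
```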