【Question title】: Create MultiIndex DataFrame from a Dict of Series of NumPy Arrays
【Posted】: 2021-03-08 21:14:07
【Question】:

Given a dictionary of pandas.Series, where every cell holds a numpy.array:

import pandas as pd
import numpy as np

N = 5
foo = [x for x in np.random.randint(10, size=(N,8))]        # list of ndarray
bar = [x for x in np.random.randint(10, size=(N,8))]        # list of ndarray
baz = [x for x in np.random.randint(10, size=(N,8))]        # list of ndarray
data = {    # named `data` rather than `input` to avoid shadowing the builtin
    'foo': pd.Series(foo, index=pd.date_range('2020-01-01', periods=N, freq='D')),
    'bar': pd.Series(bar, index=pd.date_range('2020-01-01', periods=N, freq='D')),
    'baz': pd.Series(baz, index=pd.date_range('2020-01-01', periods=N, freq='D')),
}
print(data)
# {'foo': 
# 2020-01-01    [4, 1, 3, 3, 4, 6, 0, 2]
# 2020-01-02    [7, 7, 1, 2, 1, 2, 8, 6]
# 2020-01-03    [1, 0, 6, 8, 1, 8, 2, 3]
# 2020-01-04    [1, 5, 6, 0, 1, 8, 8, 4]
# 2020-01-05    [4, 7, 9, 3, 5, 3, 0, 1]
# Freq: D, dtype: object, 
# 'bar': 
# 2020-01-01    [0, 2, 2, 5, 4, 9, 7, 9]
# 2020-01-02    [7, 0, 8, 0, 7, 8, 8, 9]
# 2020-01-03    [6, 7, 2, 7, 2, 9, 8, 7]
# 2020-01-04    [1, 8, 8, 9, 6, 1, 4, 6]
# 2020-01-05    [9, 4, 4, 2, 6, 2, 7, 7]
# Freq: D, dtype: object, 
# 'baz': 
# 2020-01-01    [9, 2, 9, 2, 5, 3, 5, 3]
# 2020-01-02    [6, 5, 3, 3, 9, 7, 7, 9]
# 2020-01-03    [5, 7, 0, 6, 1, 5, 6, 7]
# 2020-01-04    [3, 9, 2, 6, 1, 5, 9, 9]
# 2020-01-05    [2, 7, 6, 4, 1, 2, 9, 2]
# Freq: D, dtype: object}

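What is the most efficient way to convert this into a MultiIndex pandas DataFrame, with the dictionary keys on the first index level and each Series' DatetimeIndex on the second index level?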

For the example above, the desired pandas DataFrame would have 15 rows (3 keys × 5 dates) and 8 columns.

【Question comments】:

    Tags: python pandas numpy dataframe time-series


    【Solution 1】:

    When you generate random data, set a seed so that your example is reproducible.

    You can combine pandas' concat with numpy's vstack to get the output you want:
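
As a small illustration of the seeding advice, here is a minimal sketch using NumPy's Generator API. This is an assumption on my part: the answer itself uses the legacy `np.random.seed`, and a Generator seeded with the same number produces different values than the legacy functions. The names `rng_a` and `rng_b` are hypothetical.

```python
import numpy as np

# Two generators created from the same seed (5 is an arbitrary choice)
rng_a = np.random.default_rng(5)
rng_b = np.random.default_rng(5)

# Each draws a 5 x 8 array of integers in [0, 10)
first = rng_a.integers(10, size=(5, 8))
second = rng_b.integers(10, size=(5, 8))

# Same seed, same sequence of draws: the arrays are identical
same = np.array_equal(first, second)
print(same)
```

Anyone who runs the snippet with the same seed should see the same numbers, which makes posted output easier to compare.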

    import numpy as np
    import pandas as pd

    np.random.seed(5)  # fixed seed so the output below can be reproduced
    N = 5
    foo = [x for x in np.random.randint(10, size=(N, 8))]  # list of ndarray
    bar = [x for x in np.random.randint(10, size=(N, 8))]  # list of ndarray
    baz = [x for x in np.random.randint(10, size=(N, 8))]  # list of ndarray
    data = {
        "foo": pd.Series(foo, index=pd.date_range("2020-01-01", periods=N, freq="D")),
        "bar": pd.Series(bar, index=pd.date_range("2020-01-01", periods=N, freq="D")),
        "baz": pd.Series(baz, index=pd.date_range("2020-01-01", periods=N, freq="D")),
    }
    
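    # Concatenating a dict of Series puts the dict keys on the outer index level
    box = pd.concat(data)
    # Stack the per-row arrays into one 2-D array and reuse the MultiIndex
    pd.DataFrame(np.vstack(box.to_numpy()), index=box.index)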
    
    
                    0   1   2   3   4   5   6   7
    foo 2020-01-01  3   6   6   0   9   8   4   7
        2020-01-02  0   0   7   1   5   7   0   1
        2020-01-03  4   6   2   9   9   9   9   1
        2020-01-04  2   7   0   5   0   0   4   4
        2020-01-05  9   3   2   4   6   9   3   3
    bar 2020-01-01  2   1   5   7   4   3   1   7
        2020-01-02  3   1   9   5   7   0   9   6
        2020-01-03  0   5   2   8   6   8   0   5
        2020-01-04  2   0   7   7   6   0   0   8
        2020-01-05  5   5   9   6   4   5   2   8
    baz 2020-01-01  8   1   6   3   4   1   8   0
        2020-01-02  2   2   4   1   6   3   4   3
        2020-01-03  1   4   2   3   4   9   4   0
        2020-01-04  6   6   9   2   9   3   0   8
        2020-01-05  8   9   7   4   8   6   8   0
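
The concat + vstack approach above can be sketched as a self-contained script that also checks the shape asked for in the question. Naming the index levels through `names=` and the level names "key" and "date" are my own additions for illustration; the answer does not use them.

```python
import numpy as np
import pandas as pd

np.random.seed(5)
N = 5
idx = pd.date_range("2020-01-01", periods=N, freq="D")

# Dict of Series whose cells are 1-D arrays of length 8
data = {
    key: pd.Series(list(np.random.randint(10, size=(N, 8))), index=idx)
    for key in ("foo", "bar", "baz")
}

# Dict keys become the outer level; `names` labels both levels (illustrative names)
box = pd.concat(data, names=["key", "date"])

# One row per (key, date) pair, one column per array element
df = pd.DataFrame(np.vstack(box.to_numpy()), index=box.index)

print(df.shape)
print(df.index.nlevels)
```

A single `np.vstack` call builds the whole 2-D block at once, so no Python-level function runs per row.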
    

    【Comments】:

      【Solution 2】:

      A simple approach is to lean on pandas itself: stack(), to_frame() and swaplevel().

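      # `data` is the dict of Series from the question
      # Move the dict keys into the index, then swap them onto the outer level
      df = pd.DataFrame(data).stack().to_frame().swaplevel()
      # Expand each array cell into one column per element
      df.iloc[:, 0].apply(pd.Series)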
      

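      This yields the following (rows are ordered by date rather than grouped by key, and the values come from a different random draw than in Solution 1):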

                      0   1   2   3   4   5   6   7
      foo 2020-01-01  2   3   5   1   7   0   8   2
      bar 2020-01-01  8   1   4   6   1   7   3   1
      baz 2020-01-01  7   3   4   3   9   0   5   0
      foo 2020-01-02  8   3   8   1   6   5   5   4
      bar 2020-01-02  2   1   9   5   6   6   1   4
      baz 2020-01-02  4   3   3   8   7   4   2   4
      foo 2020-01-03  8   8   5   2   9   4   1   1
      bar 2020-01-03  0   0   0   8   8   5   8   5
      baz 2020-01-03  1   5   5   9   5   2   2   7
      foo 2020-01-04  2   7   6   3   0   8   2   5
      bar 2020-01-04  1   8   0   3   1   5   1   3
      baz 2020-01-04  5   0   7   6   1   7   7   9
      foo 2020-01-05  9   0   8   5   9   9   6   8
      bar 2020-01-05  0   3   1   6   4   1   9   6
      baz 2020-01-05  4   6   6   7   9   3   0   5
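
If the per-row `apply` becomes slow on larger inputs, one possible variation (my assumption, not part of the answer) is to build the frame from the list of arrays in a single call and then sort the index so that rows are grouped by key. Note that `sort_index` orders the keys alphabetically (bar, baz, foo), not in dict insertion order. The names `stacked`, `wide` and `grouped` are hypothetical.

```python
import numpy as np
import pandas as pd

np.random.seed(5)
N = 5
idx = pd.date_range("2020-01-01", periods=N, freq="D")
data = {
    key: pd.Series(list(np.random.randint(10, size=(N, 8))), index=idx)
    for key in ("foo", "bar", "baz")
}

# Same reshaping as the answer: keys move into the index, then onto the outer level
stacked = pd.DataFrame(data).stack().swaplevel()

# Build all rows in one constructor call instead of one pd.Series per row
wide = pd.DataFrame(stacked.tolist(), index=stacked.index)

# Group rows by key; the keys end up in alphabetical order
grouped = wide.sort_index()

print(wide.shape)
print(grouped.index.get_level_values(0).unique().tolist())
```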
      

      【Comments】:
