Pandas DataFrame from MultiIndex and NumPy structured array (recarray)

【Question title】: Pandas DataFrame from MultiIndex and NumPy structured array (recarray)
【Posted】: 2016-10-10 11:41:03
【Question description】:

First I create a two-level MultiIndex:

import numpy as np
import pandas as pd

ind = pd.MultiIndex.from_product([('X','Y'), ('a','b')])

I can use it like this:

pd.DataFrame(np.zeros((3,4)), columns=ind)

which gives:

     X         Y     
     a    b    a    b
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0

But now I am trying to do this:

dtype = [('Xa','f8'), ('Xb','i4'), ('Ya','f8'), ('Yb','i4')]
pd.DataFrame(np.zeros(3, dtype), columns=ind)

but this gives:

Empty DataFrame
Columns: [(X, a), (X, b), (Y, a), (Y, b)]
Index: []

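I expected a result similar to the earlier one, with three rows.

More generally, what I want is a Pandas DataFrame with MultiIndex columns where the columns have different types (as in the example, where a is a float but b is an integer).

【Comments】:

Tags: python numpy pandas dataframe multi-index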

【Solution 1】:

This looks like a bug and is worth reporting as an issue on GitHub.

A workaround is to set the columns manually after construction:

In [11]: df1 = pd.DataFrame(np.zeros(3, dtype))

In [12]: df1.columns = ind

In [13]: df1
Out[13]:
     X       Y
     a  b    a  b
0  0.0  0  0.0  0
1  0.0  0  0.0  0
2  0.0  0  0.0  0
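Since the goal in the question is to keep a different type per column, it may help to confirm that the workaround preserves the field dtypes. The following is a minimal sketch that reuses the `ind` and `dtype` definitions from the question and only adds a `dtypes` check:

```python
import numpy as np
import pandas as pd

ind = pd.MultiIndex.from_product([("X", "Y"), ("a", "b")])
dtype = [("Xa", "f8"), ("Xb", "i4"), ("Ya", "f8"), ("Yb", "i4")]

# Build from the structured array first, then relabel the columns.
df1 = pd.DataFrame(np.zeros(3, dtype))
df1.columns = ind

# Each column should keep the dtype of the field it came from.
print(df1.dtypes)
```

Assigning to `df1.columns` only relabels the columns, so the new labels are matched to the fields by position; the MultiIndex therefore needs to list its entries in the same order as the fields of the structured array.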

【Comments】:

Thanks. I have reported it as a bug: github.com/pydata/pandas/issues/13415, and your workaround does work.

【Solution 2】:

pd.DataFrame(np.zeros(3, dtype), columns=ind)

Empty DataFrame
Columns: [(X, a), (X, b), (Y, a), (Y, b)]
Index: []

is just the textual representation of the DataFrame output.

Columns: [(X, a), (X, b), (Y, a), (Y, b)]

is then just the textual representation of the column index.

If you instead do:

df = pd.DataFrame(np.zeros(3, dtype), columns=ind)

print(type(df.columns))

<class 'pandas.indexes.multi.MultiIndex'>

you can see that it is indeed a pd.MultiIndex.
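The module path printed by `type(...)` depends on the pandas version, so an `isinstance` check may be a more portable way to confirm the same thing. A minimal sketch, using the plain 2-D array from the question so that it does not depend on how the structured-array case behaves (the names `is_multi` and `n_levels` are only for illustration):

```python
import numpy as np
import pandas as pd

ind = pd.MultiIndex.from_product([("X", "Y"), ("a", "b")])
df = pd.DataFrame(np.zeros((3, 4)), columns=ind)

# Check the column index type without relying on its printed module path.
is_multi = isinstance(df.columns, pd.MultiIndex)
n_levels = df.columns.nlevels
print(is_multi, n_levels)
```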

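All that said, I don't understand why specifying the index (columns=ind) in the DataFrame constructor removes the values.

The workaround is this: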

df = pd.DataFrame(np.zeros(3, dtype))

df.columns = ind

print(df)

     X       Y   
     a  b    a  b
0  0.0  0  0.0  0
1  0.0  0  0.0  0
2  0.0  0  0.0  0
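If the two-level labels can be derived from the field names, the workaround can be wrapped so that the MultiIndex always follows the field order. This is only a sketch: `frame_from_structured` is a hypothetical helper, not a pandas function, and it assumes every field name is exactly two characters ("Xa" becomes ("X", "a")), as in the question's dtype.

```python
import numpy as np
import pandas as pd


def frame_from_structured(arr):
    """Build a DataFrame from a structured array, then relabel its columns.

    Hypothetical helper: assumes each field name is exactly two characters,
    e.g. "Xa" -> ("X", "a"), as in the question's dtype.
    """
    df = pd.DataFrame(arr)
    # Derive the labels from the field names so the order cannot drift.
    tuples = [(name[0], name[1]) for name in arr.dtype.names]
    df.columns = pd.MultiIndex.from_tuples(tuples)
    return df


dtype = [("Xa", "f8"), ("Xb", "i4"), ("Ya", "f8"), ("Yb", "i4")]
df = frame_from_structured(np.zeros(3, dtype))
print(df)
```

Deriving the tuples from `arr.dtype.names` avoids maintaining a separate MultiIndex by hand; for field names that do not split this simply, an explicit mapping from field name to label tuple would be needed instead.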

【Comments】: