Creating a NumPy array with Pandas: answers

【Question title】: Creating NumPy array with Pandas
【Posted】: 2015-12-16 21:33:24
【Question】:

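I am trying to use scikit with some data from a spreadsheet (.xlsx). To do that, I read the spreadsheet with Pandas and then plan to pass the data to scikit as a NumPy array.

The problem is that when I convert my DataFrame to NumPy, I lose almost all of my data! I think it is because the sheet has no column names, only raw data. For example: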

28.7967 16.0021 2.6449 0.3918 0.1982

31.6036 11.7235 2.5185 0.5303 0.3773

162.052 136.031 4.0612 0.0374 0.0187

My code so far:

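import numpy as np
import pandas as pd

def split_data():
    # Written against the pandas API of 2015; later releases renamed
    # sheetname/skip_footer to sheet_name/skipfooter.
    test_data = pd.read_excel('magic04.xlsx', sheetname=0, skip_footer=16020)
    # The line below prints the data correctly
    print(test_data.iloc[:, 0:10])

    # Neither conversion below works as expected
    test1 = np.array(test_data.iloc[:, 0:10])
    # as_matrix() was later removed from pandas; to_numpy() replaces it
    test2 = test_data.as_matrix()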

I am really lost. Any help is very welcome...

【Question comments】:

Tags: python arrays numpy pandas scikit-learn

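【Solution 1】:

I suggest passing header=None to read_excel. By default the first row of the sheet is treated as the column names, so it is not part of the data. Compare the following: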

df = pd.read_excel('stuff.xlsx')
>> df
    28.7967   16.0021    2.6449    0.3918    0.1982
0   31.6036   11.7235    2.5185    0.5303    0.3773
1  162.0520  136.0310    4.0612    0.0374    0.0187

>> df.ix[:, 1:2]

0
1

Versus:

df = pd.read_excel('stuff.xlsx', header=None)
>> df
          0         1         2         3         4
0   28.7967   16.0021    2.6449    0.3918    0.1982
1   31.6036   11.7235    2.5185    0.5303    0.3773
2  162.0520  136.0310    4.0612    0.0374    0.0187

>> df.ix[:, 1:2]
          1       2
0   16.0021  2.6449
1   11.7235  2.5185
2  136.0310  4.0612
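
The effect of header=None can be reproduced without a spreadsheet. The sketch below is only an illustration under stated assumptions: it swaps in pd.read_csv on an in-memory string for pd.read_excel (both consume the first row as column names by default), and the names raw, with_default, with_none and subset are made up for this example. It also uses .iloc in place of .ix, which was removed in pandas 1.0.

```python
import io

import pandas as pd

# Three header-less rows, as in the question (comma-separated here;
# the in-memory text is an assumption standing in for the .xlsx file).
raw = (
    "28.7967,16.0021,2.6449,0.3918,0.1982\n"
    "31.6036,11.7235,2.5185,0.5303,0.3773\n"
    "162.052,136.031,4.0612,0.0374,0.0187\n"
)

# Default header handling: the first row is consumed as column names.
with_default = pd.read_csv(io.StringIO(raw))

# header=None: every row is data; columns are labelled 0, 1, 2, ...
with_none = pd.read_csv(io.StringIO(raw), header=None)

print(with_default.shape)  # (2, 5)
print(with_none.shape)     # (3, 5)

# Positional slice of the second and third columns, converted to NumPy.
subset = with_none.iloc[:, 1:3].to_numpy()
print(subset.shape)        # (3, 2)
```

Note that .iloc[:, 1:3] is positional and excludes the end, whereas the label-based .ix[:, 1:2] above included column 2.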

【Comments】:

It worked!!! It works both ways: with the attribute ".iloc[:, 0:X]" and with the method "as_matrix()"! Thank you so much!
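
For readers on current pandas, where as_matrix() no longer exists (it was removed in pandas 1.0), the same two conversions might look roughly like the sketch below. The small DataFrame is a hypothetical stand-in for the frame read with header=None, and X = 4 is chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frame read with header=None.
df = pd.DataFrame(
    [
        [28.7967, 16.0021, 2.6449, 0.3918, 0.1982],
        [31.6036, 11.7235, 2.5185, 0.5303, 0.3773],
        [162.052, 136.031, 4.0612, 0.0374, 0.0187],
    ]
)

# Whole frame -> 2-D float array (takes the place of as_matrix()).
full = df.to_numpy()

# First four columns by position (the ".iloc[:, 0:X]" route with X = 4).
features = df.iloc[:, 0:4].to_numpy()

# np.asarray on a DataFrame also yields the underlying values.
same = np.asarray(df)

print(full.shape)                  # (3, 5)
print(features.shape)              # (3, 4)
print(np.array_equal(full, same))  # True
```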