How to convert the dtype of a numpy subarray?

【Question title】: How to convert dtype of numpy subarray?
【Posted】: 2020-03-19 01:46:59
【Question】:

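I am trying to read data from a csv file into a numpy array. Because the csv file contains empty fields, I read all the data into an array with dtype=str and plan to convert the rows/columns to the appropriate numeric types afterwards. The example below shows where my attempt to convert the dtypes of these subarrays fails.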

import numpy as np

x = np.array([
['name', 'property', 'value t0', 'value t1', 'value t2'],
['a', 0.5, 1, 2, 3],
['b', 0.2, 5, 10, 100],
['c', 0.7, 3, 6, 9],
], dtype=str)

First, let's look at the original array.

print(x)

[['name' 'property' 'value t0' 'value t1' 'value t2']
 ['a' '0.5' '1' '2' '3']
 ['b' '0.2' '5' '10' '100']
 ['c' '0.7' '3' '6' '9']]

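Next, let's make sure that the numeric entries (from the first row downwards and from the second column to the right) can be converted to type <int>.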

print(x[1:, 2:].astype(int))

[[  1   2   3]
 [  5  10 100]
 [  3   6   9]]

So I tried to put these two ideas together.

# x[1:, 2:] = x[1:, 2:].astype(int)  # equivalent attempt, same result
x[1:, 2:] = np.array(x[1:, 2:], dtype=int)

print(x)

[['name' 'property' 'value t0' 'value t1' 'value t2']
 ['a' '0.5' '1' '2' '3']
 ['b' '0.2' '5' '10' '100']
 ['c' '0.7' '3' '6' '9']]

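Why are the selected entries still strings? I have seen similar questions posted, and the accepted solution seems to be to use named fields. For my use case, however, I would prefer numeric indices over named fields.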

【Comments】:

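You cannot apply different data types to different parts of an array. It looks like you should probably be using something like Pandas rather than NumPy directly.
Take a look at structured arrays.
The named-field (structured array) approach would allow a dtype such as [('name','U1'),('property',float), ...]. Another option is object dtype, where the elements are stored in a list-like fashion. Otherwise you cannot mix dtypes. A pandas DataFrame also has named columns, and each column is a separate Series.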
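
To make the point in the comments concrete, here is a minimal sketch of why the assignment in the question appears to do nothing, and of one way to keep numeric indexing by splitting the data into separate arrays. The variable names `names`, `props` and `values` are illustrative and not from the original post.

```python
import numpy as np

x = np.array([
    ['name', 'property', 'value t0', 'value t1', 'value t2'],
    ['a', 0.5, 1, 2, 3],
    ['b', 0.2, 5, 10, 100],
    ['c', 0.7, 3, 6, 9],
], dtype=str)

# astype() returns a new, separate array with the requested dtype.
values = x[1:, 2:].astype(int)

# Assigning it back into x casts every integer to x's string dtype again,
# because an ndarray has exactly one dtype for all of its elements.
x[1:, 2:] = values
print(x.dtype.kind)       # U  (still a unicode string array)
print(values.dtype.kind)  # i  (the separate copy is integer)

# One option that keeps plain numeric indexing: hold each homogeneous
# block in its own array instead of writing it back into x.
names = x[1:, 0]
props = x[1:, 1].astype(float)
print(values[1, 2])       # 100
```

The converted data lives in `values` and `props`; `x` itself can only ever hold strings.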

Tags: python-3.x numpy multidimensional-array type-conversion dtype

【Solution 1】:

In [83]: alist = [ 
    ...: ['name', 'property', 'value t0', 'value t1', 'value t2'], 
    ...: ['a', 0.5, 1, 2, 3], 
    ...: ['b', 0.2, 5, 10, 100], 
    ...: ['c', 0.7, 3, 6, 9], 
    ...: ]                                                                                                           
In [84]: alist                                                                                                       
Out[84]: 
[['name', 'property', 'value t0', 'value t1', 'value t2'],
 ['a', 0.5, 1, 2, 3],
 ['b', 0.2, 5, 10, 100],
 ['c', 0.7, 3, 6, 9]]
In [85]: np.array(alist)                                                                                             
Out[85]: 
array([['name', 'property', 'value t0', 'value t1', 'value t2'],
       ['a', '0.5', '1', '2', '3'],
       ['b', '0.2', '5', '10', '100'],
       ['c', '0.7', '3', '6', '9']], dtype='<U8')

Object array:

In [87]: np.array(alist, dtype=object)                                                                               
Out[87]: 
array([['name', 'property', 'value t0', 'value t1', 'value t2'],
       ['a', 0.5, 1, 2, 3],
       ['b', 0.2, 5, 10, 100],
       ['c', 0.7, 3, 6, 9]], dtype=object)
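
As a rough illustration of how the object array behaves with numeric indexing: each element keeps its original Python type, and a numeric slice can be converted to a typed array when fast arithmetic is needed. The names `obj` and `values` are assumptions made for this sketch.

```python
import numpy as np

alist = [
    ['name', 'property', 'value t0', 'value t1', 'value t2'],
    ['a', 0.5, 1, 2, 3],
    ['b', 0.2, 5, 10, 100],
    ['c', 0.7, 3, 6, 9],
]
obj = np.array(alist, dtype=object)

# Elements are stored as references to the original Python objects,
# so strings, floats and ints coexist in one 2-D array.
print(type(obj[1, 0]).__name__)  # str
print(type(obj[1, 1]).__name__)  # float
print(type(obj[1, 2]).__name__)  # int

# The numeric block is still reachable by position; convert it to a
# real integer array before doing vectorised arithmetic on it.
values = obj[1:, 2:].astype(int)
print(values.sum(axis=1))        # [  6 115  18]
```

The trade-off is that operations on the object array itself run at Python speed, which is why the sketch converts the slice first.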

Structured array:

In [88]: np.array([tuple(row) for row in alist[1:]], dtype='U1,f,i,i,i')                                             
Out[88]: 
array([('a', 0.5, 1,  2,   3), ('b', 0.2, 5, 10, 100),
       ('c', 0.7, 3,  6,   9)],
      dtype=[('f0', '<U1'), ('f1', '<f4'), ('f2', '<i4'), ('f3', '<i4'), ('f4', '<i4')])
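
Since the question prefers numeric indices over named fields, the following sketch shows one possible way to reach the fields of the structured array by position, by looking the name up in `dtype.names`. The names `sarr`, `field_names` and `values` are hypothetical.

```python
import numpy as np

rows = [('a', 0.5, 1, 2, 3), ('b', 0.2, 5, 10, 100), ('c', 0.7, 3, 6, 9)]
sarr = np.array(rows, dtype='U1,f,i,i,i')

# dtype.names is a tuple of the field names in order, so a column
# position can be translated into a field name.
field_names = sarr.dtype.names
print(field_names)           # ('f0', 'f1', 'f2', 'f3', 'f4')
print(sarr[field_names[4]])  # [  3 100   9]

# Records (rows) are indexed with ordinary integers.
print(sarr[1][field_names[0]])  # b

# Stack the three integer fields into a plain 2-D integer array.
values = np.column_stack([sarr[n] for n in field_names[2:]])
print(values.shape)          # (3, 3)
```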

pandas:

In [90]: import pandas as pd                                                                                         
In [91]: pd.DataFrame(alist[1:], columns=alist[0])                                                                   
Out[91]: 
  name  property  value t0  value t1  value t2
0    a       0.5         1         2         3
1    b       0.2         5        10       100
2    c       0.7         3         6         9
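
The DataFrame can also be addressed by position, which may suit the numeric-index preference stated in the question. This is a small sketch using `DataFrame.iloc`; the names `df`, `values` and `props` are illustrative.

```python
import pandas as pd

alist = [
    ['name', 'property', 'value t0', 'value t1', 'value t2'],
    ['a', 0.5, 1, 2, 3],
    ['b', 0.2, 5, 10, 100],
    ['c', 0.7, 3, 6, 9],
]
df = pd.DataFrame(alist[1:], columns=alist[0])

# Each column has its own dtype, inferred from the data.
print(df.dtypes)

# iloc selects by integer position, independent of the column names.
values = df.iloc[:, 2:].to_numpy()  # integer block as a NumPy array
props = df.iloc[:, 1].to_numpy()    # float column
print(values.dtype.kind)            # i
print(props.dtype.kind)             # f
```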
