
【Question title】: 1d numpy array with enough elements not resizing as expected
【Posted】: 2023-12-01 04:03:01
【Question】:

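I've just started using numpy arrays and pandas dataframes, and I'm working on a practice project but have run into a problem. I have a pandas dataframe whose rows I pass to a function to do some work. The function takes two different arrays, one labeled best and one labeled worst, and then creates a new vector so the sums can be compared. From there it returns either the current array that pandas.apply passed in or the new vector, whichever has the lower sum(). This creates a new python array that needs to be a 20x5 matrix at the end. The function works fine, but the returned dataframe needs to be converted to a python array of size (20 x 5) for further work, and when np.array() is called it converts it to an array of size (20,). I figured just using .reshape(20,5) would work since it has enough elements to work with, but it doesn't; it just fails at runtime. Any help is appreciated, as I can't find anything that helps me understand why this happens.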

(The error, as many will have guessed from reading the above: "cannot reshape array of size 20 into shape (20,5)")

An excerpt of code from my program that shows it (can be run on its own):

import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=22)
df = pd.DataFrame(rng.random((20,5)))

def new_vectors(current, best, worst):
    #convert current to numpy array 
    current = current.to_numpy()

    #construct a new vector to check
    new = np.add(current, np.subtract((rng.random()*(np.subtract(best, np.absolute(current)))), ((rng.random()*(np.subtract(worst, np.absolute(current)))))))

    #get the new sum for the new and old vectors
    summed = current.sum()
    newsummed = new.sum()

    #return the smallest one
    return np.add(((newsummed < summed)*(new)), ((newsummed > summed)*(current))).flatten()


z = np.array(df.apply(new_vectors, args=(df.iloc[0].to_numpy(), df.iloc[11].to_numpy()), axis=1))
z.reshape(20,5) #I know reshape() creates a copy, just here to show it doesn't work regardless

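【Comments】:

z.reshape(20,5) returns a new array. It does not work in place. Read the docs: numpy.org/doc/stable/reference/generated/numpy.reshape.html
You are returning an array of arrays with shape (20,). You can't reshape (20,) into (20,5).
To hpaulj: I know it doesn't. I only put it there to show that it doesn't work at all, i.e. execution never gets past that line.
To Michael: is there a way to convert the data to a different shape, or to create a different array of size 20x5 and map these 100 values onto it?
Can you provide an example of the result using just a (3,2) input array?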

Tags: python arrays pandas numpy reshape

【Solution 1】:

Your original dataframe, shortened for display purposes:

In [628]: df = pd.DataFrame(rng.random((4,5)))
In [629]: df
Out[629]: 
          0         1         2         3         4
0  0.891169  0.134904  0.515261  0.975586  0.150426
1  0.834185  0.671914  0.072134  0.170696  0.923737
2  0.065445  0.356001  0.034787  0.257711  0.213964
3  0.790341  0.080620  0.111369  0.542423  0.199517

Next, the result of apply:

In [631]: df1=df.apply(new_vectors, args=(df.iloc[0].to_numpy(), df.iloc[3].to_numpy()), axis=1)
In [632]: df1
Out[632]: 
0    [0.891168725430691, 0.13490384333565053, 0.515...
1    [0.834184861872087, 0.6719141503303373, 0.0721...
2    [0.065444520313796, 0.35600115939269394, 0.034...
3    [0.7903408924058509, 0.08061955595765169, 0.11...
dtype: object

Note that it has 1 column, which contains arrays. Making an array from it:

In [633]: df1.to_numpy()
Out[633]: 
array([array([0.89116873, 0.13490384, 0.51526113, 0.97558562, 0.15042584]),
       array([0.83418486, 0.67191415, 0.07213404, 0.17069617, 0.92373724]),
       array([0.06544452, 0.35600116, 0.03478695, 0.25771129, 0.21396367]),
       array([0.79034089, 0.08061956, 0.1113691 , 0.54242262, 0.19951741])],
      dtype=object)

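That is a (4,) array with object dtype. The dtype matters. Even though each of those elements has 5 elements of its own, reshape cannot cross that "object" boundary. We can't reshape it to (4,5).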
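To make the "object boundary" concrete, here is a minimal sketch. The array contents are made up for illustration and only mimic the structure of df1.to_numpy(). The outer array counts just its 4 elements, so reshape sees 4 items rather than 20:

```python
import numpy as np

# Build a (4,) object-dtype array whose elements are 5-element float arrays.
# The values are arbitrary; only the structure matches df1.to_numpy().
rows = [np.arange(5, dtype=float) + 10 * i for i in range(4)]
obj = np.empty(4, dtype=object)
for i, row in enumerate(rows):
    obj[i] = row

# The outer array has only 4 elements, each of which is a Python object
print(obj.shape, obj.size, obj.dtype)

try:
    obj.reshape(4, 5)
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print("reshape failed:", exc)

# Stacking the inner arrays builds a real 2-D float array
stacked = np.vstack(obj)
print(stacked.shape, stacked.dtype)
```

The size attribute is what reshape checks against the requested shape, which is why the error message in the question mentions size 20 rather than 100.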

But we can concatenate those arrays:

In [636]: np.vstack(df1.to_numpy())
Out[636]: 
array([[0.89116873, 0.13490384, 0.51526113, 0.97558562, 0.15042584],
       [0.83418486, 0.67191415, 0.07213404, 0.17069617, 0.92373724],
       [0.06544452, 0.35600116, 0.03478695, 0.25771129, 0.21396367],
       [0.79034089, 0.08061956, 0.1113691 , 0.54242262, 0.19951741]])

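【Comments】:

Thank you very much. The previous answer would definitely have worked, but thank you for taking the time to explain why the numpy array couldn't be reshaped, so that I'll understand this behavior better in the future.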

【Solution 2】:

You can do the reshaping manually.

Remove z.reshape(20,5). It doesn't work on an array of arrays.

After applying the function, use this instead:

# Create an empty matrix with the desired size
matrix = np.zeros(shape=(20,5))
# Iterate over z and assign each array to a row in the numpy matrix.
for i,arr in enumerate(z):
    matrix[i] = arr

If you don't know the required matrix size in advance, create the matrix as matrix = np.zeros(shape=df.shape).
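As a small worked example of the loop, here is a sketch on a (3,2) case. The array z below is a made-up stand-in for the real z, and the target shape is derived from the data itself, which is one more option when no dataframe is at hand:

```python
import numpy as np

# Stand-in for z: a (3,) object array holding three 2-element arrays
z = np.empty(3, dtype=object)
for i in range(3):
    z[i] = np.array([i, i + 0.5])

# Derive the target shape from the data instead of hard-coding it
matrix = np.zeros(shape=(len(z), len(z[0])))
for i, arr in enumerate(z):
    # A row of mismatched length typically raises ValueError here
    # (a length-1 row would be broadcast instead)
    matrix[i] = arr

print(matrix)
```

The loop produces the same values as np.vstack(z), so the choice between them is about readability rather than results.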

The full code:

import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=22)
df = pd.DataFrame(rng.random((20,5)))

def new_vectors(current, best, worst):
    #convert current to numpy array 
    current = current.to_numpy()

    #construct a new vector to check
    new = np.add(current, np.subtract((rng.random()*(np.subtract(best, np.absolute(current)))), ((rng.random()*(np.subtract(worst, np.absolute(current)))))))

    #get the new sum for the new and old vectors
    summed = current.sum()
    newsummed = new.sum()

    #return the smallest one
    return np.add(((newsummed < summed)*(new)), ((newsummed > summed)*(current))).flatten()


z = np.array(df.apply(new_vectors, args=(df.iloc[0].to_numpy(), df.iloc[11].to_numpy()), axis=1))

matrix = np.zeros(shape=df.shape)

for i,arr in enumerate(z):
    matrix[i] = arr

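【Comments】:

I figured this was what I needed to do, but I just wanted to make sure there wasn't a numpy-specific method I should be using to solve this with less verbosity. Either way, thanks!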