【Question title】: Getting a weird error that says 'Reshape your data either using array.reshape(-1, 1)' [duplicate]
【Posted】: 2020-04-21 15:19:12
【Question】:

I am testing this code.

# Import the necessary packages
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer
from sklearn.cluster import KMeans
# Define a normalizer
normalizer = Normalizer()
# Create Kmeans model
kmeans = KMeans(n_clusters=10, max_iter=1000)
# Make a pipeline chaining normalizer and kmeans
pipeline = make_pipeline(normalizer, kmeans)
# Fit pipeline to daily stock movements
pipeline.fit(score)
labels = pipeline.predict(score)

This line throws an error:

pipeline.fit(score)

This is the error I see:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I don't know what this error means. I googled it and found nothing useful. Here is a small sample of my data:

array=[1. 1. 1. ... 8. 1. 1.].

I am following the example at the link below.

https://medium.com/datadriveninvestor/stock-market-clustering-with-k-means-clustering-in-python-4bf6bd5bd685

When I run the code from the link, everything works. I'm not sure why it fails when I run it on my own data, which is just:

1, 1.9, 2.62, 3.5, 4.1, 7.7, 9.75, etc, etc.

Values from 1 to 10. That's it.

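【Question comments】:

Reshape it just as the message says. scikit-learn needs the input to have two defined dimensions. You can check the shape with array.shape. Yours is probably (n,), but it has to be (n, 1). Try array = array.reshape(-1, 1)
Yes, that works, but what is the actual problem? I haven't seen this before.
I think it comes down to which dimensions are defined. The -1 in reshape acts as a "wildcard" and tells numpy to work that dimension out. Since matrices are in (rows, columns) format, we are telling reshape to produce 1 column and an unknown number of rows. Now, if you have array=np.array([1., 1., 1., 8., 1., 1.,]), which is 6 elements long, and run array.reshape(-1, 2), numpy will automatically output a matrix of shape (3, 2). Our column requirement (2) is met and numpy figures out the rest. Note: you cannot run array.reshape(-1, -1), because only one dimension may be left unknown.
Does this answer your question? Preprocessing in scikit learn - single sample - Depreciation warning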
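
The reshape behaviour described in the comments above can be checked with a short sketch. It uses the six-element array from the comment; the variable names `column`, `row`, and `pairs` are made up for illustration.

```python
import numpy as np

# The six-element example from the comment above.
array = np.array([1., 1., 1., 8., 1., 1.])
print(array.shape)  # a 1-D array: only one dimension is defined

# -1 asks numpy to infer that dimension from the total number of elements.
column = array.reshape(-1, 1)   # one column, row count inferred
row = array.reshape(1, -1)      # one row, column count inferred
pairs = array.reshape(-1, 2)    # two columns, row count inferred
print(column.shape, row.shape, pairs.shape)

# Only one dimension may be left unknown, so this raises ValueError.
try:
    array.reshape(-1, -1)
except ValueError as exc:
    print("reshape(-1, -1) failed:", exc)
```

The first form is the one the error message suggests for data with a single feature, the second is the one for a single sample.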

Tags: python python-3.x scikit-learn k-means

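【Solution 1】:

Any scikit-learn transformer or estimator expects an array of shape [n_samples, n_features]. So there are two cases in which you have to reshape your data:

If you have only one sample, reshape it into an array of shape [1, n_features]
If you have only one feature, reshape it into an array of shape [n_samples, 1]

So do whichever fits your problem. You are passing a 1-D vector.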

[1. 1. 1. ... 8. 1. 1.]

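If this is a single sample, reshape it into an array of shape (1, -1) and you'll be fine. That said, you may want to consider the following.

If this is a single sample, fitting a model on one sample makes no sense. You won't gain anything from it.
If this is a set of samples with a single feature, I don't think there is much to gain from running K-means on such a dataset.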
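
As a minimal sketch of the single-feature case, assuming the values listed in the question are seven samples of one feature: the 1-D array is rejected, while the (n, 1) version fits. The choices `n_clusters=3`, `n_init=10`, and `random_state=0` are assumptions made for this small example and are not from the question, which used 10 clusters on a larger array.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import Normalizer

# Values quoted in the question, treated here as 7 samples of 1 feature.
score = np.array([1, 1.9, 2.62, 3.5, 4.1, 7.7, 9.75])

# A 1-D array is rejected: scikit-learn cannot tell samples from features.
try:
    KMeans(n_clusters=3, n_init=10, random_state=0).fit(score)
    raised = False
except ValueError:
    raised = True

# One column, one row per sample.
X = score.reshape(-1, 1)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(X.shape, labels)

# Normalizer rescales each ROW to unit norm. With a single feature every
# positive value therefore becomes 1.0, which leaves nothing to cluster.
normalized = Normalizer().fit_transform(X)
print(normalized.ravel())
```

The sketch leaves `Normalizer` out of the fit for that reason: on single-feature data it collapses all positive samples to the same point, which is in line with the caution above about how much K-means can offer on such a dataset.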

【Comments】:

【Solution 2】:

The problem may be the format of your data. Most models expect a data frame.
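
A small sketch of why a data frame can help, assuming pandas is available: what matters for the error in the question is that the input is two-dimensional, and a one-column DataFrame is, whereas a Series is not. The column name `score` is borrowed from the question's variable name for illustration only.

```python
import pandas as pd

# Values quoted in the question, held first as a 1-D Series.
values = pd.Series([1, 1.9, 2.62, 3.5, 4.1, 7.7, 9.75], name="score")
print(values.shape)    # one dimension, like the array in the question

# A one-column DataFrame is 2-D: one row per sample, one column per feature.
frame = values.to_frame()
print(frame.shape)

# The underlying array has the same 2-D layout that reshape(-1, 1) produces.
as_array = frame.to_numpy()
print(as_array.shape)
```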

【Comments】: