
【Question title】: Error using fit in sklearn for covariance estimation
【Posted】: 2017-03-20 01:39:59
【Question description】:

I have a variable xcin that holds data as an array. I am trying to fit this data using fit() from GraphLassoCV.

The data in xcin:

[ 0.722    0.32202  0.70102  0.7414   0.18204  0.01132  0.171    0.723
  0.722    0.52605  0.70102  0.7414   0.29253  0.95     0.729    0.7414
  0.74999  0.7412   0.454    0.7414   0.15122  0.7414   0.65992  0.723
  0.70102  0.45209  0.521    0.7412   0.92412  0.01403  0.45203  0.723
  0.9303   0.454    0.74999  0.5232   0.6309   0.1712   0.7414   0.221
  0.70102  0.851    0.241    0.01122  0.749    0.749    0.24232  0.454
  0.80904  0.454    0.40106  0.74999  0.74999  0.17123  0.74999  0.7412
  0.271    0.7414   0.55204  0.7414   0.5259   0.7414   0.749    0.7414
  0.722    0.28133  0.9219   0.749    0.729    0.749    0.3311   0.45201
  0.9303   0.45201  0.722    0.6304   0.722    0.40106  0.45205  0.18109
  0.722    0.749    0.749    0.5259   0.40107  0.40106  0.36911  0.7414
  0.7412   0.74999  0.154    0.851    0.722    0.154    0.722    0.74999
  0.29253  0.729    0.7412   0.6309 ]

I tried the following code:

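import numpy as np
from sklearn import covariance

# df is a pandas DataFrame with an 'xcin' column
xcin = np.array([df['xcin']]) / 100000.0
# Learn a graphical structure from the correlations
edge_model = covariance.GraphLassoCV()
X = xcin.copy().T  # shape (100, 1): 100 samples, 1 feature
X /= X.std(axis=0)
edge_model.fit(X)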

But I get an error on the edge_model.fit() line:

ValueError: Found array with 1 feature(s) (shape=(100, 1)) while a minimum of 2 is required by GraphLassoCV.

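Can anyone explain how to solve this?

I am trying to produce a similar kind of visualization by following the approach demonstrated here (http://scikit-learn.org/stable/auto_examples/applications/plot_stock_market.html#sphx-glr-auto-examples-applications-plot-stock-market-py).

【Question discussion】:

sklearn is telling you what the problem is: the input has the wrong shape (100 samples with 1 feature; sklearn needs >= 2 features). But I don't understand your combination of data and algorithm here, so it would be a good idea to add more information before anyone can help.
I am trying to cluster based on industry codes. These 100 data points correspond to the different types of industries present. I want to use affinity propagation because the number of clusters is not known up front. What should I do in this case?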

Tags: python machine-learning scikit-learn curve-fitting

【Solution 1】:

Use sklearn.preprocessing.PolynomialFeatures(2).fit_transform(X) to create more features.
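
A minimal sketch of what this suggestion does to the array shape, assuming a (100, 1) input like the one in the error message. The random data is a hypothetical stand-in for the real xcin values, not the asker's data.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.random((100, 1))  # hypothetical stand-in for the (100, 1) array in the question

# degree=2 expands each value x into the columns [1, x, x**2]
X_poly = PolynomialFeatures(2).fit_transform(X)
print(X_poly.shape)  # (100, 3)
```

Note that the first column is a constant 1, so its standard deviation is zero and the X /= X.std(axis=0) step from the question would divide by zero on it; passing include_bias=False drops that column. The added column is also just a function of the original values, so whether a covariance between them is meaningful is a separate question.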

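【Discussion】:

Please elaborate on the solution you gave. Use code formatting for your source code lines. Consider reading stackoverflow.com/help/how-to-answer

【Solution 2】:

Your data is 100x1, which means you have 100 numbers. So it is either 100 samples of 1-dimensional data or 1 sample of 100-dimensional data. Either way, there is no notion of a covariance matrix for such data; you need at least 2 samples and 2 dimensions. With only one dimension, the only thing you can compute is the variance. In particular, this behaviour is coded in the source code: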

# Covariance does not make sense for a single feature
X = check_array(X, ensure_min_features=2, estimator=self)
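
To make the point concrete, here is a small NumPy-only sketch. It assumes random numbers as a hypothetical stand-in for df['xcin'] and uses np.cov rather than the estimator from the question, so it only illustrates the shapes involved.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.random(100)  # hypothetical stand-in for df['xcin']

X = np.array([values]).T  # same construction as in the question
print(X.shape)            # (100, 1): 100 samples, 1 feature

# With one feature the "covariance matrix" collapses to a single number, the variance.
cov_1d = np.cov(X, rowvar=False)
print(cov_1d.shape)                             # ()
print(np.isclose(cov_1d, X[:, 0].var(ddof=1)))  # True

# With two features per sample there is a real 2x2 covariance matrix to estimate.
X2 = np.column_stack([values, rng.random(100)])
cov_2d = np.cov(X2, rowvar=False)
print(cov_2d.shape)                             # (2, 2)
```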

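【Discussion】:

I am trying to cluster based on industry codes. These 100 data points correspond to the different types of industries present. I want to use affinity propagation because the number of clusters is not known up front. What should I do in this case?
ML is really not meant for one-dimensional data. Simple things can be handled with simple scripts, plotting the data, and so on.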
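
As one possible reading of "simple scripts", the sketch below counts how often each distinct value occurs. It uses only the first two rows of the xcin printout from the question as sample input; treating equal values as the same industry code is an assumption, not something the thread confirms.

```python
import numpy as np

# First two rows of the xcin values shown in the question, used only as sample input
codes = np.array([
    0.722, 0.32202, 0.70102, 0.7414, 0.18204, 0.01132, 0.171, 0.723,
    0.722, 0.52605, 0.70102, 0.7414, 0.29253, 0.95, 0.729, 0.7414,
])

# Distinct values (sorted) and the number of times each one occurs
values, counts = np.unique(codes, return_counts=True)
for value, count in zip(values, counts):
    print(value, count)
```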