【Question title】: Having problems with dimensions in machine learning (Python scikit-learn)
【Posted】: 2015-02-23 06:58:51
【Question】:

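I'm fairly new to applied machine learning, so I'm trying to teach myself how to run linear regression on any kind of data from mldata.org using the Python scikit package. I tested the linear regression example code (http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html), which works fine on the diabetes dataset. However, when I tried the same code with other datasets, such as the earthquake data on mldata (http://mldata.org/repository/data/viewslug/global-earthquakes/), I could not get it to run because of dimension problems.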

Warning (from warnings module):
  File "/usr/lib/python2.7/dist-packages/numpy/core/_methods.py", line 55
    warnings.warn("Mean of empty slice.", RuntimeWarning)
RuntimeWarning: Mean of empty slice.

Warning (from warnings module):
  File "/usr/lib/python2.7/dist-packages/numpy/core/_methods.py", line 65
    ret, rcount, out=ret, casting='unsafe', subok=False)
RuntimeWarning: invalid value encountered in true_divide

Traceback (most recent call last):
  File "/home/anthony/Documents/Programming/Python/Machine Learning/Scikit/earthquake_linear_regression.py", line 38, in <module>
    regr.fit(earthquake_X_train, earthquake_y_train)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/base.py", line 371, in fit
    linalg.lstsq(X, y)
  File "/usr/lib/python2.7/dist-packages/scipy/linalg/basic.py", line 518, in lstsq
    raise ValueError('incompatible dimensions')
ValueError: incompatible dimensions

How do I set the dimensions of the data?

Data shapes:

earthquake_X.shape (59209, 1, 4)
earthquake_X_train.shape (59189, 1)
earthquake_y_test.shape (3, 59209)
earthquake.target.shape (3, 59209)

Code:

# Code source: Jaques Grobler
# License: BSD 3 clause


import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
#Experimenting with earthquake data

from sklearn.datasets.mldata import fetch_mldata
import tempfile
test_data_home = tempfile.mkdtemp()


# Load the earthquake dataset
earthquake = fetch_mldata('Global Earthquakes', data_home = test_data_home)


# Use only one feature
earthquake_X = earthquake.data[:, np.newaxis]
earthquake_X_temp = earthquake_X[:, :, 2]

# Split the data into training/testing sets
earthquake_X_train = earthquake_X_temp[:-20]
earthquake_X_test = earthquake_X_temp[-20:]

# Split the targets into training/testing sets
earthquake_y_train = earthquake.target[:-20]
earthquake_y_test = earthquake.target[-20:]
print "Splitting of data for performance check completed"
# Create linear regression object
regr = linear_model.LinearRegression()
print "Created linear regression object"
# Train the model using the training sets
regr.fit(earthquake_X_train, earthquake_y_train)
print "Dataset trained"
# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean square error
print("Residual sum of squares: %.2f"
      % np.mean((regr.predict(earthquake_X_test) - earthquake_y_test) ** 2))
# Explained variance score: 1 is perfect prediction
print('Variance score: %.2f' % regr.score(earthquake_X_test, earthquake_y_test))

# Plot outputs
plt.scatter(earthquake_X_test, earthquake_y_test,  color='black')
plt.plot(earthquake_X_test, regr.predict(earthquake_X_test), color='blue',
         linewidth=3)

plt.xticks(())
plt.yticks(())

plt.show()

【Question comments】:

    Tags: python machine-learning scikit-learn linear-regression dimensions


    【Solution 1】:

    Your target array (earthquake_y_train) has the wrong shape. In fact, it is empty.

    When you do

    earthquake_y_train = earthquake.target[:-20]
    

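    you select, along the first axis, all rows except the last 20. According to the shapes you posted, earthquake.target has shape (3, 59209), so it has only 3 rows and there is nothing left to select!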
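
The empty result can be reproduced with a small synthetic array. This is a minimal sketch: `target` here is a made-up stand-in of shape (3, 100), not the real earthquake data, which is reported as (3, 59209).

```python
import numpy as np

# Hypothetical stand-in for earthquake.target: 3 rows, 100 columns.
target = np.arange(300, dtype=float).reshape(3, 100)

# Slicing the first axis: only 3 rows exist, so dropping the last 20 leaves none.
y_train_wrong = target[:-20]

# Asking for the last 20 rows simply returns all 3 rows.
y_test_wrong = target[-20:]

print(y_train_wrong.shape)  # (0, 100)
print(y_test_wrong.shape)   # (3, 100)
```

An empty training target is consistent with the "Mean of empty slice" warning in the question, and the second shape matches the (3, 59209) reported there for earthquake_y_test.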

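    But even if there were, it would still be an error. Why? Because the first dimensions of X and y must be the same. According to the sklearn documentation, LinearRegression's fit expects X to have shape [n_samples, n_features] and y to have shape [n_samples, n_targets].

    To fix it, change the definitions of the ys to the following: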

    earthquake_y_train = earthquake.target[:, :-20].T
    earthquake_y_test = earthquake.target[:, -20:].T
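
To check that this slicing yields compatible shapes, here is a rough sketch on random data. The arrays `data` and `target` are assumptions that only mimic the layout described in the question (features as (n_samples, 4), targets as (3, n_samples)); they are not the earthquake dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
n_samples = 200

# Hypothetical stand-ins mirroring the layout in the question.
data = rng.rand(n_samples, 4)     # like earthquake.data
target = rng.rand(3, n_samples)   # like earthquake.target

# Use one feature, kept 2-D: shape (n_samples, 1).
X = data[:, 2:3]
X_train, X_test = X[:-20], X[-20:]

# Slice samples along the second axis, then transpose
# so that y has shape (n_samples, n_targets).
y_train = target[:, :-20].T
y_test = target[:, -20:].T

# The first dimensions of X and y now agree.
assert X_train.shape[0] == y_train.shape[0]

regr = LinearRegression()
regr.fit(X_train, y_train)
pred = regr.predict(X_test)

print(X_train.shape, y_train.shape, pred.shape)
```

With three target columns, the model fits one coefficient vector per target, so `regr.coef_` has one row per target and `pred` has one column per target.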
    

    P.S. Even after you fix all of this, there is still a problem in your script: plt.scatter cannot handle "multidimensional" ys.
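
One possible workaround, not something the answer itself prescribes, is to plot each target column separately. The sketch below uses synthetic data and a hypothetical output filename, and assumes y has already been arranged as (n_samples, n_targets).

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)

# Synthetic data: one feature and three targets per sample.
X = rng.rand(100, 1)
Y = np.column_stack([2.0 * X[:, 0], 0.5 - X[:, 0], rng.rand(100)])  # (100, 3)

X_train, X_test = X[:-20], X[-20:]
Y_train, Y_test = Y[:-20], Y[-20:]

regr = LinearRegression().fit(X_train, Y_train)
Y_pred = regr.predict(X_test)

# Sort by the feature so each fitted line is drawn left to right.
order = np.argsort(X_test[:, 0])

# One subplot per target column, so scatter always receives 1-D x and y.
fig, axes = plt.subplots(1, Y.shape[1], figsize=(12, 3))
for k, ax in enumerate(axes):
    ax.scatter(X_test[:, 0], Y_test[:, k], color="black")
    ax.plot(X_test[order, 0], Y_pred[order, k], color="blue", linewidth=3)
    ax.set_title("target %d" % k)

# Hypothetical filename; in an interactive session plt.show() works as well.
fig.savefig("targets_vs_feature.png")
```

Picking a single column, for example `Y_test[:, 0]`, and keeping the original single-axes plot would work just as well if only one target is of interest.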

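    【Comments】:

    • How do I solve the multidimensional problem? Also, when I tried the code you gave, I ended up with a division-by-zero error: "ZeroDivisionError: division by zero"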