【Posted】: 2018-11-08 20:15:18
【Question】:
Please help with the following:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
X = [[1.1],[1.3],[1.5],[2],[2.2],[2.9],[3],[3.2],[3.2],[3.7],[3.9],[4],[4],[4.1],[4.5],[4.9],[5.1],[5.3],[5.9],[6],[6.8],[7.1],[7.9],[8.2],[8.7],[9],[9.5],[9.6],[10.3],[10.5]]
y = [39343,46205,37731,43525,39891,56642,60150,54445,64445,57189,63218,55794,56957,57081,61111,67938,66029,83088,81363,93940,91738,98273,101302,113812,109431,105582,116969,112635,122391,121872]
# split the dataset into training and test sets
# (sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
# fit a simple linear regression model (a regressor, not a classifier)
from sklearn.linear_model import LinearRegression
SimpleLinearRegression = LinearRegression()
SimpleLinearRegression.fit(X_train, y_train)
y_predict = SimpleLinearRegression.predict(X_test)
# accuracy_score is a classification metric; this is the call the question is about
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_predict))
I'm sure I'm missing something here. Is there another way to compute an accuracy score for a regression? Thanks in advance :)
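On the "accuracy for regression" part of the question: accuracy_score counts exact label matches, so it is meant for classification and is not a suitable score for continuous predictions. A regression fit is more commonly scored with R², mean absolute error, or root mean squared error. The following is only a minimal sketch using the data from the question; the variable names model, r2, mae and rmse are my own, and which metric is appropriate depends on what you want to measure.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Data copied from the question
X = [[1.1],[1.3],[1.5],[2],[2.2],[2.9],[3],[3.2],[3.2],[3.7],[3.9],[4],[4],[4.1],
     [4.5],[4.9],[5.1],[5.3],[5.9],[6],[6.8],[7.1],[7.9],[8.2],[8.7],[9],[9.5],
     [9.6],[10.3],[10.5]]
y = [39343,46205,37731,43525,39891,56642,60150,54445,64445,57189,63218,55794,
     56957,57081,61111,67938,66029,83088,81363,93940,91738,98273,101302,113812,
     109431,105582,116969,112635,122391,121872]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# R^2: fraction of the variance in y_test explained by the model (1.0 is perfect)
r2 = r2_score(y_test, y_pred)
# MAE: average absolute error, in the same units as y
mae = mean_absolute_error(y_test, y_pred)
# RMSE: square root of the mean squared error, also in the units of y
rmse = mean_squared_error(y_test, y_pred) ** 0.5

print("R^2: ", r2)
print("MAE: ", mae)
print("RMSE:", rmse)

# LinearRegression.score returns the same R^2 as r2_score
print("score():", model.score(X_test, y_test))
```

MAE and RMSE are in the units of y, which makes them easier to read than a unitless score, while R² is convenient for comparing fits on the same data.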
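【Comments】:
-
Shouldn't y = dataset.iloc[:,1].values be y = dataset.iloc[:,-1].values?
-
@mamun, I think it is correct. It prints the right values: print(X_test) print(y_test)
-
I agree with @mamun:
X will contain every column except the last one, while the target variable y is the second column, i.e. it is included in X. Depending on where your target variable actually is, you need to change the slicing rule for either X or y.
-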
I get the same problem after replacing X and y with literal values...
X = [[1.1],[1.3],[1.5],[2],[2.2],[2.9],[3],[3.2],[3.2],[3.7],[3.9],[4],[4],[4.1],[4.5],[4.9],[5.1],[5.3],[5.9],[6],[6.8],[7.1],[7.9],[8.2],[8.7],[9],[9.5],[9.6],[10.3],[10.5]]
#implement the dataset for train & test
y = [39343,46205,37731,43525,39891,56642,60150,54445,64445,57189,63218,55794,56957,57081,61111,67938,66029,83088,81363,93940,91738,98273,101302,113812,109431,105582,116969,112635,122391,121872]
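On the slicing point raised in the comments above: the original dataset code is not shown, so this sketch assumes X was built with dataset.iloc[:, :-1] and uses a hypothetical frame with made-up column names (YearsExperience, Salary, Bonus). It illustrates that iloc[:, 1] and iloc[:, -1] select the same column when there are only two columns, and that they diverge as soon as a third column exists.

```python
import pandas as pd

# Hypothetical two-column frame standing in for the asker's `dataset`
dataset = pd.DataFrame({
    "YearsExperience": [1.1, 1.3, 1.5, 2.0],
    "Salary": [39343, 46205, 37731, 43525],
})

X = dataset.iloc[:, :-1].values       # every column except the last
y_second = dataset.iloc[:, 1].values  # second column
y_last = dataset.iloc[:, -1].values   # last column

print(X.shape)                        # one feature column
print((y_second == y_last).all())     # same column when there are only two

# With a third (hypothetical) column the two rules no longer agree
wider = dataset.assign(Bonus=[100, 200, 300, 400])
X_wide = wider.iloc[:, :-1].values         # now also contains Salary
y_second_wide = wider.iloc[:, 1].values    # still Salary
y_last_wide = wider.iloc[:, -1].values     # now Bonus

print(X_wide.shape)
print((y_second_wide == y_last_wide).all())
```

In the wider case the target (Salary) leaks into X_wide, which is the situation the third comment warns about.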
Tags: machine-learning artificial-intelligence classification regression linear-regression