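【Posted】: 2019-06-27 23:39:40
【Question】:
I am implementing gradient descent with scikit-learn's SGDRegressor on my rentals dataset to predict rent from area, but I get strange coefficients and a strange intercept, and therefore strange rent predictions.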
Rentals dataset: rentals.csv (finalized columns)
area,bedrooms,furnished,cost
650,2,1,33000
750,3,0,55000
247,1,0,10500
1256,4,0,65000
900,3,0,37000
900,3,0,50000
550,2,0,30000
1700,4,0,72000
1300,4,0,45000
1600,4,2,57000
475,2,1,30000
800,3,0,45000
350,2,0,15000
247,1,0,11500
247,1,0,16500
247,1,0,15000
330,2,0,16000
450,2,2,25000
325,1,0,13500
1650,4,0,90000
650,2,0,31000
1650,4,0,65000
900,3,0,40000
635,2,0,30000
475,2,2,28000
1120,3,0,45000
1000,3,0,38000
900,3,2,50000
610,3,0,28000
400,2,0,17000
Python code with alpha = .000001 and max_iter=1000
import pandas
full_data = pandas.read_csv ("./rentals.csv")
rentals = pandas.DataFrame ({'area':full_data.area,'cost':full_data.cost})
from sklearn.model_selection import train_test_split
train, test = train_test_split (rentals, test_size=0.2, random_state=11)
trainX = pandas.DataFrame ({'area': train['area']})
trainY = pandas.DataFrame ({'cost': train['cost']})
testX = pandas.DataFrame ({'area': test['area']})
testY = pandas.DataFrame ({'cost': test['cost']})
from sklearn.linear_model import SGDRegressor
reg = SGDRegressor(max_iter=1000, alpha=.000001, tol=.0001)
reg.fit (trainX, trainY)
from sklearn.metrics import mean_squared_error, r2_score
print ('Coefficients: \n', reg.coef_)
print ('Intercept: \n', reg.intercept_)
yhat = reg.predict (testX)
print ('Mean squared error: \n', mean_squared_error (testY, yhat))
print ('Variance score: \n', r2_score (testY, yhat))
print('yhat :: ',yhat)
Output
Coefficients:
[-1.77569698e+12]
Intercept:
[2.20231032e+10]
Mean squared error:
2.7699546187784015e+30
Variance score:
-1.1843036374824519e+22
yhat :: [-4.38575131e+14 -2.30838405e+15 -9.76611316e+14 -1.77567496e+15
-2.23025338e+15 -1.42053556e+15]
With alpha = .00000001
reg = SGDRegressor(max_iter=1000, alpha=.00000001, tol=.0001)
Output
Coefficients:
[-1.35590231e+12]
Intercept:
[-9.70811558e+10]
Mean squared error:
1.6153367348228915e+30
Variance score:
-6.906427844848468e+21
yhat :: [-3.35004951e+14 -1.76277008e+15 -7.45843351e+14 -1.35599939e+15
-1.70311038e+15 -1.08481893e+15]
I have tried every value down to alpha = .00000000001
reg = SGDRegressor(max_iter=1000, alpha=.00000000001, tol=.0001)
Output
Coefficients:
[1.81827102e+12]
Intercept:
[8.5060188e+09]
Mean squared error:
2.9044685546452095e+30
Variance score:
-1.2418155340525837e+22
yhat :: [4.49121448e+14 2.36376083e+15 1.00005757e+15 1.81827952e+15
2.28375691e+15 1.45462532e+15]
Please tell me what is wrong with my code. Why am I getting incorrect values?
Thanks in advance.
【Comments】:
-
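Thank you for posting a short, complete, working code example together with the data. Ideally, you would make the data part of the code, so that people trying to help can simply copy and paste it. Also, the outputs for the different alphas are a bit much. Posting one representative output and stating that smaller alphas make no difference would be enough (why would they, anyway?).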
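Following the comment's suggestion, here is a sketch that embeds the rentals.csv rows directly in the script so it can be copied and run as is. It also tries one possible explanation for the huge numbers, offered as an assumption rather than a confirmed diagnosis. In SGDRegressor, alpha multiplies the regularization term, and with the default learning_rate schedule the step size is set by eta0, so shrinking alpha would not be expected to stabilize training. The scikit-learn documentation also notes that SGD is sensitive to feature scaling, and the area values here range into the thousands. The StandardScaler pipeline, the fixed random_state=0, and the names CSV_TEXT, model, and preds are choices made for this illustration and are not part of the original post.

```python
import io

import pandas as pd
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The rentals.csv rows from the question, embedded so the script is self-contained.
CSV_TEXT = """area,bedrooms,furnished,cost
650,2,1,33000
750,3,0,55000
247,1,0,10500
1256,4,0,65000
900,3,0,37000
900,3,0,50000
550,2,0,30000
1700,4,0,72000
1300,4,0,45000
1600,4,2,57000
475,2,1,30000
800,3,0,45000
350,2,0,15000
247,1,0,11500
247,1,0,16500
247,1,0,15000
330,2,0,16000
450,2,2,25000
325,1,0,13500
1650,4,0,90000
650,2,0,31000
1650,4,0,65000
900,3,0,40000
635,2,0,30000
475,2,2,28000
1120,3,0,45000
1000,3,0,38000
900,3,2,50000
610,3,0,28000
400,2,0,17000
"""

rentals = pd.read_csv(io.StringIO(CSV_TEXT))
X = rentals[["area"]]  # 2-D feature matrix with a single column
y = rentals["cost"]    # 1-D target vector

# Same split parameters as in the question.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=11
)

# Assumption: standardizing the feature keeps the default step size (eta0) stable.
# alpha is left at its default because it only controls regularization strength here.
model = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=1000, tol=1e-4, random_state=0),
)
model.fit(X_train, y_train)

reg = model.named_steps["sgdregressor"]
preds = model.predict(X_test)

# The coefficient is expressed per standard deviation of area, not per unit of area.
print("Coefficient:", reg.coef_)
print("Intercept:", reg.intercept_)
print("yhat ::", preds)
```

If the scaling assumption holds, the coefficient, intercept, and predictions should come out on the same order of magnitude as the cost column instead of around 1e12 and beyond. The pipeline applies the same scaling inside predict, so testX-style raw areas can be passed in unchanged.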
Tags: python scikit-learn gradient-descent