TensorFlow and scikit-learn: same solution but different outputs

【Question title】: Tensorflow and Scikit learn: Same solution but different outputs
【Posted】: 2018-11-22 05:03:11
【Question description】:

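I am implementing a simple linear regression using scikit-learn and TensorFlow.

My solution in scikit-learn looks fine, but with TensorFlow my evaluation output shows some crazy numbers.

The problem is basically trying to predict salary based on years of experience.

I am not sure what I am doing wrong in my TensorFlow code.

Thanks!

Scikit-learn solution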

import pandas as pd
data = pd.read_csv('Salary_Data.csv') 

X = data.iloc[:, :-1].values
y = data.iloc[:, 1].values

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

X_single_data = [[4.6]]
y_single_pred = regressor.predict(X_single_data)

print(f'Train score: {regressor.score(X_train, y_train)}')
print(f'Test  score: {regressor.score(X_test, y_test)}')

Train score: 0.960775692121653
Test  score: 0.9248580247217076
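For reference, `LinearRegression.score` returns the coefficient of determination (R-squared). The following is a minimal sketch of that calculation in plain Python, using the data listed in this question. The helper names `fit_line` and `r_squared` are hypothetical, and the fit here uses all 30 rows rather than the 70/30 split above, so its result will not match the train and test scores exactly.

```python
# Data copied from the question (YearsExperience, Salary).
years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
         3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
         6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5]
salary = [39343.0, 46205.0, 37731.0, 43525.0, 39891.0, 56642.0,
          60150.0, 54445.0, 64445.0, 57189.0, 63218.0, 55794.0,
          56957.0, 57081.0, 61111.0, 67938.0, 66029.0, 83088.0,
          81363.0, 93940.0, 91738.0, 98273.0, 101302.0, 113812.0,
          109431.0, 105582.0, 116969.0, 112635.0, 122391.0, 121872.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """R-squared = 1 - SS_res / SS_tot (hypothetical helper)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


slope, intercept = fit_line(years, salary)
r2 = r_squared(years, salary, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R-squared={r2:.4f}")
```

A score close to 1 means the line explains most of the variance in salary, which is what the scikit-learn output above reports.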

TensorFlow solution

import tensorflow as tf

f_cols = [tf.feature_column.numeric_column(key='X', shape=[1])]
estimator = tf.estimator.LinearRegressor(feature_columns=f_cols)


train_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_train}, y=y_train,shuffle=False)

test_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_test}, y=y_test,shuffle=False)


train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn)
eval_spec = tf.estimator.EvalSpec(input_fn=test_input_fn)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

({'average_loss': 7675087400.0,
  'label/mean': 84588.11,
  'loss': 69075790000.0,
  'prediction/mean': 5.0796494,
  'global_step': 6},
 [])
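One way to read these numbers, offered as a sketch rather than a diagnosis: the short calculation below checks how `loss` relates to `average_loss` and converts `average_loss` into a root mean squared error. It assumes the evaluation ran on a single batch of 9 test rows (30 rows with `test_size=0.3`); that batch size is an inference from the figures, not something the output states.

```python
import math

# Values copied from the evaluation output in the question.
average_loss = 7675087400.0
total_loss = 69075790000.0
label_mean = 84588.11
prediction_mean = 5.0796494

# Assumption: 30 rows with test_size=0.3 leaves 9 test rows, evaluated as one batch.
n_test = 9

# If 'loss' is the summed squared error over the batch, this ratio is the batch size.
ratio = total_loss / average_loss

# 'average_loss' is the mean squared error per example, so its square root is the RMSE.
rmse = math.sqrt(average_loss)

print(f"loss / average_loss = {ratio:.4f} (assumed batch size: {n_test})")
print(f"RMSE = {rmse:.1f}, label mean = {label_mean:.1f}")
print(f"prediction mean / label mean = {prediction_mean / label_mean:.6f}")
```

The RMSE comes out about as large as the mean salary, which is what one would expect if the predictions are still close to zero. Together with `global_step` being 6, this suggests the model had taken very few training steps when it was evaluated.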

Data

YearsExperience,Salary
1.1,39343.00
1.3,46205.00
1.5,37731.00
2.0,43525.00
2.2,39891.00
2.9,56642.00
3.0,60150.00
3.2,54445.00
3.2,64445.00
3.7,57189.00
3.9,63218.00
4.0,55794.00
4.0,56957.00
4.1,57081.00
4.5,61111.00
4.9,67938.00
5.1,66029.00
5.3,83088.00
5.9,81363.00
6.0,93940.00
6.8,91738.00
7.1,98273.00
7.9,101302.00
8.2,113812.00
8.7,109431.00
9.0,105582.00
9.5,116969.00
9.6,112635.00
10.3,122391.00
10.5,121872.00

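【Question discussion】:

Tags: machine-learning scikit-learn linear-regression tensorflow-estimator

【Answer 1】:

Per your request for code in the comments: while I did the modeling work for this equation using my online curve and surface fitting web site zunzun.com at http://zunzun.com/Equation/2/Sigmoidal/Sigmoid%20B/, here is a graphing source code example that uses the scipy differential_evolution genetic algorithm module to estimate the initial parameters. The scipy implementation of differential evolution uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, which requires bounds within which to search. In this example those bounds are taken from the data maximum and minimum values, and the fit statistics and parameter values are nearly identical to those from the web site.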

import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings

xData = numpy.array([ 1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7, 3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0, 6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5])
yData = numpy.array([ 39.343, 46.205, 37.731, 43.525, 39.891, 56.642, 60.15, 54.445, 64.445, 57.189, 63.218, 55.794, 56.957, 57.081, 61.111, 67.938, 66.029, 83.088, 81.363, 93.94, 91.738, 98.273, 101.302, 113.812, 109.431, 105.582, 116.969, 112.635, 122.391, 121.872])


def func(x, a, b, c):
    return a / (1.0 + numpy.exp(-(x-b)/c))


# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)


def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)

    parameterBounds = []
    parameterBounds.append([minY, maxY]) # search bounds for a
    parameterBounds.append([minX, maxX]) # search bounds for b
    parameterBounds.append([minX, maxX]) # search bounds for c

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x

# by default, differential_evolution finishes by polishing its best result with a bounded local optimizer (L-BFGS-B)
geneticParameters = generate_Initial_Parameters()

# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are outside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()

modelPredictions = func(xData, *fittedParameters) 

absError = modelPredictions - yData

SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('Years of experience') # X axis data label
    axes.set_ylabel('Salary in thousands') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)

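【Discussion】:

【Answer 2】:

I could not place an image in a comment, so I am placing it here. I suspected that the relationship might be sigmoidal rather than linear, and found the following sigmoidal equation and fit statistics, using salary in thousands: "y = a / (1.0 + exp(-(x-b)/c))" with fitted parameters a = 1.5535069418318591E+02, b = 5.4580059234664899E+00, and c = 3.7724942500630938E+00, giving R-squared = 0.96 and RMSE = 5.30 (thousand).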
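As a quick illustration of the equation above, here is a minimal sketch that evaluates it with the fitted parameters from this answer. The function name `sigmoid_b` is hypothetical, and the input 4.6 is simply the single test value used in the question; the result is in thousands, as in the fit.

```python
import math

# Fitted parameters quoted in this answer (salary in thousands).
A = 1.5535069418318591E+02
B = 5.4580059234664899E+00
C = 3.7724942500630938E+00


def sigmoid_b(x, a=A, b=B, c=C):
    """Evaluate y = a / (1.0 + exp(-(x - b) / c)); hypothetical helper name."""
    return a / (1.0 + math.exp(-(x - b) / c))


# Predicted salary (in thousands) for 4.6 years of experience.
y_single = sigmoid_b(4.6)
print(f"Predicted salary at 4.6 years: {y_single:.2f} thousand")

# At x = b the exponent is zero, so the curve sits at half of its upper limit a.
print(f"Value at x = b: {sigmoid_b(B):.2f} (a / 2 = {A / 2:.2f})")
```

The parameter a is the upper limit of the curve, b is the midpoint on the x axis, and c controls how quickly the curve rises.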

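【Discussion】:

Thanks for your help. Would you mind posting your code here? I put my solution on GitHub; please check how the linear solution is found using scikit-learn: github.com/gabrielpsilva/ai-study-models/blob/master/… I am still at the first steps, learning by example :)
I could not format code in a comment, so I posted it as a second answer.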