【Question title】: How is the Linear Regression model from sklearn predicting non-linearly in the following code?
【Posted】: 2020-01-10 05:43:45
【Question description】:

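Since the linear regression algorithm finds the best-fit line for the training data, predictions on new data should always lie on that best-fit line. So how does the linear regression model from sklearn predict the data non-linearly, as shown in this figure? https://pythonprogramming.net/static/images/machine-learning/linear-regression-prediction.png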

import math
import quandl  # the package name is lowercase in current releases
import numpy as np
import pandas as pd
from sklearn import preprocessing, svm
from sklearn.model_selection import train_test_split  # replaces the removed sklearn.cross_validation
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from matplotlib import style
import datetime

style.use('ggplot')

# Download daily prices and derive two percentage features.
df = quandl.get("WIKI/GOOGL")
df = df[['Adj. Open',  'Adj. High',  'Adj. Low',  'Adj. Close', 'Adj. Volume']]
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Close'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0

df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]
forecast_col = 'Adj. Close'
df.fillna(value=-99999, inplace=True)

# The label is the adjusted close forecast_out rows (1% of the data) in the future.
forecast_out = int(math.ceil(0.01 * len(df)))
df['label'] = df[forecast_col].shift(-forecast_out)

# Features are every column except the label; Date is the index, not a feature.
X = np.array(df.drop(columns=['label']))
X = preprocessing.scale(X)
X_lately = X[-forecast_out:]  # most recent rows, which have no label yet
X = X[:-forecast_out]

df.dropna(inplace=True)

y = np.array(df['label'])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = LinearRegression(n_jobs=-1)
clf.fit(X_train, y_train)
confidence = clf.score(X_test, y_test)  # R^2 on the held-out split

forecast_set = clf.predict(X_lately)
df['Forecast'] = np.nan

last_date = df.iloc[-1].name
last_unix = last_date.timestamp()
one_day = 86400
next_unix = last_unix + one_day

# Append one row per forecast value, dated one day after the previous row.
for i in forecast_set:
    next_date = datetime.datetime.fromtimestamp(next_unix)
    next_unix += 86400
    df.loc[next_date] = [np.nan for _ in range(len(df.columns)-1)]+[i]

df['Adj. Close'].plot()
df['Forecast'].plot()
plt.legend(loc=4)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()

【Comments】:

  • It is linear with respect to 'HL_PCT', 'PCT_change', 'Adj. Volume', not with respect to time.

Tags: python pandas scikit-learn linear-regression


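【Answer 1】:

Linear regression produces a model that is linear in all of the predictor features, i.e. X. Your model was trained on every column of df except label: 'Adj. Close', 'HL_PCT', 'PCT_change', and 'Adj. Volume'. The plot, however, has only one variable on its X axis (as every 2D plot does), namely Date, which is not even one of your predictor features. Even if Date were one of the predictor features in X, projecting from multiple dimensions down to one would make the model look non-linear.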
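
To make this concrete, here is a minimal sketch on synthetic data (not the Quandl data from the question). The three random-walk features, the weights, and the names t, pred, linear_in_X, and max_dev_from_line are all assumptions made for illustration. It checks that the predictions are exactly an affine function of the features, yet do not lie on a straight line when set against a time index that was never a feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # fixed seed; the data below is synthetic

n_days = 200
t = np.arange(n_days)  # stand-in for the Date axis; NOT passed to the model

# Three hypothetical features that drift over time (random walks).
X = rng.normal(size=(n_days, 3)).cumsum(axis=0)
# Target is a linear combination of the features plus a little noise.
y = X @ np.array([2.0, -1.0, 0.5]) + 10.0 + rng.normal(scale=0.1, size=n_days)

model = LinearRegression().fit(X, y)
pred = model.predict(X)

# 1) In feature space the predictions are exactly affine: X @ coef_ + intercept_.
linear_in_X = np.allclose(pred, X @ model.coef_ + model.intercept_)

# 2) Against time they are not a straight line: fit the best line
#    pred ~ slope * t + intercept and measure the largest deviation from it.
slope, intercept = np.polyfit(t, pred, 1)
max_dev_from_line = np.max(np.abs(pred - (slope * t + intercept)))

print("linear in X:", linear_in_X)
print("max deviation from a straight line over time:", max_dev_from_line)
```

Plotting pred against t would give a wiggly curve much like the forecast in the question, because the features themselves move irregularly from day to day.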

