【Question title】: Can't predict values using Linear Regression
【Posted】: 2020-06-16 06:11:08
【Question description】:

Hi there!

I'm taking IBM's Data Science course on Coursera and I'm trying to write some snippets to practice. I've written the following code:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Import and format the dataframes
ibov = pd.read_csv('https://raw.githubusercontent.com/thiagobodruk/datasets/master/ibov.csv')
ifix = pd.read_csv('https://raw.githubusercontent.com/thiagobodruk/datasets/master/ifix.csv')
ibov['DATA'] = pd.to_datetime(ibov['DATA'], format='%d/%m/%Y')
ifix['DATA'] = pd.to_datetime(ifix['DATA'], format='%d/%m/%Y')
ifix = ifix.sort_values(by='DATA', ascending=False)
ibov = ibov.sort_values(by='DATA', ascending=False)
ibov = ibov[['DATA','FECHAMENTO']]
ibov.rename(columns={'FECHAMENTO':'IBOV'}, inplace=True)
ifix = ifix[['DATA','FECHAMENTO']]
ifix.rename(columns={'FECHAMENTO':'IFIX'}, inplace=True)

# Merge datasets 
df_idx = ibov.merge( ifix, how='left', on='DATA')
df_idx.set_index('DATA', inplace=True)
df_idx.head()

# Split training and testing samples
x_train, x_test, y_train, y_test = train_test_split(df_idx['IBOV'], df_idx['IFIX'], test_size=0.2)

# Convert the samples to Numpy arrays
regr = linear_model.LinearRegression()
x_train = np.array([x_train])
y_train = np.array([y_train])
x_test = np.array([x_test])
y_test = np.array([y_test])

# Plot the result
regr.fit(x_train, y_train)
y_pred = regr.predict(y_train)
plt.scatter(x_train, y_train)
plt.plot(x_test, y_pred, color='blue', linewidth=3) # This line produces no result

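I was having some problems with the output values returned by the train_test_split() method, so I converted them to NumPy arrays, and after that my code ran. I can plot the scatter plot normally, but I can't plot the prediction line.

Running this code on my IBM Data Cloud Notebook produces the following warning:

/opt/conda/envs/Python36/lib/python3.6/site-packages/matplotlib/axes/_base.py:380: MatplotlibDeprecationWarning: cycling among columns of inputs with non-matching shapes is deprecated. cbook.warn_deprecated("2.2", "cycling among columns of inputs "

I searched on Google and StackOverflow, but I can't figure out what is wrong.

I'd appreciate any help. Thanks in advance!

【Comments】:

    Tags: python numpy matplotlib scikit-learn data-science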


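    【Solution 1】:

    There are a few issues in your code, such as y_pred = regr.predict(y_train) (which predicts from the targets instead of the features) and the way you draw the line.

    The following code snippet should point you in the right direction: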

    # Split training and testing samples
    x_train, x_test, y_train, y_test = train_test_split(df_idx['IBOV'], df_idx['IFIX'], test_size=0.2)
    
    # Convert the samples to Numpy arrays
    regr = linear_model.LinearRegression()
    x_train = x_train.values
    y_train = y_train.values
    x_test = x_test.values
    y_test = y_test.values
    
    # Plot the result
    plt.scatter(x_train, y_train)
    
    regr.fit(x_train.reshape(-1,1), y_train)
    idx = np.argsort(x_train)
    y_pred = regr.predict(x_train[idx].reshape(-1,1))
    plt.plot(x_train[idx], y_pred, color='blue', linewidth=3);
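
The snippet above rests on two points: predictions are computed from the features (x), and the x and y passed to plt.plot have the same length and order. Here is a minimal sketch of that pattern on synthetic data. The numbers and the names x_line and y_line are made up for illustration and are not taken from the IBOV/IFIX files.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the two index columns (assumed data, not the real CSVs)
rng = np.random.default_rng(0)
x = rng.uniform(50_000, 100_000, size=200)
y = 0.02 * x + 1_000 + rng.normal(0, 50, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

regr = LinearRegression()
# Features must be 2-D: (n_samples, n_features) = (160, 1)
regr.fit(x_train.reshape(-1, 1), y_train)

# Predict from the features (x), never from the targets (y)
idx = np.argsort(x_train)
x_line = x_train[idx]
y_line = regr.predict(x_line.reshape(-1, 1))

# Same length and same order, so plt.plot(x_line, y_line) can draw one line
print(x_line.shape, y_line.shape)
```

Sorting with np.argsort only makes the line be drawn from left to right; the fitted model does not depend on the order of the samples.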
    

    Do the same for the test subset, using the already fitted model:

    # Plot the result
    plt.scatter(x_test, y_test)
    idx = np.argsort(x_test)
    y_pred = regr.predict(x_test[idx].reshape(-1,1))
    plt.plot(x_test[idx], y_pred, color='blue', linewidth=3);
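
The question imports mean_squared_error and r2_score but never uses them. As a hedged sketch, the test subset could be scored as follows, again on synthetic data rather than the real files. One thing to watch: if the predictions were computed on sorted inputs (x_test[idx]), they have to be compared against y_test[idx], not against y_test in its original order.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumed, for illustration only)
rng = np.random.default_rng(0)
x = rng.uniform(50_000, 100_000, size=200)
y = 0.02 * x + 1_000 + rng.normal(0, 50, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)
regr = LinearRegression().fit(x_train.reshape(-1, 1), y_train)

# Predictions on sorted test inputs, as in the plotting snippet
idx = np.argsort(x_test)
y_pred = regr.predict(x_test[idx].reshape(-1, 1))

# Compare against the targets in the same (sorted) order
mse = mean_squared_error(y_test[idx], y_pred)
r2 = r2_score(y_test[idx], y_pred)
print(f"MSE: {mse:.1f}, R^2: {r2:.3f}")
```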
    

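    Feel free to ask if you have any questions.

    【Discussion】:

    • Thank you very much for your help! Why do you use x_train.reshape(-1,1)? Do I need to change the shape of the array?
    • Because your regressor expects a 2D array of input features. I suspect that in your version you did not need to reshape because your x_train was already 2D.
    • Just found this explanation: stackoverflow.com/a/42510505/2684718. Thanks for your answer!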
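
To make the shape discussion above concrete, here is a small sketch with made-up numbers (the names wrapped and column are illustrative). Wrapping a 1-D array in a list, as the question did with np.array([x_train]), does yield a 2D array, but of shape (1, n), which scikit-learn reads as one sample with n features. reshape(-1, 1) yields (n, 1), that is, n samples with one feature each.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up values standing in for the IBOV / IFIX columns
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])  # exactly y = 2*x + 1

wrapped = np.array([x])    # shape (1, 4): 1 sample, 4 features
column = x.reshape(-1, 1)  # shape (4, 1): 4 samples, 1 feature

print(wrapped.shape)
print(column.shape)

# Fitting on the column layout recovers the slope and intercept
model = LinearRegression().fit(column, y)
print(model.coef_, model.intercept_)
```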