【Question title】: How to display my polynomial regression line?
【Posted】: 2020-02-01 06:48:17
【Question description】:

My plot has a very thick line that I did not expect and cannot fix on my own. I don't know how to get the plot to display properly.

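I'm doing EDA on the Craigslist Auto dataset from Kaggle. I want to display, and then compare and contrast, the linear and polynomial regression fits for each unique vehicle make and model (e.g. Ford F150), relating price to model year.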

How can I draw the following plot with a more normal-looking line? Changing linewidth makes no difference.

df_f150=df[df['Make and Model']=='ford F-150']

#plotting a linear regression line for each dataframe
fig = plt.figure(figsize=(10,7))
sns.regplot(x=df_f150.year, y=df_f150.price, color='b')


# Here is where I try to do one of the polynomial regressions

# Legend, title and labels.
#plt.legend(labels=x)
plt.title('Relationship Between Model Year and Price', size=24)
plt.xlabel('Year', size=18)
plt.ylabel('Price', size=18)
plt.xlim(1990,2020)
plt.ylim(1000,100000)

from sklearn.preprocessing import PolynomialFeatures 


X = df_f150['year'].values.reshape(-1,1)
y = df_f150['price'].values.reshape(-1,1)

poly = PolynomialFeatures(degree = 8) 
poly.fit_transform(X) 

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

regressor = LinearRegression()  
regressor.fit(X_train, y_train) #training the algorithm

#To retrieve the intercept:
print(regressor.intercept_)
#For retrieving the slope:
print(regressor.coef_)

y_pred = regressor.predict(X_test)

dfres = pd.DataFrame({'Actual': y_test.flatten(), 'Predicted': y_pred.flatten()})
dfres

plt.scatter(X_test, y_test,  color='gray')
plt.plot(X_test, y_pred, color='red', linewidth=2)
plt.show()

【Question discussion】:

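An image of the plot would also help. Could you read this and add an image?
Thanks for the link. I've added the image; the labels are wrong, but this is what it looks like.
I find your question unclear. Your code does not seem to match the picture you show, so it is hard to tell exactly where you are stuck. Is it the very tedious list of dataframe creations, the long list of regplot() calls, or the output of the regression? Please provide only the relevant part of the code. See How to ask and Minimal, Complete, and Verifiable example.
@DizietAsahi Thanks for the feedback. I've made some additional edits that will hopefully clarify things. I mainly want to solve the problem of plotting the regression output.
I cannot reproduce the problem unless you actually produce a Minimal, Complete, and Verifiable example, that is, some code that can be copied and pasted and that produces the problematic result.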

Tags: python pandas matplotlib scikit-learn seaborn

【Solution 1】:

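First, always clean and inspect your data:

Given the data from Kaggle: Vehicles listings from Craigslist.org
As an aside, the plot produced by sns.regplot is almost identical to the one produced by performing the regression with sklearn, so I have not included the additional code.

Load and select the data: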

from pathlib import Path
import pandas as pd


file = Path.cwd() / 'data/craigslist-carstrucks-data/craigslistVehicles.csv'

df = pd.read_csv(file, usecols=['price', 'year', 'manufacturer', 'make'])

 price    year manufacturer          make
  3500  2006.0    chevrolet           NaN
  3399  2002.0        lexus         es300
  9000  2009.0    chevrolet  suburban lt2
 31999  2012.0          ram          2500
 16990  2003.0          ram          3500

# Select specific data:
# outliers exist, so price < 120000 and f-150 began production in 1975
ford = df[['price', 'year']][(df.manufacturer == 'ford') & (df.make == 'f-150') & (df.price < 120000) & (df.year >= 1975)]

 price    year
  1600  1992.0
 39760  2018.0
 11490  2014.0
  2500  1993.0
 17950  2014.0
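
To make the selection step easier to follow, here is a minimal sketch of the same filter written as a single .loc selection with a named boolean mask. The rows below are hypothetical values in the same four-column layout, not rows from the Craigslist file:

```python
import pandas as pd

# Hypothetical rows (not taken from the real dataset) in the same layout.
df = pd.DataFrame({
    'price': [1600, 39760, 88888890, 9000, 2500],
    'year': [1992.0, 2018.0, 2015.0, 2009.0, 1970.0],
    'manufacturer': ['ford', 'ford', 'ford', 'chevrolet', 'ford'],
    'make': ['f-150', 'f-150', 'f-150', 'suburban lt2', 'f-150'],
})

# Same four conditions as above, combined into one boolean mask.
mask = (
    (df.manufacturer == 'ford')
    & (df.make == 'f-150')
    & (df.price < 120000)
    & (df.year >= 1975)
)

# Select rows and columns in one step.
ford = df.loc[mask, ['price', 'year']]
print(ford)
```

Only the first two rows pass every condition: the third fails the price cutoff, the fourth is not a Ford, and the fifth predates 1975.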

Plotting with seaborn:

sns.regplot

import matplotlib.pyplot as plt
import seaborn as sns

sns.regplot(x=ford.year, y=ford.price)
plt.show()

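Here is the plot without removing the outliers:

The plot is flat because the maximum price is 8.888889e+07.
- You set plt.ylim(1000,100000), so the outliers were not visible.
I arbitrarily decided to exclude all prices above $120,000, because I know such prices are unrealistic for this sample.
Simply removing outliers is not always the best option.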

print(ford.describe())

              price          year
count  1.127000e+04  11270.000000
mean   2.405777e+04   2010.459184
std    8.372461e+05      6.454361
min    0.000000e+00   1975.000000
25%    5.300000e+03   2007.000000
50%    1.548750e+04   2012.000000
75%    2.549500e+04   2015.000000
max    8.888889e+07   2020.000000
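
The summary above shows a 75th percentile near 25,000 and a maximum near 88.9 million, which is why one listing can flatten the whole plot. As a possible alternative to a hand-picked cutoff, the sketch below applies Tukey's IQR rule. This is not the method used in this answer; the helper name iqr_upper_fence and the price values are illustrative assumptions:

```python
import pandas as pd


def iqr_upper_fence(prices: pd.Series, k: float = 1.5) -> float:
    """Return Q3 + k * IQR, a common upper bound for flagging outliers."""
    q1, q3 = prices.quantile([0.25, 0.75])
    return q3 + k * (q3 - q1)


# Illustrative prices only: five plausible values and one extreme listing.
prices = pd.Series([1600, 39760, 11490, 2500, 17950, 88888890])

fence = iqr_upper_fence(prices)
kept = prices[prices <= fence]

print(f'upper fence: {fence:.1f}')
print(kept.tolist())
```

Whether a fence like this is appropriate depends on the data; it can also discard legitimate high prices, so it is worth checking what it removes before relying on it.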

Plot of the regression performed with `sklearn`

Linear Regression Example Plot

import matplotlib.pyplot as plt

# X_test, y_test and y_pred come from the train/test split and fit in the question
plt.scatter(X_test, y_test)
plt.plot(X_test, y_pred, color='violet', linewidth=3)
plt.show()

Plotting `X_test` and `y_pred` on top of `sns.regplot()`:

sns.regplot(x=ford.year, y=ford.price)
sns.scatterplot(x=X_test.flatten(), y=y_pred.flatten(), color='r')
plt.show()
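
A separate point about the question's code: `poly.fit_transform(X)` is called but its result is discarded, so `regressor` is fit on the raw `year` column and still produces a straight line. In addition, `plt.plot` joins points in the order they are given, so a curve drawn over unsorted `X_test` values can zigzag back and forth and look like one very thick line. Below is a minimal sketch of one way to address both, assuming synthetic data (the `year` and `price` arrays are made up, not the Craigslist data) and a degree-2 fit chosen only for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the year/price columns (an assumption for this sketch).
rng = np.random.default_rng(0)
year = rng.integers(1990, 2021, size=200).astype(float)
price = 500.0 * (year - 1990) ** 1.5 + rng.normal(0, 2000, size=200)

# Shift years to start at 0 so that powers of the feature stay small.
X = (year - year.min()).reshape(-1, 1)

# The pipeline expands the feature into polynomial terms before the linear fit,
# so the transformed features are actually used by the regression.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, price)

# Predict on an evenly spaced, sorted grid so the curve is drawn once, left to right.
x_grid = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
y_grid = model.predict(x_grid)

fig, ax = plt.subplots(figsize=(10, 7))
ax.scatter(year, price, color='gray', s=10)
ax.plot(x_grid.ravel() + year.min(), y_grid, color='red', linewidth=2)
ax.set_xlabel('Year')
ax.set_ylabel('Price')
plt.show()
```

Shifting the years is a design choice rather than a requirement: raw values around 2000 raised to high powers (the question uses degree 8) become extremely large, which can make the fit numerically unstable.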

【Discussion】: