【Question title】: Simple prediction using linear regression with python
【Posted】: 2015-06-19 19:36:20
【Question】:
data2 = pd.DataFrame(data1['kwh'])
data2
                          kwh
date    
2012-04-12 14:56:50     1.256400
2012-04-12 15:11:55     1.430750
2012-04-12 15:27:01     1.369910
2012-04-12 15:42:06     1.359350
2012-04-12 15:57:10     1.305680
2012-04-12 16:12:10     1.287750
2012-04-12 16:27:14     1.245970
2012-04-12 16:42:19     1.282280
2012-04-12 16:57:24     1.365710
2012-04-12 17:12:28     1.320130
2012-04-12 17:27:33     1.354890
2012-04-12 17:42:37     1.343680
2012-04-12 17:57:41     1.314220
2012-04-12 18:12:44     1.311970
2012-04-12 18:27:46     1.338980
2012-04-12 18:42:51     1.357370
2012-04-12 18:57:54     1.328700
2012-04-12 19:12:58     1.308200
2012-04-12 19:28:01     1.341770
2012-04-12 19:43:04     1.278350
2012-04-12 19:58:07     1.253170
2012-04-12 20:13:10     1.420670
2012-04-12 20:28:15     1.292740
2012-04-12 20:43:15     1.322840
2012-04-12 20:58:18     1.247410
2012-04-12 21:13:20     0.568352
2012-04-12 21:28:22     0.317865
2012-04-12 21:43:24     0.233603
2012-04-12 21:58:27     0.229524
2012-04-12 22:13:29     0.236929
2012-04-12 22:28:34     0.233806
2012-04-12 22:43:38     0.235618
2012-04-12 22:58:43     0.229858
2012-04-12 23:13:43     0.235132
2012-04-12 23:28:46     0.231863
2012-04-12 23:43:55     0.237794
2012-04-12 23:59:00     0.229634
2012-04-13 00:14:02     0.234484
2012-04-13 00:29:05     0.234189
2012-04-13 00:44:09     0.237213
2012-04-13 00:59:09     0.230483
2012-04-13 01:14:10     0.234982
2012-04-13 01:29:11     0.237121
2012-04-13 01:44:16     0.230910
2012-04-13 01:59:22     0.238406
2012-04-13 02:14:21     0.250530
2012-04-13 02:29:24     0.283575
2012-04-13 02:44:24     0.302299
2012-04-13 02:59:25     0.322093
2012-04-13 03:14:30     0.327600
2012-04-13 03:29:31     0.324368
2012-04-13 03:44:31     0.301869
2012-04-13 03:59:42     0.322019
2012-04-13 04:14:43     0.325328
2012-04-13 04:29:43     0.306727
2012-04-13 04:44:46     0.299012
2012-04-13 04:59:47     0.303288
2012-04-13 05:14:48     0.326205
2012-04-13 05:29:49     0.344230
2012-04-13 05:44:50     0.353484
...

65701 rows × 1 columns

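I have this DataFrame with an index and one column. I want to make a simple prediction using linear regression with sklearn. I'm confused and don't know how to set X and y (I want the x values to be the time and the y values to be kwh...). I'm new to Python, so any help is appreciated. Thanks.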

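【Question comments】:

    Tags: python scikit-learn linear-regression


    【Answer 1】:

    The first thing to do is split your data into two arrays, X and y. Each element of X is a date, and the corresponding element of y is the associated kwh.

    Once you have that, use sklearn.linear_model.LinearRegression to do the regression. See its page in the scikit-learn documentation.

    Every sklearn model works in two steps. First, you fit the model to your data. Then you put the dates for which you want to predict kwh into another array, X_predict, and predict kwh with the predict method.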

    from sklearn.linear_model import LinearRegression
    
    X = []  # put your dates in here, as numbers in a 2-D array (one row per sample)
    y = []  # put your kwh in here
    
    model = LinearRegression()
    model.fit(X, y)
    
    X_predict = []  # put the dates of which you want to predict kwh here
    y_predict = model.predict(X_predict)
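
    For the kwh DataFrame in the question, one way to fill X and y is to turn the datetime index into elapsed seconds. The sketch below is a minimal illustration under assumptions: the five-row frame stands in for data2, and using seconds since the first reading as the feature is a choice made here, not something the answer specifies.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Small stand-in for data2 from the question (values copied from its first rows)
data2 = pd.DataFrame(
    {"kwh": [1.256400, 1.430750, 1.369910, 1.359350, 1.305680]},
    index=pd.to_datetime([
        "2012-04-12 14:56:50", "2012-04-12 15:11:55", "2012-04-12 15:27:01",
        "2012-04-12 15:42:06", "2012-04-12 15:57:10",
    ]),
)
data2.index.name = "date"

# scikit-learn expects numeric features with shape (n_samples, n_features),
# so express each timestamp as seconds elapsed since the first reading
t0 = data2.index[0]
X = (data2.index - t0).total_seconds().to_numpy().reshape(-1, 1)
y = data2["kwh"].to_numpy()

model = LinearRegression()
model.fit(X, y)

# Predict kwh for a new timestamp, converted with the same reference point
new_time = pd.Timestamp("2012-04-12 16:12:10")
X_predict = [[(new_time - t0).total_seconds()]]
y_predict = model.predict(X_predict)  # array holding one predicted kwh value
```

    Whatever encoding is chosen for the dates, the same conversion has to be applied to the training data and to the dates being predicted.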
    

    【Comments】:

    • What does predict return? What are the numbers in the result array?
    【Answer 2】:

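    The predict() function takes a 2-D array as its argument. So to predict a value with simple linear regression, you have to pass the input inside a 2-D array. The date must also be numeric, encoded the same way as the training data (Unix seconds are assumed here, with pandas imported as pd), e.g.,

    model.predict([[pd.Timestamp('2012-04-13 05:55:30').timestamp()]])

    For multiple linear regression, then,

    model.predict([[pd.Timestamp('2012-04-13 05:44:50').timestamp(), 0.327433]])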
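
    The shape requirement can be seen on a toy problem. This is a small sketch with made-up data (y = 3x + 2), not the question's dataset; it only shows that both fit and predict take one row per sample and one column per feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data, made up for illustration: y = 3x + 2
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 3 * x + 2

X = x.reshape(-1, 1)  # one feature column: shape (4, 1)
model = LinearRegression().fit(X, y)

# One sample with one feature: a list containing one single-element row
single = model.predict([[4.0]])

# Several samples: reshape the 1-D values into a column first
several = model.predict(np.array([5.0, 6.0]).reshape(-1, 1))
```

    Each call returns a 1-D array with one prediction per input row.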

    【Comments】:

      【Answer 3】:

      Linear regression:

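      import pandas as pd
      import numpy as np
      import matplotlib.pyplot as plt
      data=pd.read_csv('Salary_Data.csv')
      X=data.iloc[:,:-1].values  # all columns except the last, as a 2-D feature array
      y=data.iloc[:,1].values  # second column, the target
      
      #split dataset in train and testing set
      # sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
      from sklearn.model_selection import train_test_split
      # an integer test_size means that many test samples (10 here), not a fraction
      X_train,X_test,Y_train,Y_test=train_test_split(X,y,test_size=10,random_state=0)
      
      from sklearn.linear_model import LinearRegression
      regressor=LinearRegression()
      regressor.fit(X_train,Y_train)
      y_pre=regressor.predict(X_test)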
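
      The two iloc selections above decide the shapes that scikit-learn receives. The following sketch uses a hypothetical four-row table in place of Salary_Data.csv (the column names and values are assumptions) to show what each selection returns.

```python
import pandas as pd

# Hypothetical two-column table standing in for Salary_Data.csv
data = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5],
    "Salary": [39000, 43000, 54000, 61000],
})

X = data.iloc[:, :-1].values  # every column except the last: 2-D, shape (4, 1)
y = data.iloc[:, 1].values    # the second column only: 1-D, shape (4,)

print(X.shape, y.shape)  # (4, 1) (4,)
```

      Slicing with `:-1` keeps X two-dimensional even when only one feature column remains, which is the layout fit and predict expect.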
      

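      【Comments】:

      • Could you explain further how to select the data, since that is also part of the question?
      【Answer 4】:

      You can look at my code on GitHub, where I use a simple linear regression model to predict temperature from the chirps of crickets. I have explained the code with comments.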

      #Import the libraries required
      import numpy as np
      import matplotlib.pyplot as plt
      import pandas as pd
      
      #Importing the excel data 
      dataset = pd.read_excel(r'D:\MachineLearing\Machine Learning A-Z Template Folder\Part 2 - Regression\Section 4 - Simple Linear Regression\CricketChirpsVs.Temperature.xls')  # raw string so the backslashes are not treated as escapes
      
      x = dataset.iloc[:, :-1].values
      y = dataset.iloc[:, 1].values
      
      #Split the data into train and test dataset
      from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
      x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=1/3,random_state=42)
      
      #Fitting Simple Linear regression data model to train data set
      from sklearn.linear_model import LinearRegression
      regressorObject=LinearRegression()
      regressorObject.fit(x_train,y_train)
      
      #predict the test set
      y_pred_test_data=regressorObject.predict(x_test)
      
      
      # Visualising the Training set results in a scatter plot
      plt.scatter(x_train, y_train, color = 'red')
      plt.plot(x_train, regressorObject.predict(x_train), color = 'blue')
      plt.title('Cricket Chirps vs Temperature (Training set)')
      plt.xlabel('Cricket Chirps (chirps/sec for the striped ground cricket) ')
      plt.ylabel('Temperature (in degrees Fahrenheit)')
      plt.show()
      
      # Visualising the test set results in a scatter plot
      plt.scatter(x_test, y_test, color = 'red')
      plt.plot(x_train, regressorObject.predict(x_train), color = 'blue')
      plt.title('Cricket Chirps vs Temperature (Test set)')
      plt.xlabel('Cricket Chirps (chirps/sec for the striped ground cricket) ')
      plt.ylabel('Temperature (in degrees Fahrenheit)')
      plt.show()
      

      For more information, visit

      https://github.com/wins999/Cricket_Chirps_Vs_Temprature--Simple-Linear-Regression-in-Python-

      【Comments】:

        【Answer 5】:

        Split the dataset into a training set and a test set

        from sklearn.model_selection import train_test_split
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state =0)
        

        Train your simple linear regression model on the training set

        from sklearn.linear_model import LinearRegression
        regressor = LinearRegression()
        regressor.fit(X_train, y_train)
        

        Predict the test set results

        y_predict = regressor.predict(X_test)
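
        Once y_predict exists, it can be compared with y_test to see how well the line fits. The sketch below repeats the three steps on synthetic data (a noisy straight line made up for illustration) and scores the result with mean_squared_error and r2_score from sklearn.metrics; the choice of these two metrics is an assumption, not part of the answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data, made up for illustration: a noisy straight line
rng = np.random.default_rng(0)
X = np.arange(100, dtype=float).reshape(-1, 1)
y = 0.5 * X.ravel() + 1.0 + rng.normal(scale=0.1, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_predict = regressor.predict(X_test)

mse = mean_squared_error(y_test, y_predict)  # lower is better
r2 = r2_score(y_test, y_predict)             # 1.0 would be a perfect fit
print(f"MSE: {mse:.4f}, R^2: {r2:.4f}")
```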
        

        【Comments】:

          【Answer 6】:

          You should implement the following code.

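          import pandas as pd
          from sklearn.linear_model import LinearRegression # to build linear regression model
          from sklearn.model_selection import train_test_split # to split dataset (cross_validation was removed in scikit-learn 0.20)
          
          data2 = pd.DataFrame(data1['kwh'])
          data2 = data2.reset_index() # will create new index (0 to 65700) so date column won't be an index now.
          # LinearRegression needs a numeric 2-D X, so convert the dates to seconds since the first reading
          # (assumes the date column has a datetime dtype)
          X = (data2.iloc[:,0] - data2.iloc[0,0]).dt.total_seconds().to_frame()   # date column
          y = data2.iloc[:,-1]  # kwh column
          
          Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.80, random_state=20)
          
          linearModel = LinearRegression()
          linearModel.fit(Xtrain, ytrain)
          ypred = linearModel.predict(Xtest)  # was model.predict, but no variable named model is defined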
          

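          Here ypred holds the predicted kwh values for Xtest (continuous numbers, not probabilities).

          【Comments】:

            【Answer 7】:

            In case someone is looking for a solution without sklearn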

            import numpy as np
            import pandas as pd
            
            def variance(values, mean):
                return sum([(val-mean)**2 for val in values])
            
            def covariance(x, mean_x, y , mean_y):
                covariance = 0.0
                for r in range(len(x)):
                    covariance = covariance + (x[r] - mean_x) * (y[r] - mean_y)
                return covariance
            
            def get_coef(df):
                mean_x = sum(df['x']) / float(len(df['x']))
                mean_y = sum(df['y']) / float(len(df['y']))
                variance_x = variance(df['x'], mean_x)
                #variance_y = variance(df['y'], mean_y)
                covariance_x_y = covariance(df['x'],mean_x,df['y'],mean_y)
                m = covariance_x_y / variance_x
                c = mean_y - m * mean_x
                return m,c
            
            def get_y(x,m,c):
                return m*x+c
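
            The slope and intercept computed this way are the ordinary least-squares estimates, so they can be cross-checked against np.polyfit. The sketch below is a vectorised restatement rather than the answer's own code; fit_line and the toy data are hypothetical names and values chosen for illustration.

```python
import numpy as np
import pandas as pd

def fit_line(df):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    x = df['x'].to_numpy(dtype=float)
    y = df['y'].to_numpy(dtype=float)
    dx = x - x.mean()
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    m = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
    c = y.mean() - m * x.mean()
    return m, c

# Toy data lying exactly on y = 2x + 1
df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [3, 5, 7, 9, 11]})
m, c = fit_line(df)

# np.polyfit with degree 1 returns [slope, intercept]
m_ref, c_ref = np.polyfit(df['x'], df['y'], 1)
```

            Both approaches should agree up to floating-point rounding, here with a slope of 2 and an intercept of 1.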
            

            Inspired by https://github.com/dhirajk100/Linear-Regression-from-Scratch-in-Python/blob/master/Linear%20Regression%20%20from%20Scratch%20Without%20Sklearn.ipynb

            【Comments】:
