Python machine learning linear regression numpy list error

【Question title】: Python machine learning linear regression numpy list error
【Posted】: 2017-05-24 07:41:41
【Question description】:

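I can't make sense of this error or find a fix for it, and I'm stuck. I am following the machine learning tutorial at https://pythonprogramming.net/forecasting-predicting-machine-learning-tutorial, the supposedly not-so-difficult linear regression one. I tried changing the lists to something immutable, but I think the real difficulty is that the data I am collecting seems to be very different from the data the tutorial uses. I am trying to use my own data. You can compare the code on that site with the code here. What am I doing wrong, and how do I get past this?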

import csv
import numpy as np
import pandas as pd
from sklearn import preprocessing, cross_validation, svm
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from matplotlib import style
import datetime
import math

style.use('ggplot')

df = {}

bid = []
btemp = []
ask = []
atemp = []
low = []
high = []
close = []

file=open("C:/documents/EURUSD.csv", "r")
reader = csv.reader(file)

for line in reader:
    t=line[0],line[1],line[2],line[3],line[4],line[5],line[6],line[7],line[8]
    btemp = line[2] + line[3]
    bid.append(btemp)
    atemp = line[4] + line[5]
    ask.append(atemp)
    low.append(line[6])
    high.append(line[7])
    close.append(line[8])

bid.pop(0)
ask.pop(0)
low.pop(0)
high.pop(0)
close.pop(0)

nBid = [float(i) for i in bid]
nAsk = [float(i) for i in ask]
nHigh = [float(i) for i in high]
nLow = [float(i) for i in low]
nClose = [float(i) for i in close]

df['nClose'] = nClose

diffHighLow = [(x1 - x2) for (x1, x2) in zip(nHigh, nLow)]
sumBidAsk = [x1 + x2 for (x1, x2) in zip(nBid, nAsk)]
nSumBidAsk = []
for a in sumBidAsk:
    aTemp = (a / 2) * 100
    nSumBidAsk.append(aTemp)
df['HL_PCT'] = [x1 / x2 for (x1, x2) in zip(diffHighLow, nSumBidAsk)]

diffCloseBid = [(x1 - x2) for (x1, x2) in zip(nClose, nBid)]
divDiffCloseBid = [(x1 / x2) for (x1, x2) in zip(diffCloseBid, nBid)]
nPCT_change = []
for b in divDiffCloseBid:
    bTemp = b * 100
    nPCT_change.append(bTemp)
df['PCT_change'] = nPCT_change

df['forecast_col'] = df['nClose']
df['forecast_out'] = int(math.ceil(0.01 * len(df)))

df['laebl'] = df['forecast_col'].shift(-forecast_out)
X = np.array(df.drop(['label'], 1))

Edited | Now includes the stack trace

File "<ipython-input-4-006cfd724c3e>", line 1, in <module>
runfile('C:/Users/venichhe/Desktop/test3.py', wdir='C:/Users/venichhe/Desktop')

File "C:\Users\venichhe\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)

File "C:\Users\venichhe\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)

File "C:/Users/venichhe/Desktop/test3.py", line 69, in <module>
df['laebl'] = df['forecast_col'].shift(-forecast_out)

AttributeError: 'list' object has no attribute 'shift'

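【Comments on the question】:

What is the error? You didn't specify it. Post the full stack trace.
I edited the post and included the stack trace. I have been doing more research and I think I may be using the pandas DataFrame the wrong way. I haven't had a chance to try yet, but I will over the next few days.

Tags: python machine-learning linear-regression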

【Solution 1】:

Please fix the typo and try again. Change:

df['laebl'] = df['forecast_col'].shift(-forecast_out)

to

df['label'] = df['forecast_col'].shift(-forecast_out)
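
To see why the spelling matters once df is a real DataFrame, here is a small sketch with made-up prices (not the asker's data). The next line of the question drops a column named 'label', and that lookup fails if the column was actually created as 'laebl'. The keyword form `columns=` is used here instead of the positional `1` from the question.

```python
import pandas as pd

# Made-up closing prices, for illustration only.
df = pd.DataFrame({'nClose': [1.10, 1.11, 1.12]})

# Column created under the misspelled name.
df['laebl'] = df['nClose'].shift(-1)

try:
    # Dropping 'label' fails because no column has that exact name.
    df.drop(columns=['label'])
    dropped_ok = True
except KeyError:
    dropped_ok = False

print(dropped_ok)
```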

【Comments】:

And of course, convert the dict to a pandas DataFrame first.
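
A minimal sketch of that conversion, assuming a dict of equal-length lists like the one built in the question. The numbers are invented, and `forecast_col` and `forecast_out` are kept as ordinary variables here rather than stored inside df, which is an assumption about what the asker intended.

```python
import math
import pandas as pd

# Invented values standing in for the lists built from EURUSD.csv.
data = {
    'nClose': [1.10, 1.11, 1.12, 1.13, 1.14],
    'PCT_change': [0.1, 0.2, 0.3, 0.4, 0.5],
}

# A dict of equal-length lists becomes a DataFrame with one column per key.
df = pd.DataFrame(data)

# Plain Python variables: a column name and a number of rows to shift by.
forecast_col = 'nClose'
forecast_out = int(math.ceil(0.01 * len(df)))

# A DataFrame column is a Series, which has .shift(); a list does not.
df['label'] = df[forecast_col].shift(-forecast_out)
print(df)
```

The last `forecast_out` rows of 'label' come out as NaN because there is no later row to shift from.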

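【Solution 2】:

You are not using a pandas DataFrame at all. You created a dictionary named df and then tried to use it as if it were a DataFrame. Try loading your data with pandas.read_csv instead.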
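
A rough sketch of that approach. The question does not show the header of EURUSD.csv, so the column names and values below are hypothetical, and an in-memory string stands in for the file path.

```python
import io
import pandas as pd

# Hypothetical CSV contents; the real file's columns are not shown in the question.
csv_text = """date,time,low,high,close
2017-05-01,00:00,1.0890,1.0910,1.0900
2017-05-02,00:00,1.0900,1.0930,1.0920
2017-05-03,00:00,1.0910,1.0950,1.0940
"""

# read_csv accepts a path or a file-like object, treats the first row as
# the header by default, and parses numeric columns into floats.
df = pd.read_csv(io.StringIO(csv_text))

# Column arithmetic replaces the manual loops and zip() calls.
df['HL_diff'] = df['high'] - df['low']

# shift() works because df['close'] is a Series, not a list.
df['label'] = df['close'].shift(-1)
print(df)
```

With a real file you would pass the path instead, for example pd.read_csv("C:/documents/EURUSD.csv"), and there would be no need to pop the header row or convert strings to float by hand.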

【Comments】:

I was able to load the file, thanks. Now how do I select certain columns from the csv? I tried using it like ['column'] and [column]. Neither works.
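
The thread has no reply to this, so the following is only a sketch of how column selection usually works in pandas, using a hypothetical frame. One possible cause of a failing df['column'] is a header that does not match exactly, for instance because of surrounding whitespace; whether that applies to the asker's file is not known.

```python
import pandas as pd

# Hypothetical frame; note the stray space in the ' close' header.
df = pd.DataFrame({'low': [1.0, 1.1], 'high': [1.2, 1.3], ' close': [1.1, 1.2]})

# Inspect the exact column names first.
print(df.columns.tolist())

# Strip surrounding whitespace so the names match what you type.
df.columns = df.columns.str.strip()

# A quoted name selects one column as a Series.
close = df['close']

# A list of names (double brackets) selects several columns as a DataFrame.
subset = df[['high', 'low']]
```

An unquoted name such as df[column] only works if column is a variable that holds the column name as a string.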