【Question title】: Python machine learning linear regression numpy list error
【Posted】: 2017-05-24 07:41:41
【Question】:

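I can't make sense of this error or find a fix for it, and I'm stuck. I'm following the machine learning tutorial at https://pythonprogramming.net/forecasting-predicting-machine-learning-tutorial, which covers linear regression and shouldn't be that hard. I tried changing the list to an immutable type, but I think the real difficulty in following along is that the data I'm collecting seems very different from the data the tutorial uses. I'm trying to use my own data. You can compare the code on that site with the code here. What am I doing wrong, and how do I get past this?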

import csv
import numpy as np
import pandas as pd
from sklearn import preprocessing, cross_validation, svm
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from matplotlib import style
import datetime
import math

style.use('ggplot')

df = {}

bid = []
btemp = []
ask = []
atemp = []
low = []
high = []
close = []

file=open("C:/documents/EURUSD.csv", "r")
reader = csv.reader(file)

for line in reader:
    t=line[0],line[1],line[2],line[3],line[4],line[5],line[6],line[7],line[8]
    btemp = line[2] + line[3]
    bid.append(btemp)
    atemp = line[4] + line[5]
    ask.append(atemp)
    low.append(line[6])
    high.append(line[7])
    close.append(line[8])

bid.pop(0)
ask.pop(0)
low.pop(0)
high.pop(0)
close.pop(0)

nBid = [float(i) for i in bid]
nAsk = [float(i) for i in ask]
nHigh = [float(i) for i in high]
nLow = [float(i) for i in low]
nClose = [float(i) for i in close]

df['nClose'] = nClose

diffHighLow = [(x1 - x2) for (x1, x2) in zip(nHigh, nLow)]
sumBidAsk = [x1 + x2 for (x1, x2) in zip(nBid, nAsk)]
nSumBidAsk = []
for a in sumBidAsk:
    aTemp = (a / 2) * 100
    nSumBidAsk.append(aTemp)
df['HL_PCT'] = [x1 / x2 for (x1, x2) in zip(diffHighLow, nSumBidAsk)]

diffCloseBid = [(x1 - x2) for (x1, x2) in zip(nClose, nBid)]
divDiffCloseBid = [(x1 / x2) for (x1, x2) in zip(diffCloseBid, nBid)]
nPCT_change = []
for b in divDiffCloseBid:
    bTemp = b * 100
    nPCT_change.append(bTemp)
df['PCT_change'] = nPCT_change

df['forecast_col'] = df['nClose']
df['forecast_out'] = int(math.ceil(0.01 * len(df)))

df['laebl'] = df['forecast_col'].shift(-forecast_out)
X = np.array(df.drop(['label'], 1))

Edit: now includes the stack trace

File "<ipython-input-4-006cfd724c3e>", line 1, in <module>
runfile('C:/Users/venichhe/Desktop/test3.py', wdir='C:/Users/venichhe/Desktop')

File "C:\Users\venichhe\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)

File "C:\Users\venichhe\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)

File "C:/Users/venichhe/Desktop/test3.py", line 69, in <module>
df['laebl'] = df['forecast_col'].shift(-forecast_out)

AttributeError: 'list' object has no attribute 'shift'

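【Comments】:

  • What is the error? You didn't specify it. Post the full stack trace.
  • I edited the post and included the stack trace. I've been doing more research, and I think I may be using the pandas DataFrame the wrong way. I haven't had a chance to try anything yet, but I will over the next few days.

Tags: python machine-learning linear-regression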


【Answer 1】:

Fix the typo in the column name and try again. Change

df['laebl'] = df['forecast_col'].shift(-forecast_out)

to

df['label'] = df['forecast_col'].shift(-forecast_out)

【Comments】:

  • And, of course, convert the dict to a pandas DataFrame first.
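
The conversion mentioned in this comment could look roughly like the sketch below. It assumes the dict holds equal-length lists, as in the question; the numbers are made up for illustration. A plain list has no `shift` method, but a pandas Series does. `forecast_out` is kept as an ordinary variable rather than stored in the dict, and `len()` is taken on the DataFrame, because `len()` of a dict counts its keys rather than its rows.

```python
import math

import numpy as np
import pandas as pd

# Made-up values standing in for the lists built in the question.
data = {
    "nClose": [1.10, 1.11, 1.12, 1.13, 1.14],
    "HL_PCT": [0.01, 0.02, 0.03, 0.02, 0.01],
    "PCT_change": [0.5, -0.2, 0.1, 0.3, -0.1],
}

# A dict of equal-length lists converts directly to a DataFrame.
df = pd.DataFrame(data)

forecast_col = "nClose"
# Keep forecast_out as a plain integer; len(df) is now the row count.
forecast_out = int(math.ceil(0.01 * len(df)))

# Series.shift(-n) moves values up by n rows; the last n rows become NaN.
df["label"] = df[forecast_col].shift(-forecast_out)

# Features are every column except the label.
X = np.array(df.drop(columns=["label"]))
```

The keyword form `drop(columns=[...])` is used here instead of the question's `df.drop(['label'], 1)`, since newer pandas releases no longer accept the axis as a positional argument.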

【Answer 2】:

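You aren't using a pandas DataFrame at all. You created a dict named df and then tried to use it as if it were a DataFrame. Try loading your data with pandas.read_csv.

【Comments】:

  • I was able to load the file, thanks. Now, how do I select certain columns from the CSV? I tried both ['column'] and [column]; neither works.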
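
A rough sketch of what that could look like. The real layout of EURUSD.csv isn't shown in the question, so the column names (`bid`, `ask`, `low`, `high`, `close`) and the sample rows are assumptions, and an in-memory string stands in for the file so the snippet is self-contained. The two derived columns follow the formulas from the question's code.

```python
import io

import pandas as pd

# Hypothetical CSV contents; the real file's header and columns may differ.
csv_text = """bid,ask,low,high,close
1.1000,1.1002,1.0990,1.1010,1.1005
1.1005,1.1007,1.0995,1.1020,1.1015
1.1015,1.1017,1.1000,1.1030,1.1010
"""

# read_csv parses the header row and converts numeric columns to floats,
# so no manual pop(0) or float() conversion is needed.
# With a real file: df = pd.read_csv("C:/documents/EURUSD.csv")
df = pd.read_csv(io.StringIO(csv_text))

# Same formulas as the question's list comprehensions, applied column-wise.
mid_pct = (df["bid"] + df["ask"]) / 2 * 100
df["HL_PCT"] = (df["high"] - df["low"]) / mid_pct
df["PCT_change"] = (df["close"] - df["bid"]) / df["bid"] * 100

print(df[["close", "HL_PCT", "PCT_change"]])
```

Because `df` is now a real DataFrame, each column is a Series, and calls such as `df['close'].shift(-forecast_out)` work as they do in the tutorial.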
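
One possible way to do this, sketched with hypothetical column names (`bid`, `ask`, `close`) since the actual CSV header isn't shown. A single quoted name returns a Series, a list of names in double brackets returns a DataFrame, and `usecols` limits which columns `read_csv` loads in the first place. An unquoted `df[column]` only works if `column` is a variable holding the name. If a name that looks right still raises `KeyError`, printing `df.columns` can reveal stray whitespace or different capitalization in the header.

```python
import io

import pandas as pd

# Hypothetical data; note the stray space in the " close" header.
csv_text = """bid,ask, close
1.1000,1.1002,1.1005
1.1005,1.1007,1.1015
"""

df = pd.read_csv(io.StringIO(csv_text))

# Inspect the exact column names first.
print(list(df.columns))

# Strip surrounding whitespace from the header names.
df.columns = df.columns.str.strip()

one_column = df["close"]          # single name -> Series
two_columns = df[["bid", "ask"]]  # list of names -> DataFrame

# Load only selected columns from the start.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["bid", "ask"])
```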