【Title】: Sklearn Error, array with dim 4. Estimator expected <= 2
【Posted】: 2016-09-18 13:41:39
【Description】:

I have been trying to import data from Yahoo Finance via pandas, convert it to an array with .as_matrix(), and then feed the data into a classifier for training, but it gives me an error:

ValueError: Found array with dim 4. Estimator expected <= 2.

My code is below:

from sklearn import tree
import pandas as pd
import pandas_datareader.data as web

df = web.DataReader('goog', 'yahoo', start='2012-5-1', end='2016-5-20')

close_price = df[['Close']]

ma_50 = (pd.rolling_mean(close_price, window=50))
ma_100 = (pd.rolling_mean(close_price, window=100))
ma_200 = (pd.rolling_mean(close_price, window=200))

#adding buys and sell based on the values
df['B/S']= (df['Close'].diff() < 0).astype(int)
close_buy = df[['Close']+['B/S']]
closing = df[['Close']].as_matrix()
buy_sell = df[['B/S']]


close_buy = pd.DataFrame.dropna(close_buy, 0, 'any')
ma_50 = pd.DataFrame.dropna(ma_50, 0, 'any')
ma_100 = pd.DataFrame.dropna(ma_100, 0, 'any')
ma_200 = pd.DataFrame.dropna(ma_200, 0, 'any')

close_buy = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
ma_50 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
ma_100 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
ma_200 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
buy_sell = (df.loc['2013-02-15':'2016-05-21']).as_matrix

print(ma_100)
clf = tree.DecisionTreeClassifier()
x = [[close_buy,ma_50,ma_100,ma_200]]
y = [buy_sell]

clf.fit(x,y)
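For reference, the shape problem in the code above can be reproduced with plain NumPy (a minimal sketch, not part of the original post, using dummy matrices in place of the real data): wrapping four 2-D matrices in a nested list produces a 4-D array, which is exactly what triggers the estimator's dimension check.

```python
import numpy as np

# Dummy 2-D matrix standing in for close_buy, ma_50, ma_100, ma_200
a = np.zeros((822, 7))

# x = [[close_buy, ma_50, ma_100, ma_200]] adds two new leading axes
x = np.array([[a, a, a, a]])
print(x.ndim)   # 4 -> "Found array with dim 4. Estimator expected <= 2."
print(x.shape)  # (1, 4, 822, 7)
```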

【Comments】:

    Tags: python-3.x pandas dataframe scikit-learn


    【Solution 1】:

    I found a few bugs/issues that need fixing.

    1. Missing parentheses in buy_sell = (df.loc['2013-02-15':'2016-05-21']).as_matrix
    2. [[close_buy,ma_50,ma_100,ma_200]] gives you 4 dimensions. Instead, I would use np.concatenate, which takes a list of arrays and appends them to each other lengthwise or widthwise. The argument axis=1 specifies width. The point of this is to make x an 822 x 28 matrix: 822 observations of 28 features. If that's not what you want, then obviously I've missed the mark. But these dimensions are consistent with your y.
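    The effect of np.concatenate with axis=1 can be illustrated with dummy arrays (a sketch; the real blocks each have 7 columns after as_matrix()):

```python
import numpy as np

# Four dummy feature blocks, each 822 rows x 7 columns,
# standing in for close_buy, ma_50, ma_100, ma_200
blocks = [np.full((822, 7), i) for i in range(4)]

# axis=1 appends widthwise: the columns are stacked side by side
x = np.concatenate(blocks, axis=1)
print(x.shape)  # (822, 28) -> 822 observations of 28 features
```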

    Instead:

    from sklearn import tree
    import numpy as np  # needed for np.concatenate below
    import pandas as pd
    import pandas_datareader.data as web
    
    df = web.DataReader('goog', 'yahoo', start='2012-5-1', end='2016-5-20')
    
    close_price = df[['Close']]
    
    ma_50 = (pd.rolling_mean(close_price, window=50))
    ma_100 = (pd.rolling_mean(close_price, window=100))
    ma_200 = (pd.rolling_mean(close_price, window=200))
    
    #adding buys and sell based on the values
    df['B/S']= (df['Close'].diff() < 0).astype(int)
    close_buy = df[['Close']+['B/S']]
    closing = df[['Close']].as_matrix()
    buy_sell = df[['B/S']]
    
    
    close_buy = pd.DataFrame.dropna(close_buy, 0, 'any')
    ma_50 = pd.DataFrame.dropna(ma_50, 0, 'any')
    ma_100 = pd.DataFrame.dropna(ma_100, 0, 'any')
    ma_200 = pd.DataFrame.dropna(ma_200, 0, 'any')
    
    close_buy = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
    ma_50 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
    ma_100 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
    ma_200 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
    buy_sell = (df.loc['2013-02-15':'2016-05-21']).as_matrix()  # Fixed
    
    print(ma_100)
    clf = tree.DecisionTreeClassifier()
    x = np.concatenate([close_buy,ma_50,ma_100,ma_200], axis=1)  # Fixed
    y = buy_sell  # Brackets not necessary... I don't think
    
    clf.fit(x,y)
    

    This ran for me:

    DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
                min_samples_split=2, min_weight_fraction_leaf=0.0,
                random_state=None, splitter='best')
    

    【Comments】:

    • What does np.concatenate do?
    • So when this runs, it will be price, ma_50, ma_100, ma_200. Will this data be fed into clf as a single input?
    • The first 7 columns of x are the same as close_buy. The next 7 are the same as ma_50, and so on. So... yes.
    • The underlying arrays close_buy, ma_50, etc. are already np.arrays. This seems natural. The answer is yes, it's possible, but it would be cumbersome; are you sure you want to do that?
    • It gave me another error, namely ValueError: Unknown label type: array([[ 7.87401353e+02, 7.93261381e+02, 7.87071324e+02, ..., 5.48000000e+06, 3.96049623e+02, 0.00000000e+00], [ 7.95991368e+02, 8.07001373e+02, 7.95281379e+02, ..., 5.88550000e+06, 4.03022676e+02, 0.00000000, 0.0008.000, +02, 8.08971379e+02, 7.91791350e+02, ..., 5.54900000e+06, 3.95834832e+02, 1.00000000e+00],
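    The "Unknown label type" error in the last comment most likely comes from y: after the .loc lines, buy_sell holds the full feature matrix of continuous floats rather than the integer B/S column. One possible fix (a sketch using a dummy frame; only the 'B/S' column name comes from the original code) is to pull the labels out as a 1-D integer array:

```python
import pandas as pd

# Dummy frame standing in for df over the training date range
df = pd.DataFrame({'Close': [10.0, 9.5, 9.8, 9.1],
                   'B/S':   [0, 1, 0, 1]})

# Labels for a classifier should be 1-D class ids, not a 2-D float matrix
y = df['B/S'].values
print(y.ndim)  # 1
```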