【Question Title】: sklearn MinMaxScaler - ValueError: Expected 2D array, got 1D array instead - data as series objects
【Posted】: 2020-11-22 01:20:51
【Question】:

I want to use MinMaxScaler from sklearn to scale my train and test data before analysis.

I have been following a tutorial (https://mc.ai/an-introduction-on-time-series-forecasting-with-simple-neura-networks-lstm/), but I get the error message ValueError: Expected 2D array, got 1D array instead

I looked at the question "Print predict ValueError: Expected 2D array, got 1D array instead", but if I try train = train.reshape(-1, 1) and test = test.reshape(-1, 1), I get another error because train and test are Series (error message: AttributeError: 'Series' object has no attribute 'reshape').

What is the best way to resolve this?

# Import libraries 
import pandas as pd 
from sklearn.preprocessing import MinMaxScaler 

# Create MWE dataset 
data = [['1981-11-03', 510], ['1982-11-03', 540], ['1983-11-03', 480],
   ['1984-11-03', 490], ['1985-11-03', 492], ['1986-11-03', 380],
   ['1987-11-03', 440], ['1988-11-03', 640], ['1989-11-03', 560], 
   ['1990-11-03', 660], ['1991-11-03', 610], ['1992-11-03', 480]] 

df = pd.DataFrame(data, columns = ['Date', 'Tickets']) 

# Set 'Date' to datetime data type 
df['Date'] = pd.to_datetime(df['Date'])

# Set 'Date' as the index
df = df.set_index(['Date'], drop=True)

# Split dataset into train and test  
split_date = pd.Timestamp('1989-11-03')
df =  df['Tickets']
train = df.loc[:split_date]
test = df.loc[split_date:]

# Scale train and test data 
scaler = MinMaxScaler(feature_range=(-1, 1))
train_sc = scaler.fit_transform(train)
test_sc = scaler.transform(test)

X_train = train_sc[:-1]
y_train = train_sc[1:]
X_test = test_sc[:-1]
y_test = test_sc[1:]

# ERROR MESSAGE 
  ValueError: Expected 2D array, got 1D array instead:
  array=[510. 540. 480. 490. 492. 380. 440. 640. 560.].
  Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

【Comments on the question】:

    Tags: python machine-learning scikit-learn data-science


    【Solution 1】:

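    The line

    df =  df['Tickets']
    

    converts your data to a pd.Series, which is one-dimensional.

    If you want to keep a DataFrame, you can use

    df =  df[['Tickets']]
    

    This should solve your problem: a single-column DataFrame is already 2D, so it can be passed directly to the scaler's fit function without reshaping.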
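
As a minimal sketch of the fix applied to the question's own dataset (the variable names `series`, `arr`, and `arr_sc` are introduced here for illustration only), the double-bracket selection keeps the data two-dimensional, so the scaler accepts it. The last few lines show an alternative that I believe also works if you would rather keep a Series: convert it to a NumPy array first, since the array has `reshape` and the Series does not.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Same MWE dataset as in the question
data = [['1981-11-03', 510], ['1982-11-03', 540], ['1983-11-03', 480],
        ['1984-11-03', 490], ['1985-11-03', 492], ['1986-11-03', 380],
        ['1987-11-03', 440], ['1988-11-03', 640], ['1989-11-03', 560],
        ['1990-11-03', 660], ['1991-11-03', 610], ['1992-11-03', 480]]

df = pd.DataFrame(data, columns=['Date', 'Tickets'])
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')

split_date = pd.Timestamp('1989-11-03')

# Double brackets select a one-column DataFrame (2D), not a Series (1D)
df = df[['Tickets']]
train = df.loc[:split_date]
test = df.loc[split_date:]

scaler = MinMaxScaler(feature_range=(-1, 1))
train_sc = scaler.fit_transform(train)  # fit on the training data only
test_sc = scaler.transform(test)        # reuse the training min/max

print(train_sc.shape, test_sc.shape)

# Alternative (illustrative): keep a Series and reshape its NumPy array
series = df['Tickets']
arr = series.to_numpy().reshape(-1, 1)  # one column, as many rows as needed
arr_sc = MinMaxScaler(feature_range=(-1, 1)).fit_transform(arr)
print(arr_sc.shape)
```

Note that label-based slicing with `.loc` includes both endpoints, so the row for 1989-11-03 ends up in both `train` and `test`. Also, because the scaler is fitted on the training data only, scaled test values can fall outside the (-1, 1) range.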

    【Comments】:
