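You are most likely getting the following error because one of your columns, the timestamp, is not numeric when the regression is applied:
ValueError: Unable to convert array of bytes/strings into decimal numbers with dtype='numeric'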
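To confirm which column is responsible, it can help to inspect the dtypes before fitting. The following is only a minimal sketch: the three-row frame and its `Date` and `Sale` values are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical frame: `Date` holds plain strings, `Sale` holds floats
df = pd.DataFrame({
    "Date": ["2001-11-01", "2001-11-02", "2001-11-03"],
    "Sale": [1.0, 2.1, 2.9],
})

# Columns that are not numeric are the ones an estimator cannot convert
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```

Any column listed here has to be converted, dropped, or moved into the index before the frame is passed to the model.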
I think it is best to make the timestamp the index of the DataFrame to avoid this. Try the following:
import pandas as pd
import numpy as np
import datetime as dt
#Generate data for 20 days
x1 = np.arange(1, 21) + 0.3 * (np.random.random(size=(20,)) - 0.5)
x2 = np.arange(1, 21) + 0.2 * (np.random.random(size=(20,)) - 0.5)
start = dt.datetime.strptime("1 Nov 01", "%d %b %y")
daterange = pd.date_range(start, periods=20)
table = {"Sale": x1, "Date": daterange}
df = pd.DataFrame(table)
df.set_index("Date", inplace=True)
#df
#Data split
time_sample='2001-11-16'
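# Index.get_loc no longer accepts `method` in pandas 2.0+; get_indexer with method='pad' returns the same position
time_stamp_index = df.index.get_indexer([pd.Timestamp(time_sample)], method='pad')[0]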
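# The example frame has a single column (Sale), so X and y hold the same values here;
# with real data, select the feature columns for X and the target column for y
X_train = df.iloc[:time_stamp_index,:].values
y_train = df.iloc[:time_stamp_index,:].values
X_test = df.iloc[time_stamp_index:,:].values
y_test = df.iloc[time_stamp_index:,:].values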
#Apply regression
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
y_predict = model.predict(X_test)
#plot the results
train_set = df.iloc[:time_stamp_index,:]
test_set = df.iloc[time_stamp_index:,:]
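test_set_copy = test_set.copy()
# Store the predictions so the dashed line shows the model output rather than the test data
test_set_copy["Sale"] = np.ravel(y_predict)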
import matplotlib.pyplot as plt
plt.style.use("fivethirtyeight")
plt.figure(figsize=(12, 10))
plt.xlabel("Date")
plt.xticks(rotation=45)
plt.ylabel("Values")
plt.title("Plot")
plt.plot(train_set,label='trainingSet')
plt.plot(test_set,"k",label='testSet')
# The model is a LinearRegression, so label the dashed line accordingly
plt.plot(test_set_copy,'r--',label='LR_predict')
plt.legend()
plt.show()
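Output (figure not shown): the training set, the test set, and the prediction plotted against the date.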
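If the date itself should act as a predictor, it has to be turned into a number first. The sketch below is one possible way to do that, not part of the code above: it uses days elapsed since the first date as the only feature. The noise-free `Sale` values, the `day_offset` name, and the boolean-mask split are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical, noise-free data: sales grow by exactly 2 per day
dates = pd.date_range("2001-11-01", periods=20)
df = pd.DataFrame({"Sale": 2.0 * np.arange(20) + 5.0}, index=dates)

# Numeric feature derived from the DatetimeIndex: days since the first date
day_offset = np.asarray((df.index - df.index[0]).days, dtype=float).reshape(-1, 1)

# Chronological split: rows before the cutoff date are used for training
cutoff = pd.Timestamp("2001-11-16")
train_mask = np.asarray(df.index < cutoff)

model = LinearRegression()
model.fit(day_offset[train_mask], df["Sale"].values[train_mask])
y_predict = model.predict(day_offset[~train_mask])
```

Because the feature is a plain float column, the estimator never sees the timestamps, and the target stays a separate 1-D array.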