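【Posted】:2018-07-29 23:39:26
【Question】:
I am new to machine learning in Python. I came across the concept of stacking models and wanted to try it out. The problem is that I don't know how to predict on new data, because I don't fully understand how machine learning is implemented in Python. The code I managed to scrape together is shown below: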
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error,mean_squared_error
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.ensemble import GradientBoostingRegressor
from catboost import CatBoostRegressor
from xgboost import XGBRegressor
from vecstack import stacking
import pandas as pd
X = pd.read_csv('db/file_name3.csv')
y = pd.read_csv('db/train_labels(1).csv')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
models = [
    CatBoostRegressor(iterations=200,
                      learning_rate=0.03,
                      depth=4,
                      loss_function='RMSE',
                      eval_metric='RMSE',
                      random_seed=99,
                      od_type='Iter',
                      od_wait=50,
                      logging_level='Silent'),
    CatBoostRegressor(iterations=500,
                      learning_rate=0.06,
                      depth=3,
                      loss_function='RMSE',
                      eval_metric='RMSE',
                      random_seed=99,
                      od_type='Iter',
                      od_wait=50,
                      logging_level='Silent'),
    ExtraTreesRegressor(random_state=0, n_jobs=-1,
                        n_estimators=100, max_depth=3),
    RandomForestRegressor(random_state=0, n_jobs=-1,
                          n_estimators=300, max_depth=3),
    XGBRegressor(eta=0.02, reg_lambda=5, reg_alpha=1),
    XGBRegressor(eta=0.1, reg_lambda=1, reg_alpha=10),
    XGBRegressor(eta=0.02, reg_lambda=1, reg_alpha=10, n_estimators=300),
    XGBRegressor(eta=0.012, max_depth=3, n_estimators=200),
    GradientBoostingRegressor(),
    BaggingRegressor(),
]
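# New data that I actually want to predict
test1 = pd.read_csv('db/Cleaned Data.csv')
# X_train is passed as the test set here, so S_test holds stacked features for the training rows
S_train, S_test = stacking(models, X_train, y_train, X_train,
                           regression=True, metric=mean_absolute_error, n_folds=10,
                           shuffle=True, random_state=0, verbose=2)
# `model` is the second-level model; its definition is not shown in this snippet
model = model.fit(S_train, y_train)
y_pred = model.predict(S_test)
print(y_pred.shape)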
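As you can see, test1 is the data I want to predict, but I cannot figure out how. I can predict on data from my training set, but not on new data. I did not change any of the model parameters from the documentation.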
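To make the missing step concrete, here is a minimal sketch of how stacked features for new data can be produced. It is not vecstack's actual implementation: it uses only scikit-learn, synthetic arrays stand in for the CSV files, and the helper name `stack_features`, the two first-level models, and the `LinearRegression` second-level model are all assumptions chosen for illustration. The idea is that every first-level model contributes out-of-fold predictions for the training rows and fold-averaged predictions for the new rows, and the second-level model is then fitted on the former and applied to the latter. In vecstack, the fourth positional argument of `stacking` plays the role of `X_new` here, so the new data (with the same feature columns as `X_train`) would presumably be passed there instead of `X_train`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold


def stack_features(models, X_train, y_train, X_new, n_folds=5, random_state=0):
    """Hypothetical helper: build second-level features.

    Returns (S_train, S_new) with one column per first-level model:
    out-of-fold predictions for the training rows and fold-averaged
    predictions for the new rows.
    """
    X_train, y_train, X_new = map(np.asarray, (X_train, y_train, X_new))
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=random_state)
    S_train = np.zeros((X_train.shape[0], len(models)))
    S_new = np.zeros((X_new.shape[0], len(models)))
    for j, model in enumerate(models):
        for fit_idx, oof_idx in kf.split(X_train):
            # Fit a fresh copy on the fold's training part only
            fold_model = clone(model).fit(X_train[fit_idx], y_train[fit_idx])
            # Out-of-fold predictions become training features for level 2
            S_train[oof_idx, j] = fold_model.predict(X_train[oof_idx])
            # Average each fold model's predictions on the new data
            S_new[:, j] += fold_model.predict(X_new) / n_folds
    return S_train, S_new


# Synthetic stand-ins for the training data and for the new data (`test1`)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = X_train @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)
X_new = rng.normal(size=(40, 5))

# Assumed first-level models (a small subset, for speed)
first_level = [
    RandomForestRegressor(n_estimators=50, max_depth=3, random_state=0),
    ExtraTreesRegressor(n_estimators=50, max_depth=3, random_state=0),
]
S_train, S_new = stack_features(first_level, X_train, y_train, X_new)

# Assumed second-level model: fitted on stacked training features,
# then used to predict from the stacked features of the new data
meta_model = LinearRegression().fit(S_train, y_train)
y_new_pred = meta_model.predict(S_new)
print(y_new_pred.shape)
```

The key point of the sketch is that the new data never needs labels: it only passes through the already fitted first-level models, and the resulting columns feed the second-level model's `predict`.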
【Comments】: