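[Posted]: 2021-08-29 16:08:34
[Question]:
I want to create a pipeline that performs encoding and scaling, and then uses an xgboost classifier to solve a multi-label problem. The code block: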
# List the categorical (object-dtype) columns
categorical_columns = X.columns[X.dtypes == 'O'].tolist()
# Collect the distinct categories of each categorical column
unique_list = [X[c].unique().tolist() for c in categorical_columns]
# List the numerical (non-object-dtype) columns
numerical_columns = X.columns[X.dtypes != 'O'].tolist()
#Encoding & Scaling objects
scaler = StandardScaler()
ohe = OneHotEncoder(categories=unique_list, sparse=False)
#Define a pipeline
pipeline = Pipeline([("ohe_onestep", ohe.fit_transform(X[categorical_columns])),
("scaler_onestep", scaler.fit_transform(X[numerical_columns])),
MultiOutputClassifier(xgb.XGBClassifier(objective='binary:logistic'))])
# Cross-validate the model
cross_val_scores = cross_val_score(pipeline, X, y,
scoring='accuracy', cv=5)
But when I run the code, I get the error below. The offending line is:
> pipeline = Pipeline([("ohe_onestep", ohe.fit_transform(X[categorical_columns])),
'MultiOutputClassifier' object is not iterable
How can I fix this?
[Discussion]:
Tags: python pandas scikit-learn multilabel-classification
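A likely cause: `Pipeline` expects a list of `(name, estimator)` tuples and unpacks each entry, so the bare `MultiOutputClassifier(...)` in third position cannot be unpacked and raises the "not iterable" error. The first two steps also pass the result of `fit_transform` (an array) where a transformer object is expected, and the encoder and scaler target different column subsets, which a plain `Pipeline` cannot express. The sketch below shows one way to restructure this with `ColumnTransformer`. The toy `X` and `y`, the step names, the use of `select_dtypes` and `handle_unknown="ignore"`, and the fallback estimator are all assumptions made for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

try:
    from xgboost import XGBClassifier
    base_estimator = XGBClassifier(objective="binary:logistic")
except ImportError:
    # Assumption: a stand-in so the sketch still runs without xgboost.
    from sklearn.ensemble import GradientBoostingClassifier
    base_estimator = GradientBoostingClassifier()

# Hypothetical toy data standing in for the question's X and y.
rng = np.random.default_rng(0)
n = 100
X = pd.DataFrame({
    "color": rng.choice(["red", "green", "blue"], size=n),
    "length": rng.normal(size=n),
    "weight": rng.normal(size=n),
})
# Two binary labels per row (a multi-label target).
y = np.column_stack([
    (X["length"] > 0).astype(int),
    (X["color"] == "red").astype(int),
])

# Split columns by dtype, as in the question.
categorical_columns = X.select_dtypes(exclude="number").columns.tolist()
numerical_columns = X.select_dtypes(include="number").columns.tolist()

# Pass unfitted transformer objects; each one is applied to its own columns.
preprocessor = ColumnTransformer([
    ("ohe_onestep", OneHotEncoder(handle_unknown="ignore"), categorical_columns),
    ("scaler_onestep", StandardScaler(), numerical_columns),
])

# Every pipeline step is a (name, estimator) tuple, including the last one.
pipeline = Pipeline([
    ("preprocess", preprocessor),
    ("clf", MultiOutputClassifier(base_estimator)),
])

# Cross-validate; with a multi-label y, 'accuracy' is exact-match accuracy.
cross_val_scores = cross_val_score(pipeline, X, y, scoring="accuracy", cv=5)
print(cross_val_scores)
```

Because the transformers are fitted inside the pipeline, each cross-validation fold fits the encoder and scaler on its own training split only, which avoids leaking information from the held-out fold.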