【Posted】: 2022-01-01 14:41:51
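【Question】:
I want to use One Hot Encoding in my simple model, but it raises an error no matter how I set it up. First, even though I have sklearn version 1.0.2, One Hot Encoding does not convert the strings to floats. The current problem seems to be that the category column in my training data has fewer distinct values than in my test data: training has only 2 values, while test has all three. How do I fix this? The exact error is "The truth value of a Series is ambiguous". Other approaches I tried fail with an error telling me to reshape the data.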
import lightgbm as lgbm
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
X = [['apple', 5], ['banana', 1], ['apple', 6], ['banana', 2]]
X = pd.DataFrame(X).to_numpy()
test = [['pineapple', 0], ['banana', 1], ['apple', 7], ['banana', 2]]
y = [1, 0, 1, 0]
y = pd.DataFrame(y).to_numpy()
labels = ['apples', 'bananas', 'pineapple']
ohc = OneHotEncoder(categories=labels)
pp = ColumnTransformer(
    transformers=[('ohc', ohc, [0])],
    remainder='passthrough')
model = lgbm.LGBMClassifier()
mymodel = Pipeline(steps=[('preprocessor', pp),
                          ('model', model)])
params = {'model__learning_rate': [0.1],
          'model__n_estimators': [2]}
lgbm_gs = GridSearchCV(
    estimator=mymodel, param_grid=params, n_jobs=-1,
    cv=2, scoring='accuracy',
    verbose=-1)
lgbm_gs.fit(X, y)
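For reference, two things in the code above look like likely culprits. The `labels` list spells the categories `'apples'` and `'bananas'` while the data contains `'apple'` and `'banana'`, and `categories` is documented as a list holding one array-like per encoded column, so a flat list of three strings does not describe a single column. The sketch below shows only the preprocessing step, under the assumption that the data is kept as a DataFrame; the column names `fruit` and `count` are invented for illustration, and the LightGBM model and grid search are left out so the example depends on scikit-learn and pandas alone.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Column names "fruit" and "count" are hypothetical, chosen for this sketch.
X_train = pd.DataFrame(
    [["apple", 5], ["banana", 1], ["apple", 6], ["banana", 2]],
    columns=["fruit", "count"],
)
X_test = pd.DataFrame(
    [["pineapple", 0], ["banana", 1], ["apple", 7], ["banana", 2]],
    columns=["fruit", "count"],
)

# Option 1: declare every category up front.
# The spelling must match the data exactly, and `categories` expects
# one list per encoded column, hence the outer brackets.
labels = ["apple", "banana", "pineapple"]
pp = ColumnTransformer(
    transformers=[("ohc", OneHotEncoder(categories=[labels]), ["fruit"])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
train_enc = pp.fit_transform(X_train)
test_enc = pp.transform(X_test)
print(train_enc.shape, test_enc.shape)

# Option 2: learn the categories from the training data and encode
# anything unseen at transform time as an all-zero row.
pp_ignore = ColumnTransformer(
    transformers=[("ohc", OneHotEncoder(handle_unknown="ignore"), ["fruit"])],
    remainder="passthrough",
    sparse_threshold=0,
)
pp_ignore.fit(X_train)
test_ignore = pp_ignore.transform(X_test)
print(test_ignore.shape)
```

With option 1 the training and test matrices have the same number of columns even though `pineapple` never appears in training. Option 2 avoids maintaining the list by hand, at the cost of unseen categories all collapsing to the same all-zero encoding. Either preprocessor could be dropped into the `Pipeline` in place of `pp`.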
【Discussion】:
Tags: scikit-learn one-hot-encoding