【Posted】: 2020-04-14 01:34:54
【Question】:
I built the following preprocessing pipeline:
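import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Numeric columns: fill missing values with the median, then standardize.
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])
# Categorical columns: fill missing values with the mode, then one-hot encode.
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])
# num_attr and cat_attr are the lists of numeric and categorical column names.
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, num_attr),
        ('cat', categorical_transformer, cat_attr)])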
I am trying to fit the pipeline on my X_train. X_train looks like this:
Ticker  SF1           SF2           SF3           SF4           SF5           SF6
$NTAP   -0.628651934  0.98889147    -0.055714478  0.774378771   0.551088847   -1.329228593
$WYNN   1.315785931   1.438754002   0.187327182   0.608933159   -1.153029724  1.85944112
$DRI    -1.141388142  -1.455015677  0.332754543   0.674501682   0.111326137   -0.478596905
$ge     -0.054839437  -1.454148681  -0.162266534  -0.681870355  0.307868519   -0.529986948
I use the block below to run the preprocessing:
clf_nm = Pipeline(steps=[('preprocessor', preprocessor)])
X_train_nm = pd.DataFrame(clf_nm.fit_transform(X_train))
However, "X_train_nm", the output of the block above, contains garbage values like these:
0
0 (0, 0)\t0.42994752134634545\n (0, 1)\t0.569...
1 (0, 0)\t-0.47129140614423404\n (0, 1)\t0.13...
2 (0, 0)\t0.6391234497799465\n (0, 1)\t0.2931...
3 (0, 0)\t-2.0106536281536562\n (0, 1)\t-0.92...
4 (0, 0)\t0.9782971304731922\n (0, 1)\t0.6534...
... ...
18899 (0, 0)\t0.7572819165580632\n (0, 1)\t-0.354...
18900 (0, 0)\t-0.3687666927075019\n (0, 1)\t-0.88...
18901 (0, 0)\t-0.7313605840625186\n (0, 1)\t1.146...
18902 (0, 0)\t0.5782862084049006\n (0, 1)\t1.3732...
18903 (0, 0)\t0.4332583276430423\n (0, 1)\t-0.555...
18904 rows × 1 columns
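Can anyone tell me how to fix this? Thanks for the help.
【Comments】:
-
Your pipeline works fine on the data you provided. Check the data types of the dataframe's features.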
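One possible explanation, not confirmed by the asker: entries such as "(0, 0)\t0.4299..." look like the printed form of SciPy sparse matrix rows. ColumnTransformer returns a sparse matrix when the stacked output is mostly zeros (its sparse_threshold defaults to 0.3), which can happen once the one-hot step produces many columns, and that would also explain why a four-row sample runs fine. The sketch below follows the comment's advice to check the dtypes and then converts a sparse result to a dense array before building the DataFrame. The 12-row X_train, the ticker names, and the num_attr and cat_attr lists are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in for X_train: 12 distinct tickers, two numeric features.
rng = np.random.default_rng(0)
X_train = pd.DataFrame({
    "Ticker": [f"$T{i}" for i in range(12)],
    "SF1": rng.normal(size=12),
    "SF2": rng.normal(size=12),
})
num_attr = ["SF1", "SF2"]  # assumed numeric columns
cat_attr = ["Ticker"]      # assumed categorical columns

# Step 1 (from the comment): confirm numeric columns are not stored as object.
print(X_train.dtypes)

numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler())])
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore"))])
preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, num_attr),
    ("cat", categorical_transformer, cat_attr)])

result = preprocessor.fit_transform(X_train)

# Step 2: 12 one-hot columns make the output mostly zeros, so it comes back sparse.
is_sparse = sparse.issparse(result)

# Step 3: densify before handing the result to pandas.
dense = result.toarray() if is_sparse else np.asarray(result)
X_train_nm = pd.DataFrame(dense, index=X_train.index)
print(is_sparse, X_train_nm.shape)
```

If memory is a concern with roughly 19,000 rows and many tickers, pd.DataFrame.sparse.from_spmatrix(result) keeps the data sparse instead of calling toarray(); passing sparse_threshold=0 to ColumnTransformer is another way to force dense output.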
Tags: python pandas scikit-learn pipeline