【Posted】: 2021-06-19 03:16:43
【Question】:
# Partition the data:
library(tidymodels)
set.seed(1234)
uni_split <- initial_split(suspicious_match, strata = truth)
uni_train <- training(uni_split)
uni_test <- testing(uni_split)
uni_split
## Build a model recipe :
library(themis)
uni_rec <- recipe(truth ~ lv + lcs + qgram + jaccard + jw + cosine , data = uni_train)%>%
step_normalize(all_numeric()) %>%
step_smote(truth, skip = FALSE)%>%
prep()
uni_rec
bake(uni_rec, new_data = uni_train)
I trained several models on the data (one example):
# Train Logistic Regression :
glm_spec <- logistic_reg()%>%
set_engine("glm")
glm_fit <- glm_spec %>%
fit(truth ~ lv + lcs + qgram + cosine + jaccard + jw , data= juice(uni_rec))
glm_fit
## Model evaluation with resampling :
set.seed(123)
folds <- vfold_cv(juice(uni_rec), strata = truth)
folds
#1: Logistic Reg:
set.seed(234)
glm_rs <- glm_spec%>%
fit_resamples(truth ~ lv + lcs + qgram + cosine + jaccard + jw, folds,
metrics = metric_set(roc_auc, sens, spec, accuracy),
control = control_resamples(save_pred = TRUE))
## Model evaluation :
glm_rs %>% collect_metrics()
> glm_rs %>% collect_metrics()
# A tibble: 4 x 6
.metric .estimator mean n std_err .config
<chr> <chr> <dbl> <int> <dbl> <chr>
1 accuracy binary 0.851 10 0.00514 Preprocessor1_Model1
2 roc_auc binary 0.898 10 0.00390 Preprocessor1_Model1
3 sens binary 0.875 10 0.00695 Preprocessor1_Model1
4 spec binary 0.827 10 0.00700 Preprocessor1_Model1
But when I try to apply the logistic regression model to the test data, I get this error:
> glm_fit %>%
+ predict(new_data = bake(uni_rec, new_data = uni_test),
+ type = "prob")%>%
+ mutate(truth = uni_test$truth)%>%
+ roc_auc(truth, .pred_correct)
Erreur : Problem with `mutate()` input `truth`.
x Input `truth` can't be recycled to size 2022.
i Input `truth` is `uni_test$truth`.
i Input `truth` must be size 2022 or 1, not 1373.
Run `rlang::last_error()` to see where the error occurred.
I think this is caused by the SMOTE step in the recipe, but I don't know how to fix it. Please help!
【Comments】:
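- You should keep skip = TRUE in step_smote. That ensures the step is applied only to the training data set. With skip = FALSE, the test data is also oversampled when it is baked for prediction, which makes no sense: the number of observations should stay the same throughout prediction. That is why the predictions have 2022 rows while uni_test has only 1373.
- Thanks, I tried your suggestion and it solved the problem.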
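The fix described in the comments can be sketched as follows. This is a minimal illustration rather than the asker's exact pipeline: the toy data frame and the names toy, toy_split, toy_train, toy_test, rec, baked_train, baked_test and preds are hypothetical stand-ins for suspicious_match and the uni_* objects, and only two of the six predictors are simulated. It assumes a recipes version that provides all_numeric_predictors().

```r
library(tidymodels)
library(themis)

# Hypothetical imbalanced toy data standing in for `suspicious_match`
set.seed(1234)
toy <- tibble(
  truth = factor(c(rep("correct", 60), rep("wrong", 240)),
                 levels = c("correct", "wrong")),
  lv = rnorm(300),
  jw = rnorm(300)
)

toy_split <- initial_split(toy, strata = truth)
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)

# skip = TRUE (the default for step_smote) means SMOTE runs when the
# recipe is prepped on the training data but is skipped by bake() on new data
rec <- recipe(truth ~ lv + jw, data = toy_train) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_smote(truth, skip = TRUE) %>%
  prep()

baked_train <- bake(rec, new_data = NULL)      # oversampled training set
baked_test  <- bake(rec, new_data = toy_test)  # same rows as toy_test

glm_fit <- logistic_reg() %>%
  set_engine("glm") %>%
  fit(truth ~ lv + jw, data = baked_train)

# Row counts now match, so attaching the true labels no longer fails
preds <- predict(glm_fit, new_data = baked_test, type = "prob") %>%
  mutate(truth = toy_test$truth)

roc_auc(preds, truth, .pred_correct)
```

Because the simulated predictors are pure noise, the resulting AUC is not meaningful; the point is only that nrow(baked_test) equals nrow(toy_test) once the SMOTE step is skipped at bake time. Normalizing with all_numeric_predictors() rather than all_numeric() is a choice made here so that only predictor columns are selected.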
Tags: r machine-learning tidymodels