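The error occurs because you did not include trControl = fitControl in the train call. Fixing that leads to another error, however, because data$obs and data$pred are factors. They need to be converted to numeric, which gives 1 or 2, and then have 1 subtracted to get the required 0 and 1: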
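log.loss2 <- function(data, lev = NULL, model = NULL) {
  # obs and pred arrive as factors; convert the level codes (1/2) to 0/1
  data$pred <- as.numeric(data$pred) - 1
  data$obs <- as.numeric(data$obs) - 1
  # data$Y holds the predicted probability of class "Y"
  logloss <- -sum(data$obs * log(data$Y) + (1 - data$obs) * log(1 - data$Y)) / length(data$obs)
  names(logloss) <- c('LL')
  logloss
}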
fitControl <- trainControl(method = "cv", number = 1, classProbs = TRUE,
                           summaryFunction = log.loss2)
fit.nnet2 <- train(target ~ ., data = data,
                   method = "nnet", maxit = 500, metric = "LL",
                   tuneGrid = my.grid, verbose = TRUE, trControl = fitControl,
                   maximize = FALSE)
#output
Neural Network
100 samples
2 predictor
2 classes: 'N', 'Y'
No pre-processing
Resampling: Cross-Validated (1 fold)
Summary of sample sizes: 0
Resampling results:
LL
0.6931472
Tuning parameter 'size' was held constant at a value of 2
Tuning parameter 'decay' was held constant at a value of 0.05
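The factor-to-numeric conversion used in log.loss2 can be checked in isolation. This is a minimal sketch; the vector obs below is made up for illustration and only mimics what a two-level data$obs would look like:

```r
# Hypothetical two-level factor, standing in for data$obs inside a summary function.
obs <- factor(c("N", "Y", "Y", "N"), levels = c("N", "Y"))

codes  <- as.numeric(obs)   # level codes: 1 for "N", 2 for "Y"
binary <- codes - 1         # shift to 0/1 so it can be used in the log-loss formula

print(codes)
print(binary)
```

The mapping depends on the order of the factor levels, so the class treated as 1 is whichever level comes second.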
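A few things to note:
This loss function only works for data with N/Y as the classes, because the probability is read from data$Y. A better approach is to look up the class names and use them. It is also good practice to clip the probability values, since taking log(0) is not a good idea: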
LogLoss <- function(data, lev = NULL, model = NULL) {
  obs <- data[, "obs"]
  cls <- levels(obs)                  # find class names
  probs <- data[, cls[2]]             # use second class name
  probs <- pmax(pmin(as.numeric(probs), 1 - 1e-15), 1e-15)  # bound probability
  logPreds <- log(probs)
  log1Preds <- log(1 - probs)
  real <- (as.numeric(data$obs) - 1)
  out <- c(mean(real * logPreds + (1 - real) * log1Preds)) * -1
  names(out) <- c("LogLoss")
  out
}
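As a quick sanity check, the function can be called directly on a hand-made data frame shaped like the one a summary function receives (obs, pred, and one probability column per class). The data frame toy and the class names "no"/"yes" are invented for illustration, and the function is repeated in compact form only so that the snippet runs on its own:

```r
# Compact copy of LogLoss so this snippet is self-contained.
LogLoss <- function(data, lev = NULL, model = NULL) {
  cls   <- levels(data[, "obs"])                              # class names
  probs <- data[, cls[2]]                                     # probability of the second class
  probs <- pmax(pmin(as.numeric(probs), 1 - 1e-15), 1e-15)    # clip away exact 0 and 1
  real  <- as.numeric(data$obs) - 1                           # 0/1 encoding of the observed class
  out   <- -mean(real * log(probs) + (1 - real) * log(1 - probs))
  names(out) <- "LogLoss"
  out
}

# Hypothetical resample: classes are not N/Y, and one probability is exactly 0.
toy <- data.frame(
  obs  = factor(c("no", "yes", "yes", "no"), levels = c("no", "yes")),
  pred = factor(c("no", "yes", "no", "no"),  levels = c("no", "yes")),
  no   = c(0.9, 0.2, 1.0, 0.6),
  yes  = c(0.1, 0.8, 0.0, 0.4)
)

clipped <- LogLoss(toy)

# Same formula without clipping: the third row contributes log(0) = -Inf.
real <- as.numeric(toy$obs) - 1
unclipped <- -mean(real * log(toy$yes) + (1 - real) * log(1 - toy$yes))

print(clipped)    # finite
print(unclipped)  # Inf
```

With clipping, a single confidently wrong prediction produces a large but finite penalty instead of an infinite loss.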