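【Posted】: 2019-02-17 15:45:57
【Question】:
I want to understand the meaning of the values returned by the h2o.predict() function in the H2O R package. I noticed that in some cases, when the predict column is 1, the p1 column is lower than the p0 column. My interpretation of the p0 and p1 columns is that they are the probabilities of each event, so I expected that whenever predict=1, p1 would be higher than the probability of the opposite event (p0). That is not always the case, as the following example using the prostate dataset shows.
Here is a reproducible example: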
library(h2o)
h2o.init(max_mem_size = "12g", nthreads = -1)
prostate.hex <- h2o.importFile("https://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv")
prostate.hex$CAPSULE <- as.factor(prostate.hex$CAPSULE)
prostate.hex$RACE <- as.factor(prostate.hex$RACE)
prostate.hex$DCAPS <- as.factor(prostate.hex$DCAPS)
prostate.hex$DPROS <- as.factor(prostate.hex$DPROS)
prostate.hex.split = h2o.splitFrame(data = prostate.hex,
                                    ratios = c(0.70, 0.20, 0.10), seed = 1234)
train.hex <- prostate.hex.split[[1]]
validate.hex <- prostate.hex.split[[2]]
test.hex <- prostate.hex.split[[3]]
fit <- h2o.glm(y = "CAPSULE", x = c("AGE", "RACE", "PSA", "DCAPS"),
               training_frame = train.hex,
               validation_frame = validate.hex,
               family = "binomial", nfolds = 0, alpha = 0.5)
prostate.predict = h2o.predict(object = fit, newdata = test.hex)
result <- as.data.frame(prostate.predict)
subset(result, predict == 1 & p1 < 0.4)
The subset() call produces the following output:
   predict        p0        p1
11       1 0.6355974 0.3644026
17       1 0.6153021 0.3846979
23       1 0.6289063 0.3710937
25       1 0.6007919 0.3992081
31       1 0.6239587 0.3760413
For all of the observations above from the test.hex dataset, the prediction is 1 but p0 > p1.
The total number of observations with predict=1 but p1 < p0 is:
> nrow(subset(result, predict == 1 & p1 < p0))
[1] 14
Conversely, there are no observations with predict=0 where p0 < p1:
> nrow(subset(result, predict == 0 & p0 < p1))
[1] 0
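One way to read the asymmetry between these two counts is that the labels follow a single cutoff on p1 that is lower than 0.5. The following base-R sketch illustrates that reading only; it assumes, without confirmation from anything in this post, that the label is assigned as p1 >= threshold. The value of `threshold` and the name `result_sketch` are hypothetical, and the p1 values are the five rows printed earlier.

```r
# p1 values copied from the five rows printed by subset() above
p1 <- c(0.3644026, 0.3846979, 0.3710937, 0.3992081, 0.3760413)
p0 <- 1 - p1

# Hypothetical cutoff chosen for illustration; the value actually used
# by the fitted model is not stated in this post
threshold <- 0.36

# Assumed rule: label 1 whenever p1 reaches the cutoff,
# whether or not p1 exceeds p0
predict <- as.integer(p1 >= threshold)
result_sketch <- data.frame(predict = predict, p0 = p0, p1 = p1)

# With a cutoff below 0.5, rows with predict == 1 and p1 < p0 can occur
n_one_low <- nrow(subset(result_sketch, predict == 1 & p1 < p0))

# The reverse cannot occur: p0 < p1 implies p1 > 0.5 > threshold
n_zero_high <- nrow(subset(result_sketch, predict == 0 & p0 < p1))

print(c(n_one_low = n_one_low, n_zero_high = n_zero_high))
```

Under that assumed rule the sketch shows the same pattern as the counts above: some rows with predict=1 and p1 < p0, and none with predict=0 and p0 < p1.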
Here is the frequency table of the predict column:
> table(result$predict)
 0  1
18 23
The response variable is CAPSULE, which has the following levels:
> levels(as.data.frame(prostate.hex)$CAPSULE)
[1] "0" "1"
Any suggestions?
Note: a similar question, "How to interpret results of h2o.predict", does not address this specific issue.
【Discussion】:
Tags: r machine-learning deep-learning h2o glm