【Question title】: R neuralnet package: Can't train neural network
【Posted】: 2020-03-27 01:17:27
【Question】:

I am trying to train a model on this data set using the neuralnet package. However, I get the following error, which I don't understand:

Error: the error derivative contains a NA; verify that the derivative function does not divide by 0 (e.g. cross entropy)

Here is my code:

library(neuralnet)
library(tidyverse)

framingham <- read_csv('https://courses.edx.org/assets/courseware/v1/7022cf016eefb6d3747447589423dab0/asset-v1:MITx+15.071x+3T2019+type@asset+block/framingham.csv',
                       col_types = cols(.default = 'i',sysBP = 'n', diaBP = 'n', BMI = 'n' ))
# Split data
set.seed(123); train_idx <- sample(nrow(framingham), 2/3 * nrow(framingham))
framingham_train <- framingham[train_idx, ]
framingham_test <- framingham[-train_idx, ]

# Binary classification
nn <- neuralnet(formula = TenYearCHD ~ ., data = framingham_train,
                hidden=c(3,2),
                act.fct = "tanh",
                stepmax = 1e8,
                err.fct = 'ce',
                linear.output = TRUE)

I have tried changing the error function and other settings, but nothing seems to work.

【Comments】:

    Tags: r machine-learning neural-network


    【Solution 1】:

    Some columns contain NAs:

    summary(framingham)
          male             age          education     currentSmoker   
     Min.   :0.0000   Min.   :32.00   Min.   :1.000   Min.   :0.0000  
     1st Qu.:0.0000   1st Qu.:42.00   1st Qu.:1.000   1st Qu.:0.0000  
     Median :0.0000   Median :49.00   Median :2.000   Median :0.0000  
     Mean   :0.4292   Mean   :49.58   Mean   :1.979   Mean   :0.4941  
     3rd Qu.:1.0000   3rd Qu.:56.00   3rd Qu.:3.000   3rd Qu.:1.0000  
     Max.   :1.0000   Max.   :70.00   Max.   :4.000   Max.   :1.0000  
                                      NA's   :105                     
       cigsPerDay         BPMeds        prevalentStroke     prevalentHyp   
     Min.   : 0.000   Min.   :0.00000   Min.   :0.000000   Min.   :0.0000  
     1st Qu.: 0.000   1st Qu.:0.00000   1st Qu.:0.000000   1st Qu.:0.0000  
     Median : 0.000   Median :0.00000   Median :0.000000   Median :0.0000  
     Mean   : 9.006   Mean   :0.02962   Mean   :0.005896   Mean   :0.3106  
     3rd Qu.:20.000   3rd Qu.:0.00000   3rd Qu.:0.000000   3rd Qu.:1.0000  
     Max.   :70.000   Max.   :1.00000   Max.   :1.000000   Max.   :1.0000  
     NA's   :29       NA's   :53 
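
The `summary()` output gets long for a wide data frame. A more compact way to see which columns have missing values is to count the NAs per column. A minimal sketch on a small made-up data frame (the `toy` data and its values are assumptions for illustration, not the Framingham data):

```r
# Toy data frame standing in for framingham (values are made up)
toy <- data.frame(
  education  = c(1L, NA, 3L, 2L),
  cigsPerDay = c(0L, 20L, NA, NA),
  age        = c(39L, 46L, 48L, 61L)
)

# Number of NAs in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Names of the columns that have at least one NA
cols_with_na <- names(na_counts)[na_counts > 0]
print(cols_with_na)
```

On the real data, `colSums(is.na(framingham))` should report the same counts that appear in the `NA's` rows of `summary()`.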
    

    If you subset to the rows without NAs, it should work:

    set.seed(123)
    framingham = framingham[complete.cases(framingham),]
    train_idx <- sample(nrow(framingham), 2/3 * nrow(framingham))
    framingham_train <- framingham[train_idx, ]
    framingham_test <- framingham[-train_idx, ]
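
To confirm that the filtering step did what was intended, it can help to compare row counts before and after and to check that no NA remains. A small sketch on made-up data (the `toy` data frame is an assumption for illustration):

```r
# Toy data with missing values in two different rows (values are made up)
toy <- data.frame(
  x = c(1, NA, 3, 4, 5, 6),
  y = c(0L, 1L, NA, 0L, 1L, 0L)
)

# Keep only rows with no missing value in any column
toy_complete <- toy[complete.cases(toy), ]

# Rows dropped and NAs remaining after filtering
n_dropped <- nrow(toy) - nrow(toy_complete)
n_na_left <- sum(is.na(toy_complete))
cat("dropped:", n_dropped, " NAs left:", n_na_left, "\n")

# Split after filtering so the indices refer to the cleaned data
set.seed(123)
train_idx <- sample(nrow(toy_complete), floor(2/3 * nrow(toy_complete)))
toy_train <- toy_complete[train_idx, ]
toy_test  <- toy_complete[-train_idx, ]
```

`na.omit(toy)` keeps the same rows and could be used instead of the `complete.cases()` subset. Dropping rows is the simplest option; whether losing those observations is acceptable depends on the analysis.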
    

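    Also, I don't think you can use tanh as the activation function together with the cross-entropy error, so something like the following, which uses logistic as the activation function, should work: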

    nn <- neuralnet(formula = TenYearCHD ~ ., data = framingham_train,
                    hidden = c(3, 2), act.fct = "logistic", rep = 3,
                    err.fct = "ce", linear.output = FALSE)
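
The error message hints at why the original settings fail. Cross entropy takes the logarithm of the predicted value and of one minus that value, so predictions have to lie strictly between 0 and 1. The base-R sketch below shows that a prediction outside that range gives NaN; the `cross_entropy` and `logistic` helpers are written here for illustration and are not neuralnet's internal code:

```r
# Binary cross entropy for a single observation (illustrative helper)
cross_entropy <- function(y, p) {
  -(y * log(p) + (1 - y) * log(1 - p))
}

# A logistic output lies in (0, 1), so the loss is finite
logistic <- function(z) 1 / (1 + exp(-z))
loss_ok <- cross_entropy(1, logistic(0.5))

# tanh can return negative values; log() of a negative number is NaN
loss_bad <- suppressWarnings(cross_entropy(1, tanh(-0.5)))

print(c(loss_ok = loss_ok, loss_bad = loss_bad))
```

With `linear.output = TRUE` the activation function is not applied to the output neuron at all, so the output is unbounded and runs into the same problem. That is presumably why the call above sets `linear.output = FALSE` along with `act.fct = "logistic"`.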
    

    【Comments】: