【Posted】: 2018-08-28 08:34:48
【Question】:
I have a dataset of loan default information and am trying to build a neural network to predict default. The network is built as follows:
form <- as.formula(paste("loan_status_fixed ~", paste(n[!n %in% "use"], collapse = " + ")))
The resulting formula (form) is:
loan_status_fixed ~ addr_stateAK + addr_stateAL + addr_stateAR +
addr_stateAZ + addr_stateCA + addr_stateCO + addr_stateCT +
addr_stateDC + addr_stateDE + addr_stateFL + addr_stateGA +
addr_stateHI + addr_stateIA + addr_stateID + addr_stateIL +
addr_stateIN + addr_stateKS + addr_stateKY + addr_stateLA +
addr_stateMA + addr_stateMD + addr_stateME + addr_stateMI +
addr_stateMN + addr_stateMO + addr_stateNH + addr_stateNJ +
addr_stateNM + addr_stateNV + addr_stateNY + addr_stateOH +
addr_stateOK + addr_stateOR + addr_statePA + addr_stateRI +
addr_stateSC + addr_stateSD + addr_stateTN + addr_stateTX +
addr_stateUT + addr_stateVA + addr_stateVT + addr_stateWA +
addr_stateWI + addr_stateWV + annual_inc + collections_12_mths_ex_med +
delinq_2yrs + dti + `emp_length1 year` + `emp_length2 years` +
`emp_length3 years` + `emp_length4 years` + `emp_length5 years` +
`emp_length6 years` + `emp_length7 years` + `emp_length8 years` +
`emp_length9 years` + `emp_length10+ years` + `emp_lengthn/a` +
fico_averaged + funded_amnt + sub_gradeA1 + sub_gradeA2 +
sub_gradeA3 + sub_gradeA4 + sub_gradeA5 + sub_gradeB1 + sub_gradeB2 +
sub_gradeB3 + sub_gradeB4 + sub_gradeB5 + sub_gradeC1 + sub_gradeC2 +
sub_gradeC3 + sub_gradeC4 + sub_gradeC5 + sub_gradeD1 + sub_gradeD2 +
sub_gradeD3 + sub_gradeD4 + sub_gradeD5 + sub_gradeE1 + sub_gradeE2 +
sub_gradeE3 + sub_gradeE4 + home_ownershipMORTGAGE + home_ownershipOWN +
open_acc + pub_rec + purposecar + purposecredit_card + purposedebt_consolidation +
purposeeducational + purposehome_improvement + purposehouse +
purposemajor_purchase + purposemedical + purposemoving +
purposeother + purposesmall_business + purposevacation +
revol_util
fit <- neuralnet(form, data = train, linear.output = FALSE)
Fitting works, but when I try to run predictions from the fitted network:
results <- neuralnet::compute(fit, test)
Error in neurons[[i]] %*% weights[[i]] : non-conformable arguments
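Previous questions about this error say it is caused by character or factor variables, but my data contains only numeric, integer and double columns. Other earlier suggestions were that the dataset may only contain columns that are included in the computation; I have corrected for that, and every column in the training and test datasets is included in the computation.
Below is the str() output of the training dataset.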
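The two causes mentioned above (non-numeric columns, and columns that are not inputs to the network) can be checked mechanically. The following is a minimal base-R sketch on toy data; the data frames and the column names y, a and b are hypothetical stand-ins, not the real loan data, and the predictor names are read from the formula rather than from the fitted object.

```r
# Toy stand-ins for the real data (hypothetical values, not from the question).
train <- data.frame(y = c(0L, 1L, 0L), a = c(1.5, 2.0, 3.1), b = c(0L, 1L, 0L))
test  <- data.frame(y = c(1L, 0L),     a = c(2.2, 0.4),      b = c(1L, 1L))
form  <- y ~ a + b

# Predictor names taken from the right-hand side of the formula.
predictors <- all.vars(form[[3]])

# 1. Columns of test that are neither integer nor double.
non_numeric <- names(test)[!vapply(test, is.numeric, logical(1))]

# 2. Columns present in test that are not inputs to the model.
extra <- setdiff(names(test), predictors)

# 3. Inputs the model expects that test does not have.
missing_cols <- setdiff(predictors, names(test))

print(list(non_numeric = non_numeric, extra = extra, missing = missing_cols))
```

In this toy case the response y shows up under extra: it appears in the formula, but only on the left-hand side, so it is not one of the network's inputs.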
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 654046 obs. of 104 variables:
$ loan_status_fixed : int 0 0 0 0 1 1 0 1 0 0 ...
$ addr_stateAK : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateAL : int 1 0 0 0 0 0 0 0 0 0 ...
$ addr_stateAR : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateAZ : int 0 0 0 0 0 0 0 0 1 0 ...
$ addr_stateCA : int 0 0 0 0 0 0 1 0 0 0 ...
$ addr_stateCO : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateCT : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateDC : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateDE : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateFL : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateGA : int 0 0 0 0 1 0 0 0 0 0 ...
$ addr_stateHI : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateIA : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateID : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateIL : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateIN : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateKS : int 0 0 0 0 0 0 0 0 0 1 ...
$ addr_stateKY : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateLA : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateMA : int 0 0 1 0 0 0 0 0 0 0 ...
$ addr_stateMD : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateME : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateMI : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateMN : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateMO : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateNH : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateNJ : int 0 0 0 1 0 0 0 0 0 0 ...
$ addr_stateNM : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateNV : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateNY : int 0 1 0 0 0 0 0 1 0 0 ...
$ addr_stateOH : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateOK : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateOR : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_statePA : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateRI : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateSC : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateSD : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateTN : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateTX : int 0 0 0 0 0 1 0 0 0 0 ...
$ addr_stateUT : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateVA : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateVT : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateWA : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateWI : int 0 0 0 0 0 0 0 0 0 0 ...
$ addr_stateWV : int 0 0 0 0 0 0 0 0 0 0 ...
$ annual_inc : num 58000 175000 66500 94800 64000 70000 95000 57000 67500 40000 ...
$ collections_12_mths_ex_med: int 0 0 0 0 0 0 0 0 0 0 ...
$ delinq_2yrs : int 0 0 0 1 0 0 0 0 0 2 ...
$ dti : num 28.7 14.1 13.7 14.5 26.1 ...
$ emp_length1 year : int 0 0 1 0 0 0 0 0 0 1 ...
$ emp_length2 years : int 0 0 0 0 0 0 0 0 1 0 ...
$ emp_length3 years : int 0 0 0 0 0 0 0 0 0 0 ...
$ emp_length4 years : int 0 0 0 0 1 0 0 0 0 0 ...
$ emp_length5 years : int 0 0 0 1 0 0 0 0 0 0 ...
$ emp_length6 years : int 0 0 0 0 0 0 0 0 0 0 ...
$ emp_length7 years : int 0 0 0 0 0 0 0 0 0 0 ...
$ emp_length8 years : int 0 0 0 0 0 0 0 0 0 0 ...
$ emp_length9 years : int 0 0 0 0 0 0 0 0 0 0 ...
$ emp_length10+ years : int 1 0 0 0 0 1 1 1 0 0 ...
$ emp_lengthn/a : int 0 0 0 0 0 0 0 0 0 0 ...
$ fico_averaged : int 712 722 777 677 727 757 687 687 677 687 ...
$ funded_amnt : int 17000 25000 8000 20000 29425 22000 11600 16000 26575 18000 ...
$ sub_gradeA1 : int 0 0 1 0 0 0 0 0 0 0 ...
$ sub_gradeA2 : int 0 0 0 0 0 0 0 0 0 0 ...
$ sub_gradeA3 : int 0 0 0 0 0 0 0 0 0 0 ...
$ sub_gradeA4 : int 0 0 0 0 0 0 0 0 0 0 ...
$ sub_gradeA5 : int 0 0 0 0 0 0 0 0 0 0 ...
$ sub_gradeB1 : int 0 0 0 0 0 0 0 0 0 0 ...
$ sub_gradeB2 : int 0 1 0 0 0 0 0 0 0 0 ...
$ sub_gradeB3 : int 0 0 0 0 0 1 0 0 0 0 ...
$ sub_gradeB4 : int 1 0 0 0 0 0 0 0 0 0 ...
$ sub_gradeB5 : int 0 0 0 0 0 0 0 0 0 0 ...
$ sub_gradeC1 : int 0 0 0 0 0 0 0 0 0 0 ...
$ sub_gradeC2 : int 0 0 0 0 0 0 0 0 0 0 ...
$ sub_gradeC3 : int 0 0 0 0 0 0 0 0 0 0 ...
$ sub_gradeC4 : int 0 0 0 0 0 0 0 0 0 0 ...
$ sub_gradeC5 : int 0 0 0 1 0 0 0 0 0 0 ...
$ sub_gradeD1 : int 0 0 0 0 0 0 1 0 0 1 ...
$ sub_gradeD2 : int 0 0 0 0 0 0 0 0 0 0 ...
$ sub_gradeD3 : int 0 0 0 0 0 0 0 0 0 0 ...
$ sub_gradeD4 : int 0 0 0 0 0 0 0 0 1 0 ...
$ sub_gradeD5 : int 0 0 0 0 0 0 0 1 0 0 ...
$ sub_gradeE1 : int 0 0 0 0 0 0 0 0 0 0 ...
$ sub_gradeE2 : int 0 0 0 0 0 0 0 0 0 0 ...
$ sub_gradeE3 : int 0 0 0 0 1 0 0 0 0 0 ...
$ sub_gradeE4 : int 0 0 0 0 0 0 0 0 0 0 ...
$ home_ownershipMORTGAGE : int 1 0 1 1 0 1 0 0 1 1 ...
$ home_ownershipOWN : int 0 0 0 0 0 0 0 0 0 0 ...
$ open_acc : int 14 11 16 5 14 6 5 10 9 16 ...
$ pub_rec : int 0 0 0 0 0 0 0 0 0 0 ...
$ purposecar : int 0 0 0 0 0 0 0 0 0 0 ...
$ purposecredit_card : int 0 0 1 0 0 0 0 0 1 0 ...
$ purposedebt_consolidation : int 1 1 0 1 1 1 1 1 0 1 ...
$ purposeeducational : int 0 0 0 0 0 0 0 0 0 0 ...
$ purposehome_improvement : int 0 0 0 0 0 0 0 0 0 0 ...
$ purposehouse : int 0 0 0 0 0 0 0 0 0 0 ...
$ purposemajor_purchase : int 0 0 0 0 0 0 0 0 0 0 ...
$ purposemedical : int 0 0 0 0 0 0 0 0 0 0 ...
$ purposemoving : int 0 0 0 0 0 0 0 0 0 0 ...
$ purposeother : int 0 0 0 0 0 0 0 0 0 0 ...
$ purposesmall_business : int 0 0 0 0 0 0 0 0 0 0 ...
$ purposevacation : int 0 0 0 0 0 0 0 0 0 0 ...
$ revol_util : num 45.1 50.1 29.7 93.4 66 0 96.5 68.2 88.4 28.6 ...
【Comments】:
-
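Use dput(head(train)) instead of str() to give people an object they can actually work with to help you. Does the error also occur with a subset of the columns?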
-
The test data must contain only the independent variables. At first glance, you need to drop at least the dependent variable loan_status_fixed.
Tags: r neural-network
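A minimal sketch of that suggestion, assuming the mismatch comes from test carrying the response column in addition to the inputs. The toy data frame and its column names are hypothetical; one name contains a space to mirror columns such as `emp_length1 year`. The final compute() call is left as a comment because it needs the fitted network from the question.

```r
# Toy test set (hypothetical values); check.names = FALSE keeps the space in the name.
test <- data.frame(
  loan_status_fixed  = c(0L, 1L),
  annual_inc         = c(58000, 175000),
  `emp_length1 year` = c(0L, 1L),
  check.names = FALSE
)
form <- loan_status_fixed ~ annual_inc + `emp_length1 year`

# all.vars() returns plain names without backticks, so they can index columns directly.
predictors <- all.vars(form[[3]])

# Keep only the inputs, in the same order as in the formula.
test_x <- test[, predictors, drop = FALSE]
print(names(test_x))

# With the fitted network from the question, the call would then be:
# results <- neuralnet::compute(fit, test_x)
```

Selecting by name rather than dropping a column by position also guards against the test columns being in a different order than the training inputs.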