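【Posted】: 2016-05-05 07:22:08
【Problem description】:
In MATLAB, what is the correct way to find the model with the lowest error from a cross-validated fit? My goal is to show the error rate of the best cross-validated decision tree as a function of the test data size, and I have the following code: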
chess = csvread(filename);
predictors = chess(:,1:6);
class = chess(:,7);
cvpart = cvpartition(class,'holdout', 0.3);
Xtrain = predictors(training(cvpart),:);
Ytrain = class(training(cvpart),:);
Xtest = predictors(test(cvpart),:);
Ytest = class(test(cvpart),:);
numElements = numel(training(cvpart));
trainErrorGrowing = zeros(numElements,1);
testErrorGrowing = zeros(numElements,1);
for n = 100:numElements
    data = datasample(training(cvpart), n);
    dataX = predictors(data,:);
    dataY = class(data,:);
    % Fit the decision tree
    tree = fitctree(dataX, dataY, 'AlgorithmForCategorical', 'PullLeft', 'CrossVal', 'on');
    % Loop to find the model with the least error
    kfoldError = 100;
    bestTree = tree.Trained{1};
    for i = 1:10
        err = loss(tree.Trained{i}, Xtrain, Ytrain);
        if err < kfoldError
            kfoldError = err;
            bestTree = tree.Trained{i};
        end
    end
    trainErrorGrowing(n) = loss(bestTree,Xtest,Ytest,'Subtrees','all'); % Training Error
    testErrorGrowing(n) = loss(bestTree,Xtest,Ytest,'Subtrees','all'); % Testing Error
end
plot(numElements,testErrorGrowing);
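It is important that the data used for the final test is not used in any way to train the tree. However, when I try to run this code, I get the error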
Error using classreg.learning.internal.classCount
You passed an unknown class '1' of type double.
on the line
err = loss(tree.Trained{i}, Xtrain, Ytrain);
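I tried casting the iterator to int8 and char, but got the same error both times. Is there a simpler way to find the resulting decision tree with the lowest error, or at least a way to reference the individual trained trees?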
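A minimal sketch of one way to approach both parts of the question, under stated assumptions: `kfoldLoss` reports the cross-validated error of the partitioned model directly, its `'Mode','individual'` option returns one error per fold, and the per-fold trees are reachable through the `Trained` cell array. The `fisheriris` sample data that ships with the Statistics and Machine Learning Toolbox stands in for the chess file, and the names `holdout`, `cvTree`, `foldErrors` and `bestFold` are illustrative, not taken from the code above. The reported error may come from a subsample that lacks one of the classes present in `Ytrain`, but that is a guess, since the data file is not shown.

```matlab
% Sketch only: fisheriris is a stand-in for the chess data in the question.
load fisheriris                     % provides meas (150x4) and species (150x1 cellstr)
rng(1);                             % make the random partitions reproducible

% Hold out 30% for the final test; these rows are never used for fitting.
holdout = cvpartition(species, 'HoldOut', 0.3);
Xtrain = meas(training(holdout), :);
Ytrain = species(training(holdout));
Xtest  = meas(test(holdout), :);
Ytest  = species(test(holdout));

% 10-fold cross-validated tree, fitted on the training portion only.
cvTree = fitctree(Xtrain, Ytrain, 'CrossVal', 'on');

% Overall cross-validated error, and the out-of-fold error of each fold.
cvError    = kfoldLoss(cvTree);
foldErrors = kfoldLoss(cvTree, 'Mode', 'individual');

% The individual trees are stored in cvTree.Trained; take the fold
% whose tree has the lowest out-of-fold error.
[~, bestFold] = min(foldErrors);
bestTree = cvTree.Trained{bestFold};

% Error of that single tree on the untouched test rows.
testError = loss(bestTree, Xtest, Ytest);
fprintf('CV error: %.3f, best fold: %d, test error: %.3f\n', ...
        cvError, bestFold, testError);
```

For the growing-sample loop, note that `datasample(training(cvpart), n)` samples the values of the logical mask rather than row numbers. Drawing row indices instead, for example `datasample(find(training(cvpart)), n, 'Replace', false)`, would keep every subsample inside the training rows.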
【Discussion】:
Tags: matlab validation tree classification