Equivalent R code for Matlab: How to calculate residual values for the linear model? - Answers

【Question title】: Equivalent R code for Matlab: How to calculate residual values for the linear model?
【Posted】: 2017-02-14 17:48:26
【Question description】:

I have some R code that I need to translate into Matlab, shown below:

xt =  c(-0.227, -0.604,  0.974,  2.639, -0.271, -0.355, -0.551,  0.342,  2.390, -1.257)
sets = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5)
methods = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
wt = c( 1, 1, 1, 1, 1, 3, 3, 3, 3, 3)

sets = as.factor(sets)
methods = as.factor(methods)
lm1 <- lm(xt ~ sets + methods, weights = wt)

I need the residual values of the linear model, i.e.:

lm1$residual

The polyfit function does not accept weights! Which function in Matlab would give me the residual values of the linear model?

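【Question comments】:

So you want the residuals of a weighted linear regression? Do you have the Statistics Toolbox? fitlm, which is now the basic linear regression tool, takes weights as an optional argument and will certainly output residuals. Which toolboxes do you have, and which version of Matlab?

Tags: r matlab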

【Solution 1】:

I'm assuming you have the Statistics Toolbox in MATLAB. If you don't, the approach below will not work.
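
If the toolbox is not available, the same weighted fit can probably be reproduced in base MATLAB by building the design matrix by hand and solving the weighted least-squares problem with the backslash operator. The following is only a sketch under that assumption: the names X, sw and b are mine, and the dummy coding (first level of each factor as the reference) is chosen to mirror the coefficient names R reports for this model.

```matlab
% Data from the question, as column vectors
xt      = [-0.227, -0.604, 0.974, 2.639, -0.271, -0.355, -0.551, 0.342, 2.390, -1.257].';
sets    = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5].';
methods = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2].';
wt      = [1, 1, 1, 1, 1, 3, 3, 3, 3, 3].';

% Design matrix: intercept, indicator columns for sets 2..5, indicator for methods 2
% (hand-built stand-in for the formula xt ~ sets + methods)
X = [ones(numel(xt), 1), double(bsxfun(@eq, sets, 2:5)), double(methods == 2)];

% Weighted least squares: scale every row by sqrt(weight), then solve with backslash
sw = sqrt(wt);
b  = bsxfun(@times, X, sw) \ (sw .* xt);

% Raw (unweighted) residuals, comparable to lm1$residuals in R
raw = xt - X * b;
```

bsxfun is used instead of implicit expansion so that the sketch also runs on older releases.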

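The equivalent code in MATLAB is almost the same as in R. All you have to do is set up a data frame containing your variables, then use fitlm or LinearModel.fit to fit your linear model. fitlm is the newer version of LinearModel.fit and is available from R2013b onwards. If your MATLAB version is R2013b or later, use fitlm; otherwise, use LinearModel.fit. lm in R fits a linear model to your predictors and output, and fitlm / LinearModel.fit does the same thing in MATLAB.

Define the variables as you did above, but wrap them in a data frame using MATLAB's dataset function. After that, use the nominal function to turn the factor variables into categorical ones. Then create the linear model, specifying the additional Weights option so that each observation (each combination of predictors and output) is weighted by your wt variable. Once the model is created, you access the residuals through its Residuals property. You specify the relationship between the predictors and the output with a formula in the same way as in R (also known as Wilkinson notation).

One thing I need to point out: your data must be in columns, not rows. You'll see that I type the data in as rows but use the transpose operator to make sure it ends up in columns. So: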

% Define data (transposed so that each variable is a column)
xt = [-0.227, -0.604, 0.974, 2.639, -0.271, -0.355, -0.551, 0.342, 2.390, -1.257].';
sets = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5].';
methods = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2].';
wt = [1, 1, 1, 1, 1, 3, 3, 3, 3, 3].';

% Create data frame and make categorical data
data = dataset(xt, sets, methods);
data.sets = nominal(data.sets);
data.methods = nominal(data.methods);

% Create linear model and specify weights
fit = LinearModel.fit(data, 'xt ~ sets + methods', 'Weights', wt);
% or
% fit = fitlm(data, 'xt ~ sets + methods', 'Weights', wt);

% Access residuals
res = fit.Residuals;
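
On more recent MATLAB releases, the same fit can also be written with the table and categorical types in place of dataset and nominal. This is only a sketch, assuming the Statistics Toolbox with fitlm is installed; the names tbl and mdl are illustrative and not part of the code above.

```matlab
% Data from the question, as column vectors
xt      = [-0.227, -0.604, 0.974, 2.639, -0.271, -0.355, -0.551, 0.342, 2.390, -1.257].';
sets    = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5].';
methods = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2].';
wt      = [1, 1, 1, 1, 1, 3, 3, 3, 3, 3].';

% Build a table; variable names are taken from the workspace names
tbl = table(xt, sets, methods);

% Mark the grouping variables as categorical (the counterpart of as.factor in R)
tbl.sets    = categorical(tbl.sets);
tbl.methods = categorical(tbl.methods);

% Weighted linear model using the same Wilkinson formula
mdl = fitlm(tbl, 'xt ~ sets + methods', 'Weights', wt);

% Residuals is a table here, so a single column can be read directly
raw = mdl.Residuals.Raw;
```

Reading mdl.Residuals.Raw directly gives a plain numeric column vector, which may be all that is needed when only the raw residuals are of interest.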

This is the linear model I get:

fit = 


Linear regression model:
    xt ~ 1 + sets + methods

Estimated Coefficients:
                   Estimate    SE         tStat       pValue    
    (Intercept)     -0.0317    0.22889    -0.13849       0.89654
    sets_2         -0.24125    0.25591    -0.94273        0.3992
    sets_3            0.823    0.25591       3.216      0.032403
    sets_4           2.7752    0.25591      10.845    0.00041025
    sets_5          -0.6875    0.25591     -2.6865      0.054855
    methods_2       -0.3884    0.18689     -2.0783       0.10623

Number of observations: 10, Error degrees of freedom: 4
Root Mean Squared Error: 0.362
R-squared: 0.983,  Adjusted R-Squared 0.962
F-statistic vs. constant model: 46.6, p-value = 0.00123

These are the residuals I get:

res = 

    Raw         Pearson     Studentized    Standardized
     -0.1953    -0.53964    -0.64365       -0.69667    
    -0.33105    -0.91474     -1.2672        -1.1809    
      0.1827     0.50483       0.597        0.65173    
    -0.10455    -0.28889    -0.32875       -0.37295    
      0.4482      1.2384      2.3047         1.5988    
      0.0651     0.17988     0.37161        0.40223    
     0.11035     0.30491     0.73161        0.68181    
     -0.0609    -0.16828    -0.34468       -0.37628    
     0.03485    0.096296      0.1898        0.21532    
     -0.1494    -0.41281     -1.3306       -0.92308

To be self-contained, here is what I get from your code in R; we should see that the output is more or less the same:

> summary(lm1)

lm(formula = xt ~ sets + methods, weights = wt)

Weighted Residuals:
       1        2        3        4        5        6        7        8        9       10 
-0.19530 -0.33105  0.18270 -0.10455  0.44820  0.11276  0.19113 -0.10548  0.06036 -0.25877 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.0317     0.2289  -0.138  0.89654    
sets2        -0.2412     0.2559  -0.943  0.39920    
sets3         0.8230     0.2559   3.216  0.03240 *  
sets4         2.7753     0.2559  10.845  0.00041 ***
sets5        -0.6875     0.2559  -2.687  0.05486 .  
methods2     -0.3884     0.1869  -2.078  0.10623    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3619 on 4 degrees of freedom
Multiple R-squared:  0.9831,    Adjusted R-squared:  0.962 
F-statistic: 46.58 on 5 and 4 DF,  p-value: 0.001226

> lm1$residuals

       1        2        3        4        5        6        7        8        9       10 
-0.19530 -0.33105  0.18270 -0.10455  0.44820  0.06510  0.11035 -0.06090  0.03485 -0.14940
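
Note that the "Weighted Residuals" block printed by summary(lm1) is not the same as lm1$residuals: the values differ for observations 6 to 10, where the weight is 3. Judging from the numbers above, the weighted residuals are the raw residuals scaled by the square root of the weights. A quick check in MATLAB, using the raw values printed above (the variable names are only illustrative):

```matlab
% Raw residuals as printed by lm1$residuals, and the weights from the question
raw = [-0.19530, -0.33105, 0.18270, -0.10455, 0.44820, ...
        0.06510,  0.11035, -0.06090,  0.03485, -0.14940].';
wt  = [1, 1, 1, 1, 1, 3, 3, 3, 3, 3].';

% Scale each raw residual by sqrt(weight) to get the weighted residuals
weighted = sqrt(wt) .* raw;
```

The first five entries are unchanged because their weight is 1; the last five are multiplied by sqrt(3).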

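R shows the raw residuals, which correspond to the first column of the Residuals matrix in MATLAB. Note that the residuals are still wrapped in a data frame (the dataset class). If you want to extract the numeric values, you can use dataset2struct to convert each column of the dataset into a field of a structure. That way, you can access each column with dot notation.

If you use LinearModel.fit, the residuals are returned as a dataset. If you use fitlm, however, the output is actually a table. In that case, use table2struct to convert the residuals into a structure with the corresponding fields.

In other words, you would do something like this: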

resMatrix = dataset2struct(res); % If using LinearModel.fit
% or
% resMatrix = table2struct(res); % If using fitlm

This is what I get:

resMatrix = 

10x1 struct array with fields:

    Raw
    Pearson
    Studentized
    Standardized

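You can then access each column as follows. Because resMatrix is a 10x1 struct array, each field has to be collected across all elements with square brackets; a plain resMatrix.Raw on the right-hand side would assign only the first element:

raw = [resMatrix.Raw].';
pear = [resMatrix.Pearson].';
stu = [resMatrix.Studentized].';
sta = [resMatrix.Standardized].';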
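
Instead of working with a 10x1 struct array, both conversion functions can also return a single scalar structure whose fields hold whole columns. The sketch below assumes the 'ToScalar' option of table2struct (the dataset2struct counterpart is 'AsScalar'); the small table T is a made-up stand-in for the residuals table, filled with the first three rows printed above.

```matlab
% Hypothetical stand-in for the residuals table (first three rows, two columns)
T = table([-0.1953; -0.33105; 0.1827], [-0.53964; -0.91474; 0.50483], ...
          'VariableNames', {'Raw', 'Pearson'});

% Scalar structure: each field is a full column vector
S   = table2struct(T, 'ToScalar', true);
raw = S.Raw;

% Plain numeric matrix from a table, one column per variable
A = table2array(T);
```

With a scalar structure, S.Raw is already a column vector, so no concatenation is needed.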

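If you want to extract the underlying 2D matrix, you can also convert the output to double (i.e. resMatrix = double(res)). This applies to the dataset returned by LinearModel.fit; if res is a table from fitlm, table2array(res) is the corresponding conversion. If you do this, this is what you get: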

resMatrix = double(res)

resMatrix =

   -0.1953   -0.5396   -0.6437   -0.6967
   -0.3311   -0.9147   -1.2672   -1.1809
    0.1827    0.5048    0.5970    0.6517
   -0.1046   -0.2889   -0.3288   -0.3730
    0.4482    1.2384    2.3047    1.5988
    0.0651    0.1799    0.3716    0.4022
    0.1103    0.3049    0.7316    0.6818
   -0.0609   -0.1683   -0.3447   -0.3763
    0.0349    0.0963    0.1898    0.2153
   -0.1494   -0.4128   -1.3306   -0.9231

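This is now an actual 2D matrix in which you can access individual elements and perform slicing, filtering and so on. In your case you want the raw residuals, so you would do raw = resMatrix(:,1);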

【Comments】:

I get an "Undefined function 'dataset2struct' for input arguments of type 'table'." error when I run resMatrix = dataset2struct(res);
@user2333346 - That's because you are passing a table to dataset2struct. Use table2struct instead: mathworks.com/help/matlab/ref/table2struct.html
I'm not sure how to change it, but there is a mix-up with "dataset" above, right?
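@user2333346 - Oops. You're right. Let me change that for you! Does the code above work? I hope so, since you accepted my answer :)
@user2333346 - Hahaha, I hope so too :) Thanks for the kind words, and good luck!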