Back-transforming `scale()` for plotting: answers

【Question title】: backtransform `scale()` for plotting
【Posted】: 2021-03-20 23:00:12
【Question】:

I have an explanatory variable that is centered using scale() and used to predict a response variable:

d <- data.frame(
  x=runif(100),
  y=rnorm(100)
)

d <- within(d, s.x <- scale(x))

m1 <- lm(y~s.x, data=d)

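I would like to plot the predicted values, but on the original scale of x rather than the centered scale. Is there a way to back-transform or unscale s.x?

Thanks!

【Comments】:

Tags: r

【Solution 1】:

Take a look at: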

attributes(d$s.x)

You can use the attributes to unscale:

d$s.x * attr(d$s.x, 'scaled:scale') + attr(d$s.x, 'scaled:center')

For example:

> x <- 1:10
> s.x <- scale(x)

> s.x
            [,1]
 [1,] -1.4863011
 [2,] -1.1560120
 [3,] -0.8257228
 [4,] -0.4954337
 [5,] -0.1651446
 [6,]  0.1651446
 [7,]  0.4954337
 [8,]  0.8257228
 [9,]  1.1560120
[10,]  1.4863011
attr(,"scaled:center")
[1] 5.5
attr(,"scaled:scale")
[1] 3.02765

> s.x * attr(s.x, 'scaled:scale') + attr(s.x, 'scaled:center')
      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]    5
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]    9
[10,]   10
attr(,"scaled:center")
[1] 5.5
attr(,"scaled:scale")
[1] 3.02765
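
Applied to the plotting question, a minimal sketch might look like the following. It assumes the scaled column is stored with `d$s.x <- scale(d$x)` (so the attributes stay attached) and uses `fitted()` for the predictions; the seed and the name `x.orig` are illustrative only.

```r
set.seed(42)
d <- data.frame(x = runif(100), y = rnorm(100))
d$s.x <- scale(d$x)              # one-column matrix carrying the scaling attributes
m1 <- lm(y ~ s.x, data = d)

# Undo the scaling with the stored attributes; as.numeric() drops the matrix shape.
x.orig <- as.numeric(d$s.x * attr(d$s.x, "scaled:scale") + attr(d$s.x, "scaled:center"))

# Plot the data and the fitted line against x on its original scale.
ord <- order(x.orig)
plot(x.orig, d$y, xlab = "x (original scale)", ylab = "y")
lines(x.orig[ord], fitted(m1)[ord])
```

Because the back-transformation is exact up to floating-point error, `x.orig` should agree with `d$x` under `all.equal()`.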

【Comments】:

Nice response, +1. Should attr(s.x, 'scaled:center') be attr(d$s.x, 'scaled:center')?
@TylerRinker Thanks, it should. Fixed!

【Solution 2】:

For data frames or matrices:

set.seed(1)
x = matrix(sample(1:12), ncol= 3)
xs = scale(x, center = TRUE, scale = TRUE)

x.orig = t(apply(xs, 1, function(r)r*attr(xs,'scaled:scale') + attr(xs, 'scaled:center')))

print(x)
     [,1] [,2] [,3]
[1,]    4    2    3
[2,]    5    7    1
[3,]    6   10   11
[4,]    9   12    8

print(x.orig)
     [,1] [,2] [,3]
[1,]    4    2    3
[2,]    5    7    1
[3,]    6   10   11
[4,]    9   12    8

Be careful when using functions like identical(), because the round trip is only exact up to floating-point error:

print(x - x.orig)
     [,1] [,2]         [,3]
[1,]    0    0 0.000000e+00
[2,]    0    0 8.881784e-16
[3,]    0    0 0.000000e+00
[4,]    0    0 0.000000e+00

identical(x, x.orig)
# FALSE
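
A tolerance-based comparison is usually the safer check here. The sketch below repeats the round trip and compares the result with `all.equal()`; `check.attributes = FALSE` is used so that only the values are compared.

```r
set.seed(1)
x  <- matrix(sample(1:12), ncol = 3)
xs <- scale(x)
x.orig <- t(apply(xs, 1, function(r) r * attr(xs, "scaled:scale") + attr(xs, "scaled:center")))

# Exact comparison: storage type and every bit must match.
exact <- identical(x, x.orig)

# Comparison up to a small numeric tolerance, ignoring attributes.
close <- isTRUE(all.equal(x, x.orig, check.attributes = FALSE))
```

Note that `x` is an integer matrix while `x.orig` is double, which on its own is enough to make `identical()` fail.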

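【Comments】:

Thanks! This helped me compute the cluster centers after running kMeans clustering on a scaled matrix: centers <- t(apply(clustering$centers, 1, function(r) r * attr(scaled_mat, 'scaled:scale') + attr(scaled_mat, 'scaled:center'))). The accepted answer did not.
I have exactly the same task as you, @kadrian, but why doesn't this function work on my scaled data??
This should be the accepted answer, not the top-voted one. So the idea in this elegant solution is that you multiply each row of the matrix by the scale vector and add the means row-wise, then transpose the result back to get the correct original matrix. Brilliant!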
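
The k-means use case from the comments can be sketched as follows. The `iris` data and the names `raw_mat` and `manual` are assumptions for illustration; `sweep()` is used in place of `t(apply(...))`, which should give the same result.

```r
set.seed(1)
raw_mat    <- as.matrix(iris[, 1:4])
scaled_mat <- scale(raw_mat)
clustering <- kmeans(scaled_mat, centers = 3)

# Bring the cluster centers back to the original units, column by column.
centers <- sweep(clustering$centers, 2, attr(scaled_mat, "scaled:scale"), `*`)
centers <- sweep(centers, 2, attr(scaled_mat, "scaled:center"), `+`)

# Cross-check: per-cluster means computed directly on the unscaled data.
manual <- rowsum(raw_mat, clustering$cluster) / as.vector(table(clustering$cluster))
agree  <- isTRUE(all.equal(centers, manual, check.attributes = FALSE))
```

The cross-check works because unscaling is a linear map, so the unscaled mean of a cluster equals the mean of its unscaled points.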

【Solution 3】:

I felt this should be a proper function; here is my attempt:

#' Reverse a scale
#'
#' Computes x = sz+c, which is the inverse of z = (x - c)/s 
#' provided by the \code{scale} function.
#' 
#' @param z a numeric matrix(like) object
#' @param center either NULL or a numeric vector of length equal to the number of columns of z
#' @param scale  either NULL or a numeric vector of length equal to the number of columns of z
#'
#' @seealso \code{\link{scale}}
#' @examples
#'  mtcs <- scale(mtcars)
#'  
#'  all.equal(
#'    unscale(mtcs), 
#'    as.matrix(mtcars), 
#'    check.attributes=FALSE
#'  )
#'  
#' @export
unscale <- function(z, center = attr(z, "scaled:center"), scale = attr(z, "scaled:scale")) {
  if(!is.null(scale))  z <- sweep(z, 2, scale, `*`)
  if(!is.null(center)) z <- sweep(z, 2, center, `+`)
  structure(z,
    "scaled:center"   = NULL,
    "scaled:scale"    = NULL,
    "unscaled:center" = center,
    "unscaled:scale"  = scale
  )
}

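【Comments】:

For fun, I have added this to the GitHub version of the stackoverflow package.
Luis Torgo has also written a function for unscaling (available in DMwR): https://www.rdocumentation.org/packages/DMwR/versions/0.4.1/topics/unscale
DMwR does not seem to have been updated in six years.
Great contribution, thanks! Imagine being able to undo a base R function without having to grab this code or install a package :)

【Solution 4】: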

tl;dr:

unscaled_vals <- xs * attr(xs, 'scaled:scale') + attr(xs, 'scaled:center')

where xs is a scaled object created by scale(x).

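Just for those trying to make sense of this:

How R scales:

The scale function performs both scaling and centering by default.

Of the two, the function performs centering first.

By default, centering is done by subtracting the mean of all !is.na input values from each value:

data - mean(data, na.rm = TRUE)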

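Scaling is then done by dividing each value by:

sqrt(sum(x^2) / (n - 1))

where x is the set of all !is.na values to be scaled, and n = length(x).

Importantly, though, when center = TRUE in scale, x is not the original data set but the already centered data.

So if center = TRUE (the default), the scale function is really computing:
```
sqrt(sum((data - mean(data, na.rm = TRUE))^2, na.rm = TRUE) / (n - 1))
```
- Note: [when center = TRUE] this is the same as taking the standard deviation: sd(data, na.rm = TRUE).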
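
The two formulas above can be checked against what `scale()` actually stores. This is a small sketch with made-up data and no missing values; `manual.sd` and `manual.rms` are illustrative names.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Divisor when center = TRUE: the sample standard deviation.
manual.sd  <- sqrt(sum((x - mean(x))^2) / (n - 1))
# Divisor when center = FALSE: the root mean square with an n - 1 denominator.
manual.rms <- sqrt(sum(x^2) / (n - 1))

sd.ok       <- isTRUE(all.equal(manual.sd, sd(x)))
centered.ok <- isTRUE(all.equal(manual.sd, attr(scale(x), "scaled:scale")))
rms.ok      <- isTRUE(all.equal(manual.rms, attr(scale(x, center = FALSE), "scaled:scale")))
```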

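How to unscale:

Explanation:

First multiply the scaled values by the scaling factor (here xs holds the scaled values and x the original data):

y = xs * sqrt(sum((x - mean(x, na.rm = TRUE))^2, na.rm = TRUE) / (sum(!is.na(x)) - 1))

Then add the mean back:
```
y + mean(x, na.rm = TRUE)
```

Obviously, you need to know the mean of the original data set for this manual approach to be truly useful, but I am including it here for conceptual purposes.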

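Fortunately, as the earlier answers show, both the centering value (i.e., the mean) and the scaling factor are stored in the attributes of the scaled object, so this approach simplifies to:

How to do it in R:

unscaled_vals <- xs * attr(xs, 'scaled:scale') + attr(xs, 'scaled:center')

where xs is a scaled object created by scale(x).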
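
Since the statistics are computed from the non-missing values only, the round trip should also survive NA entries. A short sketch with made-up data:

```r
x  <- c(1, NA, 3, 10)
xs <- scale(x)

# The stored center is the mean of the non-missing values.
center.ok <- isTRUE(all.equal(attr(xs, "scaled:center"), mean(x, na.rm = TRUE)))

# Multiply by the scale, add the center; NA stays NA.
unscaled_vals <- as.numeric(xs * attr(xs, "scaled:scale") + attr(xs, "scaled:center"))
roundtrip.ok  <- isTRUE(all.equal(unscaled_vals, x))
```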

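【Comments】:

In unscaled_vals <- xs + attr(xs, 'scaled:scale') + attr(xs, 'scaled:center') you are adding the deviation instead of multiplying by it. I tried to edit, but it would not let me because the change was too small xD

【Solution 5】:

I ran into this problem, and I think I found a simpler solution using linear algebra.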

# create matrix like object
a <- rnorm(1000,5,2)
b <- rnorm(1000,7,5) 

df <- cbind(a,b)

# get center and scaling values 
mean <- apply(df, 2, mean)
sd <- apply(df, 2, sd)

# scale data
s.df <- scale(df, center = mean, scale = sd)

#unscale data with linear algebra 
us.df <- t((t(s.df) * sd) + mean)
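
The transposes matter because R recycles a vector down the columns of a matrix. The sketch below contrasts the transposed form with a naive elementwise version; `col.mean` and `col.sd` are illustrative names chosen to avoid masking `mean` and `sd`.

```r
set.seed(1)
m <- cbind(a = rnorm(1000, 5, 2), b = rnorm(1000, 7, 5))
col.mean <- colMeans(m)
col.sd   <- apply(m, 2, sd)
s.m <- scale(m, center = col.mean, scale = col.sd)

# Naive version: the length-2 vectors are recycled down each column,
# so rows alternate between the statistics of a and b.
wrong <- s.m * col.sd + col.mean

# Transposed version: each column of t(s.m) is one original row,
# so the recycling lines up with the variables.
right <- t(t(s.m) * col.sd + col.mean)

right.ok <- isTRUE(all.equal(right, m, check.attributes = FALSE))
wrong.ok <- isTRUE(all.equal(wrong, m, check.attributes = FALSE))
```

`sweep(s.m, 2, col.sd, "*")` followed by `sweep(..., 2, col.mean, "+")` would be an equivalent way to write the correct version.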

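【Comments】:

【Solution 6】:

Old question, but why wouldn't you just do this:

plot(d$x, predict(m1, d))

As a simpler alternative to using the attributes of the scaled object by hand, DMwR has a function for this: unscale. It works like this: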

d <- data.frame(
  x=runif(100)
)

d$y <- 17 + d$x * 12

s.x <- scale(d$x)

m1 <- lm(d$y~s.x)

library(DMwR)
# the first argument holds the values to unscale; the second supplies the scaling attributes
unsc.x <- unscale(s.x, s.x)
plot(unsc.x, predict(m1, d))

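Importantly, the second argument to unscale needs to carry the attributes 'scaled:scale' and 'scaled:center'.

【Comments】:

【Solution 7】:

I'm late to the party, but here is a useful tool for scaling and unscaling data in array format.

Example: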

> (data <- array(1:8, c(2, 4)))            # create data
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
> obj <- Scale(data)                       # create object
> (data_scaled <- obj$scale(data))         # scale data
           [,1]       [,2]       [,3]       [,4]
[1,] -0.7071068 -0.7071068 -0.7071068 -0.7071068
[2,]  0.7071068  0.7071068  0.7071068  0.7071068
> (obj$unscale(data_scaled))               # unscale scaled data
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8

## scale or unscale another dataset
## using the same mean/sd parameters
> (data2 <- array(seq(1, 24, 2), c(3, 4))) # create demo data
     [,1] [,2] [,3] [,4]
[1,]    1    7   13   19
[2,]    3    9   15   21
[3,]    5   11   17   23
> (data2_scaled <- obj$scale(data2))       # scale data
           [,1]      [,2]     [,3]     [,4]
[1,] -0.7071068  4.949747 10.60660 16.26346
[2,]  2.1213203  7.778175 13.43503 19.09188
[3,]  4.9497475 10.606602 16.26346 21.92031
> (obj$unscale(data2_scaled))              # unscale scaled data
     [,1] [,2] [,3] [,4]
[1,]    1    7   13   19
[2,]    3    9   15   21
[3,]    5   11   17   23

The Scale() function:

Scale <- function(data, margin=2, center=TRUE, scale=TRUE){
    stopifnot(is.array(data), is.numeric(data),
              any(mode(margin) %in% c("integer", "numeric")),
              length(margin) < length(dim(data)),
              max(margin) <= length(dim(data)),
              min(margin) >= 1,
              !any(duplicated(margin)),
              is.logical(center), length(center)==1,
              is.logical(scale), length(scale)==1,
                  !(isFALSE(center) && isFALSE(scale)))
    margin <- as.integer(margin)

    # compute the statistics over the requested margin, matching the sweep() calls below
    m <- if(center) apply(data, margin, mean, na.rm=TRUE) else NULL
    s <- if(scale)  apply(data, margin, sd, na.rm=TRUE) else NULL
    ldim <- length(dim(data))
    cdim <- dim(data)[margin]
    data <- NULL # don't store the data

    Scale <- function(data){
        stopifnot(is.array(data), is.numeric(data),
                  length(dim(data)) == ldim,
                  dim(data)[margin] == cdim)
        if(center)
            data <- sweep(data, margin, m, `-`)
        if(scale)
            data <- sweep(data, margin, s, `/`)
        data
    }

    Unscale <- function(data){
        stopifnot(is.array(data), is.numeric(data),
                  length(dim(data)) == ldim,
                  dim(data)[margin] == cdim)
        if(scale)
            data <- sweep(data, margin, s, `*`)
        if(center)
            data <- sweep(data, margin, m, `+`)
        data
    }
    list(scale=Scale, unscale=Unscale, mean=m, sd=s)
}

Note: data.frames are not supported yet.

【Comments】:

【Solution 8】:

Inspired by Fermando's answer, but with less code to unscale the rows:

set.seed(1)
x = matrix(sample(1:12), ncol= 3)
xs = scale(x, center = TRUE, scale = TRUE)
center <- attr(xs,"scaled:center")
scale <- attr(xs,"scaled:scale")
x.orig <- t(t(xs) * scale + center) # code is less here

print(x)
[1,]    9    2    6
[2,]    4    5   11
[3,]    7    3   12
[4,]    1    8   10

print(x.orig)
[1,]    9    2    6
[2,]    4    5   11
[3,]    7    3   12
[4,]    1    8   10
attr(,"scaled:center")
[1] 5.25 4.50 9.75
attr(,"scaled:scale")
[1] 3.50 2.65 2.63

【Comments】:

【Solution 9】:

I found that a simple way to reverse the scale() function is to call scale() two more times:

X_scaled <- scale(X,center=TRUE,scale=TRUE)
X_reversed <- scale(X_scaled,center=FALSE,scale=1/attr(X_scaled,'scaled:scale'))
X_reversed <- scale(X_reversed,center=-attr(X_scaled,'scaled:center'),scale=FALSE)

If you don't mind calling a function inside another function's arguments (I do mind), you could end up with the following solution:

X_scaled <- scale(X,center=TRUE,scale=TRUE)
X_reversed <- scale(scale(X_scaled,center=FALSE,scale=1/attr(X_scaled,'scaled:scale')),
                    center=-attr(X_scaled,'scaled:center'),scale=FALSE)
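
This works because scale() subtracts `center` and then divides by `scale`: dividing by the reciprocal multiplies the factor back in, and subtracting the negated center adds the mean back. A quick check, assuming three columns of `mtcars` as example data:

```r
X <- as.matrix(mtcars[, c("mpg", "hp", "wt")])
X_scaled <- scale(X, center = TRUE, scale = TRUE)

# Step 1: divide by 1/sd, i.e. multiply by sd.
X_reversed <- scale(X_scaled, center = FALSE, scale = 1 / attr(X_scaled, "scaled:scale"))
# Step 2: subtract -mean, i.e. add the mean back.
X_reversed <- scale(X_reversed, center = -attr(X_scaled, "scaled:center"), scale = FALSE)

roundtrip.ok <- isTRUE(all.equal(X_reversed, X, check.attributes = FALSE))
```

The result still carries 'scaled:center' and 'scaled:scale' attributes from the last call, which is why the comparison ignores attributes.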

【Comments】: