【Question title】: Back-transform `scale()` for plotting
【Posted】: 2021-03-20 23:00:12
【Question】:

I have an explanatory variable that is centered using scale(), which I use to predict a response variable:

d <- data.frame(
  x=runif(100),
  y=rnorm(100)
)

d <- within(d, s.x <- scale(x))

m1 <- lm(y~s.x, data=d)

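I would like to plot the predicted values, but on the original scale of x rather than the centered scale. Is there a way to back-transform or unscale s.x?

Thanks!

【Question discussion】:

    Tags: r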


    【Solution 1】:

    Take a look at:

    attributes(d$s.x)
    

    You can use the attributes to unscale:

    d$s.x * attr(d$s.x, 'scaled:scale') + attr(d$s.x, 'scaled:center')
    

    For example:

    > x <- 1:10
    > s.x <- scale(x)
    
    > s.x
                [,1]
     [1,] -1.4863011
     [2,] -1.1560120
     [3,] -0.8257228
     [4,] -0.4954337
     [5,] -0.1651446
     [6,]  0.1651446
     [7,]  0.4954337
     [8,]  0.8257228
     [9,]  1.1560120
    [10,]  1.4863011
    attr(,"scaled:center")
    [1] 5.5
    attr(,"scaled:scale")
    [1] 3.02765
    
    > s.x * attr(s.x, 'scaled:scale') + attr(s.x, 'scaled:center')
          [,1]
     [1,]    1
     [2,]    2
     [3,]    3
     [4,]    4
     [5,]    5
     [6,]    6
     [7,]    7
     [8,]    8
     [9,]    9
    [10,]   10
    attr(,"scaled:center")
    [1] 5.5
    attr(,"scaled:scale")
    [1] 3.02765
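Applied to the question itself, the same idea might look like the sketch below. It is only an illustration under a few assumptions: the seed is added for reproducibility, and the scaled values are stored as a plain numeric column while the `scale()` result is kept in a separate object (`sx`, a name introduced here) so that its attributes stay available.

```r
set.seed(42)  # assumption: seed added only so the sketch is reproducible
d <- data.frame(x = runif(100), y = rnorm(100))

sx <- scale(d$x)          # keep the scale() result: it carries the attributes
d$s.x <- as.numeric(sx)   # store a plain numeric column for lm()
m1 <- lm(y ~ s.x, data = d)

# Back-transform the predictor to its original units
x_back <- d$s.x * attr(sx, "scaled:scale") + attr(sx, "scaled:center")

# Plot the data and the fitted line against the original x
ord <- order(x_back)
plot(x_back, d$y, xlab = "x (original scale)", ylab = "y")
lines(x_back[ord], predict(m1)[ord])
```

Because the fitted values are in the same row order as the data, only the x-axis values need to be back-transformed; the predictions themselves are already in the units of y.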
    

    【Discussion】:

    • Great answer, +1. Should attr(s.x, 'scaled:center') be attr(d$s.x, 'scaled:center')?
    • @TylerRinker Thanks, it should. Fixed!
    【Solution 2】:

    For a data frame or matrix:

    set.seed(1)
    x = matrix(sample(1:12), ncol= 3)
    xs = scale(x, center = TRUE, scale = TRUE)
    
    x.orig = t(apply(xs, 1, function(r)r*attr(xs,'scaled:scale') + attr(xs, 'scaled:center')))
    
    print(x)
         [,1] [,2] [,3]
    [1,]    4    2    3
    [2,]    5    7    1
    [3,]    6   10   11
    [4,]    9   12    8
    
    print(x.orig)
         [,1] [,2] [,3]
    [1,]    4    2    3
    [2,]    5    7    1
    [3,]    6   10   11
    [4,]    9   12    8
    

    Be careful when using functions such as identical():

    print(x - x.orig)
         [,1] [,2]         [,3]
    [1,]    0    0 0.000000e+00
    [2,]    0    0 8.881784e-16
    [3,]    0    0 0.000000e+00
    [4,]    0    0 0.000000e+00
    
    identical(x, x.orig)
    # FALSE
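A tolerance-based comparison is usually the safer check here. The sketch below repeats the setup from this answer and compares with `all.equal()`, which ignores tiny floating-point differences and also accepts integer versus double storage; `check.attributes = FALSE` is passed as a precaution in case either object carries extra attributes.

```r
set.seed(1)
x  <- matrix(sample(1:12), ncol = 3)   # integer matrix
xs <- scale(x, center = TRUE, scale = TRUE)

x.orig <- t(apply(xs, 1, function(r) {
  r * attr(xs, "scaled:scale") + attr(xs, "scaled:center")
}))

# identical() is strict: x is integer, x.orig is double, so it cannot be TRUE
strict_same <- identical(x, x.orig)

# all.equal() compares numerically, within a small tolerance
close_enough <- isTRUE(all.equal(x, x.orig, check.attributes = FALSE))
```

Wrapping the call in `isTRUE()` matters because `all.equal()` returns a character description of the differences, rather than `FALSE`, when the objects differ.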
    

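    【Discussion】:

    • Thanks! This helped me compute the cluster centers after k-means clustering on a scaled matrix: centers <- t(apply(clustering$centers, 1, function(r) r * attr(scaled_mat, 'scaled:scale') + attr(scaled_mat, 'scaled:center'))). The accepted answer did not.
    • I have exactly the same task as you, @kadrian, but why doesn't this function work on my scaled data?
    • This should be the accepted answer, not the highest-voted one. So the idea in this elegant solution is that you multiply the matrix by the scale vector and add the means row-wise, then transpose it to get the correct original matrix. Brilliant!
    【Solution 3】:

    I feel this ought to be a proper function; here is my attempt: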

    #' Reverse a scale
    #'
    #' Computes x = sz+c, which is the inverse of z = (x - c)/s 
    #' provided by the \code{scale} function.
    #' 
    #' @param z a numeric matrix(like) object
    #' @param center either NULL or a numeric vector of length equal to the number of columns of z  
    #' @param scale  either NULL or a numeric vector of length equal to the number of columns of z
    #'
    #' @seealso \code{\link{scale}}
    #' @examples
    #'  mtcs <- scale(mtcars)
    #'  
    #'  all.equal(
    #'    unscale(mtcs), 
    #'    as.matrix(mtcars), 
    #'    check.attributes=FALSE
    #'  )
    #'  
    #' @export
    unscale <- function(z, center = attr(z, "scaled:center"), scale = attr(z, "scaled:scale")) {
      if(!is.null(scale))  z <- sweep(z, 2, scale, `*`)
      if(!is.null(center)) z <- sweep(z, 2, center, `+`)
      structure(z,
        "scaled:center"   = NULL,
        "scaled:scale"    = NULL,
        "unscaled:center" = center,
        "unscaled:scale"  = scale
      )
    }
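One place where the explicit `center` and `scale` arguments may be useful is when the object to unscale no longer carries the attributes, for example k-means cluster centers computed on scaled data. The sketch below repeats the function (without its roxygen header) so that it runs on its own; the choice of `mtcars`, three clusters, and the seed are assumptions made for illustration.

```r
# The answer's function, repeated so the sketch is self-contained
unscale <- function(z, center = attr(z, "scaled:center"), scale = attr(z, "scaled:scale")) {
  if (!is.null(scale))  z <- sweep(z, 2, scale, `*`)
  if (!is.null(center)) z <- sweep(z, 2, center, `+`)
  structure(z,
    "scaled:center"   = NULL,
    "scaled:scale"    = NULL,
    "unscaled:center" = center,
    "unscaled:scale"  = scale
  )
}

set.seed(1)                        # assumption: fixed seed for reproducibility
mtcs <- scale(mtcars)
km   <- kmeans(mtcs, centers = 3)  # clustering on the scaled data

# km$centers has no scaled:* attributes, so pass them explicitly
centers_orig <- unscale(km$centers,
                        center = attr(mtcs, "scaled:center"),
                        scale  = attr(mtcs, "scaled:scale"))

# Each back-transformed center should match the cluster mean in original units
mpg_means <- tapply(mtcars$mpg, km$cluster, mean)
```

Comparing `centers_orig[, "mpg"]` with `mpg_means` is a quick sanity check that the centers are back in the original units.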
    

    【Discussion】:

    【Solution 4】:

    tl;dr:

    unscaled_vals <- xs * attr(xs, 'scaled:scale') + attr(xs, 'scaled:center')
    
    • where xs is the scaled object created by scale(x)

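    Just for those trying to make some sense of this:

    How R scales

    The scale function performs both scaling and centering by default.

    • Of the two, the function performs centering first.

    By default, centering is achieved by subtracting the mean of all !is.na input values from each value:

    data - mean(data, na.rm = T)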
    

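    Scaling is achieved by dividing each value by:

    sqrt(sum(x^2) / (n - 1))
    

    where x is the set of all !is.na values to be scaled, and n = length(x)

    • Importantly, though, when center = T in scale, x is not the original data set but the already-centered data.

      So if center = T (the default), the scale function is really computing:

       sqrt(sum((data - mean(data, na.rm = T))^2) / (n - 1))
      
      • Note: [when center = T] this is the same as taking the standard deviation: sd(data, na.rm = T)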
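The formulas above can be checked against what `scale()` actually stores. The following is a small sketch on a made-up vector (the numbers are arbitrary), comparing the stored attributes with the manual calculations for both the default case and `center = FALSE`.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # arbitrary example data, no missing values
n <- length(x)

# Default: center = TRUE, scale = TRUE
xs <- scale(x)
manual_center <- mean(x)
manual_scale  <- sqrt(sum((x - mean(x))^2) / (n - 1))

center_ok <- isTRUE(all.equal(attr(xs, "scaled:center"), manual_center))
scale_ok  <- isTRUE(all.equal(attr(xs, "scaled:scale"),  manual_scale))
same_as_sd <- isTRUE(all.equal(manual_scale, sd(x)))

# Without centering, the divisor is computed from the raw values instead,
# so it is no longer the standard deviation
xs_nc <- scale(x, center = FALSE)
rms_ok <- isTRUE(all.equal(attr(xs_nc, "scaled:scale"), sqrt(sum(x^2) / (n - 1))))
```

Note the parentheses around `n - 1`: without them, R's operator precedence would divide by `n` first and then subtract 1.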

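    How to unscale

    Explanation

    1. First multiply the scaled values xs by the scaling factor, computed from the original data x:

      y = xs * sqrt( ( sum( (x - mean(x , na.rm = T))^2) ) / (length(x) - 1))
      
    2. Then add back the mean:

      y + mean(x , na.rm = T)
      

    Obviously, this manual approach is only really useful if you know the mean and standard deviation of the original data set, but I include it here for conceptual purposes.

    Fortunately, as the previous answers show, the "centering" value (i.e. the mean) and the scaling value are stored in the attributes of the scale object, so this approach can be simplified to:

    How to do it in R

    unscaled_vals <- xs * attr(xs, 'scaled:scale') + attr(xs, 'scaled:center')
    
    • where xs is the scaled object created by scale(x).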
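As a quick round-trip check, the sketch below multiplies by the stored scale and then adds the stored center, using a small made-up vector that includes a missing value (since `scale()` skips `NA`s when computing its statistics). The data values are arbitrary.

```r
x  <- c(3, NA, 7, 10, 15)   # arbitrary example data with one missing value
xs <- scale(x)              # mean and sd are computed from the non-missing values

# Multiply by the scale first, then add the center
unscaled_vals <- xs * attr(xs, "scaled:scale") + attr(xs, "scaled:center")

# The result is a one-column matrix; flatten it to compare with the input
round_trip_ok <- isTRUE(all.equal(as.numeric(unscaled_vals), x))
```

The `NA` stays `NA` after the round trip, and the remaining values come back in their original units.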

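    【Discussion】:

    • In unscaled_vals <- xs + attr(xs, 'scaled:scale') + attr(xs, 'scaled:center'), you are adding the deviation instead of multiplying by it. I tried to edit, but it wouldn't let me because the change was too small xD
    【Solution 5】:

    I ran into this problem, and I think I found a simpler solution using linear algebra.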

    # create matrix like object
    a <- rnorm(1000,5,2)
    b <- rnorm(1000,7,5) 
    
    df <- cbind(a,b)
    
    # get center and scaling values 
    mean <- apply(df, 2, mean)
    sd <- apply(df, 2, sd)
    
    # scale data
    s.df <- scale(df, center = mean, scale = sd)
    
    #unscale data with linear algebra 
    us.df <- t((t(s.df) * sd) + mean)
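The double transpose is needed because R recycles a vector down the columns of a matrix, not across them. The sketch below (smaller sample size, a fixed seed, and the variable names are all choices made for this illustration) compares the transpose approach with `sweep()` and shows that the naive expression without transposing does not recover the data.

```r
set.seed(1)   # assumption: fixed seed for reproducibility
m <- cbind(a = rnorm(10, 5, 2), b = rnorm(10, 7, 5))

s   <- scale(m)
ctr <- attr(s, "scaled:center")
scl <- attr(s, "scaled:scale")

# Transpose so that each column's parameters line up with recycling
via_t <- t(t(s) * scl + ctr)

# Equivalent, and more explicit about which margin is used
via_sweep <- sweep(sweep(s, 2, scl, `*`), 2, ctr, `+`)

# Naive version: the parameter vectors are recycled down each column, mixing them up
naive <- s * scl + ctr

t_ok     <- isTRUE(all.equal(via_t, m, check.attributes = FALSE))
sweep_ok <- isTRUE(all.equal(via_sweep, m, check.attributes = FALSE))
naive_ok <- isTRUE(all.equal(naive, m, check.attributes = FALSE))
```

Here `t_ok` and `sweep_ok` hold while `naive_ok` does not, which is why one of the first two forms is needed for matrices with more than one column.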
    

    【Discussion】:

      【Solution 6】:

      Old question, but why don't you just do this:

      plot(d$x, predict(m1, d))
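If a smooth fitted line over a grid of new x values is wanted instead, the new values have to be scaled with the same center and scale as the training data before calling `predict()`. The following is a sketch under assumptions: the seed, the grid size, and the names `sx` and `grid` are introduced here for illustration.

```r
set.seed(1)   # assumption: fixed seed for reproducibility
d <- data.frame(x = runif(100), y = rnorm(100))

sx    <- scale(d$x)       # keep the attributes for later
d$s.x <- as.numeric(sx)
m1    <- lm(y ~ s.x, data = d)

# Grid in the ORIGINAL units of x
grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 50))

# Apply the training center/scale to the grid, then predict
grid$s.x <- (grid$x - attr(sx, "scaled:center")) / attr(sx, "scaled:scale")
grid$fit <- predict(m1, newdata = grid)

plot(d$x, d$y, xlab = "x (original scale)", ylab = "y")
lines(grid$x, grid$fit)
```

The x-axis stays in original units throughout; only the column handed to the model is scaled.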
      

      As a simpler approach than manually using the attributes of the scaled object, DMwR has a function for this: unscale. It works like this:

      d <- data.frame(
        x=runif(100)
      )
      
      d$y <- 17 + d$x * 12
      
      s.x <- scale(d$x)
      
      m1 <- lm(d$y~s.x)
      
      library(DMwR)
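      # first argument: the scaled values to back-transform;
      # second argument: the scale() object that carries the attributes
      unsc.x <- unscale(s.x, s.x)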
      plot(unsc.x, predict(m1, d))
      

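      Importantly, the second argument of unscale needs to have the attributes 'scaled:scale' and 'scaled:center'.

      【Discussion】:

        【Solution 7】:

        I'm late to the party, but here is a useful tool for scaling/unscaling data in array format.

        Example: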

        > (data <- array(1:8, c(2, 4)))            # create data
             [,1] [,2] [,3] [,4]
        [1,]    1    3    5    7
        [2,]    2    4    6    8
        > obj <- Scale(data)                       # create object
        > (data_scaled <- obj$scale(data))         # scale data
                   [,1]       [,2]       [,3]       [,4]
        [1,] -0.7071068 -0.7071068 -0.7071068 -0.7071068
        [2,]  0.7071068  0.7071068  0.7071068  0.7071068
        > (obj$unscale(data_scaled))               # unscale scaled data
             [,1] [,2] [,3] [,4]
        [1,]    1    3    5    7
        [2,]    2    4    6    8
        
        ## scale or unscale another dataset
        ## using the same mean/sd parameters
        > (data2 <- array(seq(1, 24, 2), c(3, 4))) # create demo data
             [,1] [,2] [,3] [,4]
        [1,]    1    7   13   19
        [2,]    3    9   15   21
        [3,]    5   11   17   23
        > (data2_scaled <- obj$scale(data2))       # scale data
                   [,1]      [,2]     [,3]     [,4]
        [1,] -0.7071068  4.949747 10.60660 16.26346
        [2,]  2.1213203  7.778175 13.43503 19.09188
        [3,]  4.9497475 10.606602 16.26346 21.92031
        > (obj$unscale(data2_scaled))              # unscale scaled data
             [,1] [,2] [,3] [,4]
        [1,]    1    7   13   19
        [2,]    3    9   15   21
        [3,]    5   11   17   23
        

        The function Scale():

        Scale <- function(data, margin=2, center=TRUE, scale=TRUE){
            stopifnot(is.array(data), is.numeric(data),
                      any(mode(margin) %in% c("integer", "numeric")),
                      length(margin) < length(dim(data)),
                      max(margin) <= length(dim(data)),
                      min(margin) >= 1,
                      !any(duplicated(margin)),
                      is.logical(center), length(center)==1,
                      is.logical(scale), length(scale)==1,
                          !(isFALSE(center) && isFALSE(scale)))
            margin <- as.integer(margin)
        
            # use `margin` (not a hard-coded 2) so the statistics match the sweep() calls below
            m <- if(center) apply(data, margin, mean, na.rm=TRUE) else NULL
            s <- if(scale)  apply(data, margin, sd, na.rm=TRUE) else NULL
            ldim <- length(dim(data))
            cdim <- dim(data)[margin]
            data <- NULL # don't store the data
        
            Scale <- function(data){
                stopifnot(is.array(data), is.numeric(data),
                          length(dim(data)) == ldim,
                          dim(data)[margin] == cdim)
                if(center)
                    data <- sweep(data, margin, m, `-`)
                if(scale)
                    data <- sweep(data, margin, s, `/`)
                data
            }
        
            Unscale <- function(data){
                stopifnot(is.array(data), is.numeric(data),
                          length(dim(data)) == ldim,
                          dim(data)[margin] == cdim)
                if(scale)
                    data <- sweep(data, margin, s, `*`)
                if(center)
                    data <- sweep(data, margin, m, `+`)
                data
            }
            list(scale=Scale, unscale=Unscale, mean=m, sd=s)
        }
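The core idea, remembering the parameters of one dataset and reusing them on another, can also be sketched more compactly for plain matrices. The helper below, `make_scaler`, is a hypothetical name introduced for this illustration and is not part of the answer; it always works column-wise and does none of the argument checking that `Scale()` performs.

```r
# Hypothetical helper (not from the answer): stores the column means and
# standard deviations of `train` and returns functions that reuse them
make_scaler <- function(train) {
  stopifnot(is.matrix(train), is.numeric(train))
  m <- colMeans(train, na.rm = TRUE)
  s <- apply(train, 2, sd, na.rm = TRUE)
  list(
    scale   = function(x) sweep(sweep(x, 2, m, `-`), 2, s, `/`),
    unscale = function(x) sweep(sweep(x, 2, s, `*`), 2, m, `+`)
  )
}

train <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)  # column means 2 and 20, sds 1 and 10
new   <- matrix(c(4, 5, 40, 50), ncol = 2)         # a second dataset

sc <- make_scaler(train)
new_scaled <- sc$scale(new)            # scaled with the TRAINING parameters
new_back   <- sc$unscale(new_scaled)   # back in original units
```

This mirrors the second half of the example above, where `data2` is scaled and unscaled using the mean and standard deviation learned from `data`.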
        

        Note: data.frames are not yet supported.

        【Discussion】:

          【Solution 8】:

          Just inspired by Fermando's answer, but with fewer lines of code:

          set.seed(1)
          x = matrix(sample(1:12), ncol= 3)
          xs = scale(x, center = TRUE, scale = TRUE)
          center <- attr(xs,"scaled:center")
          scale <- attr(xs,"scaled:scale")
          x.orig <- t(t(xs) * scale + center) # code is less here
          
          print(x)
          [1,]    9    2    6
          [2,]    4    5   11
          [3,]    7    3   12
          [4,]    1    8   10
          
          print(x.orig)
          [1,]    9    2    6
          [2,]    4    5   11
          [3,]    7    3   12
          [4,]    1    8   10
          attr(,"scaled:center")
          [1] 5.25 4.50 9.75
          attr(,"scaled:scale")
          [1] 3.50 2.65 2.63
          

          【Discussion】:

            【Solution 9】:

            One simple way I found to reverse the scale() function is to call scale() twice more:

            X_scaled <- scale(X,center=TRUE,scale=TRUE)
            X_reversed <- scale(X_scaled,center=FALSE,scale=1/attr(X_scaled,'scaled:scale'))
            X_reversed <- scale(X_reversed,center=-attr(X_scaled,'scaled:center'),scale=FALSE)

            If you don't mind calling a function inside the arguments of another function (I do mind), you could end up with the following solution:

            X_scaled <- scale(X,center=TRUE,scale=TRUE)
            X_reversed <- scale(scale(X_scaled,center=FALSE,scale=1/attr(X_scaled,'scaled:scale')),
                                center=-attr(X_scaled,'scaled:center'),scale=FALSE)
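Since `X` is not defined in the snippets above, here is a runnable sketch of the same two-step idea on a concrete matrix. Using the numeric columns of `iris` is an assumption made for illustration, as is the intermediate name `step1`. Dividing by `1 / scale` multiplies by the scale, and subtracting the negated center adds it back.

```r
X <- as.matrix(iris[, 1:4])   # assumption: any numeric matrix would do

X_scaled <- scale(X, center = TRUE, scale = TRUE)

# Step 1: undo the scaling (divide by 1/sd, i.e. multiply by sd)
step1 <- scale(X_scaled, center = FALSE,
               scale = 1 / attr(X_scaled, "scaled:scale"))

# Step 2: undo the centering (subtract -mean, i.e. add the mean)
X_reversed <- scale(step1,
                    center = -attr(X_scaled, "scaled:center"),
                    scale  = FALSE)

# The result still carries scaled:* attributes that no longer describe
# a standardization, so drop them to avoid confusion
attr(X_reversed, "scaled:center") <- NULL
attr(X_reversed, "scaled:scale")  <- NULL

reversed_ok <- isTRUE(all.equal(X_reversed, X, check.attributes = FALSE))
```

Stripping the leftover attributes is optional, but it prevents a later unscale step from picking up parameters that belong to the inverse transformation.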

            【Discussion】:
