【Question title】: How can I replace zeros with half the minimum value within a column?
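【Posted】: 2020-05-14 01:02:55
【Question description】:

I have a data frame with thousands of rows and columns, and I want to replace each 0 with half of the smallest value greater than zero in the same column. I also want to leave out the first four columns, since they are indexes.

So if I start with something like this: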

index <- c("100p", "200p", "300p", "400p")
ratio <- c(5, 4, 3, 2)
gene <- c("gapdh", NA, NA, "actb")
species <- c("mouse", NA, NA, "rat")
a1 <- c(0, 3, 5, 2)
b1 <- c(0, 0, 4, 6)
c1 <- c(1, 2, 3, 4)

q <- as.data.frame(cbind(index, ratio, gene, species, a1, b1, c1))

index ratio gene  species a1 b1 c1
100p    5   gapdh mouse   0  0  1
200p    4    NA    NA     3  0  2
300p    3    NA    NA     5  4  3
400p    2   actb  rat     2  6  4

I would like to end up with this:

index ratio gene  species a1 b1 c1
100p    5   gapdh mouse   1  2  1
200p    4    NA    NA     3  2  2
300p    3    NA    NA     5  4  3
400p    2   actb  rat     2  6  4

I tried the following code: apply(q[-4], 2, function(x) "[<-"(x, x==0, min(x[x > 0]) / 2))

But I keep getting the error: Error in min(x[x > 0])/2 : non-numeric argument to binary operator

Can anyone help with this? Thank you very much!

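【Comments】:

    Tags: r function replace min zero


    【Solution 1】:

    For reference, given your original code, I believe your function is not the problem. Rather, the error comes from applying the function to non-numeric data: cbind() on a mix of character and numeric vectors produces a character matrix, so every column of q holds text rather than numbers.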

    # original data
    index <- c("100p", "200p", "300p" , "400p")
    ratio <- c(5, 4, 3, 2)
    gene <- c("gapdh", NA, NA,"actb")
    species <- c("mouse", NA, NA, "rat")
    a1 <- c(0,3,5,2)
    b1 <- c(0, 0, 4, 6)
    c1 <- c(1, 2, 3, 4)
    
    # data frame
    q <- as.data.frame(cbind(index, ratio, gene, species, a1, b1, c1))
    
    # examine structure (all cols are factors, or character in R >= 4.0)
    str(q)
    
    # convert factors to numeric  
    fac_to_num <- function(x){
      x <- as.numeric(as.character(x))
      x
    }
    
    # apply to cols 5 thru 7 only
    q[, 5:7] <- apply(q[, 5:7],2,fac_to_num)
    
    # examine structure  
    str(q)
    
    # use original function only on numeric data 
    apply(q[, 5:7], 2, function(x) "[<-"(x, x==0, min(x[x > 0]) / 2))
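
The conversion step above is only needed because cbind() turned everything into text. As a rough sketch, assuming the data can be built with data.frame() in the first place (the name q2 is made up here for illustration), the columns keep their own types and the original function works without any conversion:

```r
# Build the data frame directly; data.frame() keeps each column's own type,
# whereas cbind() on mixed vectors first produces a character matrix.
q2 <- data.frame(
  index   = c("100p", "200p", "300p", "400p"),
  ratio   = c(5, 4, 3, 2),
  gene    = c("gapdh", NA, NA, "actb"),
  species = c("mouse", NA, NA, "rat"),
  a1      = c(0, 3, 5, 2),
  b1      = c(0, 0, 4, 6),
  c1      = c(1, 2, 3, 4)
)

# The measurement columns are already numeric
sapply(q2[, 5:7], is.numeric)

# Original replacement function, with the result assigned back to cols 5 thru 7
q2[, 5:7] <- apply(q2[, 5:7], 2, function(x) "[<-"(x, x == 0, min(x[x > 0]) / 2))
q2
```

Note that apply() on its own only returns the modified matrix; the assignment back into q2[, 5:7] is what updates the data frame.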
    

    【Comments】:

      【Solution 2】:

      A slightly different dplyr option (possibly faster for large data sets) that relies on a bit of arithmetic could be:

      q %>%
       mutate_at(vars(5:length(.)), ~ (. == 0) * min(.[. != 0])/2 + .)
      
        index ratio  gene species a1 b1 c1
      1  100p     5 gapdh   mouse  1  2  1
      2  200p     4  <NA>    <NA>  3  2  2
      3  300p     3  <NA>    <NA>  5  4  3
      4  400p     2  actb     rat  2  6  4
      

      The same in base R:

      q[, 5:length(q)] <- lapply(q[, 5:length(q)], function(x) (x == 0) * min(x[x != 0])/2 + x)
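
To see why the arithmetic works, here is a small worked example on a single vector (the values of column b1; the names x and half_min are chosen only for this illustration). A logical vector is treated as 1/0 in arithmetic, so the half-minimum is added only at the positions that hold a zero:

```r
x <- c(0, 0, 4, 6)               # same values as column b1
half_min <- min(x[x != 0]) / 2   # smallest non-zero value is 4, so 4 / 2 = 2

# (x == 0) is TRUE/FALSE, which arithmetic treats as 1/0:
# zeros get 1 * half_min added, every other entry gets 0 added
result <- (x == 0) * half_min + x
result
# [1] 2 2 4 6
```

One caveat under this approach: `x != 0` picks the smallest non-zero value, which equals the smallest positive value only when the column has no negative entries, and a column containing NA would yield NA from min() unless na.rm = TRUE is added.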
      

      【Comments】:

        【Solution 3】:

        We can use lapply together with replace to swap each 0 for half of the smallest positive value in its column.

        cols<- 5:7
        q[cols] <- lapply(q[cols], function(x) replace(x, x == 0, min(x[x>0], na.rm = TRUE)/2))
        
        q
        #  index ratio  gene species a1 b1 c1
        #1  100p     5 gapdh   mouse  1  2  1
        #2  200p     4  <NA>    <NA>  3  2  2
        #3  300p     3  <NA>    <NA>  5  4  3
        #4  400p     2  actb     rat  2  6  4
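
One edge case is worth keeping in mind with thousands of columns: if a column has no positive values at all, min() of an empty vector returns Inf (with a warning), so the zeros would be replaced by Inf. The following is a minimal sketch of a guard against that; replace_zeros is a hypothetical helper name, not part of the answer above:

```r
# Hypothetical helper (not from the answers): replace zeros with half the
# smallest positive value, leaving the column untouched if there is none
replace_zeros <- function(x) {
  pos <- x[!is.na(x) & x > 0]
  if (length(pos) == 0) return(x)   # nothing to base the replacement on
  replace(x, !is.na(x) & x == 0, min(pos) / 2)
}

replace_zeros(c(0, 0, 4, 6))   # zeros become 2
replace_zeros(c(0, 0, 0))      # returned unchanged
replace_zeros(c(0, NA, 3))     # NA is kept, the zero becomes 1.5
```

It could be applied the same way as before, for example `q[cols] <- lapply(q[cols], replace_zeros)`.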
        

        In dplyr, we can use mutate_at:

        library(dplyr)
        q %>%  mutate_at(cols,~replace(., . == 0, min(.[.>0], na.rm = TRUE)/2))
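
In dplyr 1.0.0 and later, mutate_at() is superseded by across(). A rough equivalent, assuming a recent dplyr is installed and using a cut-down copy of the data (q3 and q3_fixed are names made up for this sketch), might look like this:

```r
library(dplyr)

# Cut-down copy of the example data, for illustration only
q3 <- data.frame(
  index = c("100p", "200p", "300p", "400p"),
  a1 = c(0, 3, 5, 2),
  b1 = c(0, 0, 4, 6),
  c1 = c(1, 2, 3, 4)
)

# across() selects columns a1 through c1 and applies the same replacement
q3_fixed <- q3 %>%
  mutate(across(a1:c1, ~ replace(.x, .x == 0, min(.x[.x > 0], na.rm = TRUE) / 2)))

q3_fixed
```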
        

        Data

        q <- structure(list(index = structure(1:4, .Label = c("100p", "200p", 
        "300p", "400p"), class = "factor"), ratio = c(5, 4, 3, 2), gene = structure(c(2L, 
        NA, NA, 1L), .Label = c("actb", "gapdh"), class = "factor"), 
        species = structure(c(1L, NA, NA, 2L), .Label = c("mouse", 
        "rat"), class = "factor"), a1 = c(0, 3, 5, 2), b1 = c(0, 
        0, 4, 6), c1 = c(1, 2, 3, 4)), class = "data.frame", row.names = c(NA, -4L))
        

        【Comments】:
