【Question Title】: Elementwise median of 3 matrices in R
【Posted】: 2016-07-13 14:09:58
【Question】:

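I have 3 matrices, each storing one of three measurements (matrix 1 holds measurement 1, matrix 2 holds measurement 2, and so on).

They have the following structure: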

> a1
            ACTIN       18S      TET1      TET2      TET3
Control 25.943441  22.62984      <NA> 34.063107 34.034756
Sample1  24.48504  20.04858      <NA>  32.37173 32.341072
Sample2 25.265867 19.680647 28.086248  33.76187  33.41289
Sample3 24.441484 18.146513      <NA> 32.811428  31.22825
> a2
            ACTIN       18S      TET1      TET2      TET3
Control 25.980696 22.393877      <NA> 34.548923   33.7815
Sample1 24.263775 20.073978  27.23082  32.27775 32.343292
Sample2  25.25487 19.680494 27.214449  33.70534  33.48968
Sample3  24.26332 18.108198      <NA> 32.769787  31.19895
> a3
            ACTIN       18S      TET1      TET2      TET3
Control 25.937397 22.429556 30.020935  33.98415 33.858604
Sample1  24.44776 20.090088 28.328804 32.317287 32.291912
Sample2 25.148333 19.537455      <NA>  33.83607   33.3961
Sample3 24.242998 18.335524      <NA> 32.788536 31.147346

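I want to create a new matrix containing the median of the 3 measurements. Ideally, the first column stays unchanged. Where there is no value (Undetermined), returning NA is preferred.

I'd like a matrix of medians, so something like this:

median(a1[i,j], a2[i,j], a2[i,j])

I tried the following, with 2 for loops iterating over the arrays: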

med<-matrix(NA, nrow(a1), ncol(a1))    
for(i in ncol(a1)){
      for(j in nrow(a1)){
        med[i,j]<-median(a1[i,j], a2[i,j], a2[i,j])
      }
    }

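But this gives me values that are clearly not medians, and I feel it is overcomplicated.

Thanks!

【Discussion】:

    Tags: r matrix median


    【Solution 1】:

    You can first replace "Undetermined" with NA, and then you will get NA automatically. I didn't want to type in all those numbers, so I just used 1 to 5, but it should work for any numbers.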

    # stringsAsFactors = FALSE keeps the mixed column as character (before R 4.0 the default was factor)
    a1 <- data.frame(c("Control", "Sample1", "Sample2", "Sample3"), 1, 2, c("Undetermined", "Undetermined", 3, "Undetermined"), 4, 5, stringsAsFactors = FALSE)
    a2 <- data.frame(c("Control", "Sample1", "Sample2", "Sample3"), 1, 2, c("Undetermined", 3, 3, "Undetermined"), 4, 5, stringsAsFactors = FALSE)
    a3 <- data.frame(c("Control", "Sample1", "Sample2", "Sample3"), 1, 2, c(3, 3, "Undetermined", "Undetermined"), 4, 5, stringsAsFactors = FALSE)
    names(a1) <- names(a2) <- names(a3) <- c("Sample", "CT ACTIN", "CT 18S", "CT TET1", "CT TET2", "CT TET3")
    a1[a1 == "Undetermined"] <- NA
    a2[a2 == "Undetermined"] <- NA
    a3[a3 == "Undetermined"] <- NA
    
    # one column fewer than the inputs: the Sample column is added back below
    med <- matrix(NA_real_, nrow = nrow(a1), ncol = ncol(a1) - 1)
    for (i in seq_len(nrow(a1))) {
      for (j in 2:ncol(a1)) {
        # as.numeric() is needed because "CT TET1" is stored as character
        med[i, j - 1] <- median(as.numeric(c(a1[i, j], a2[i, j], a3[i, j])))
      }
    }
    
    med <- data.frame(a1[, 1], med)
    names(med) <- c("Sample", "CT ACTIN", "CT 18S", "CT TET1", "CT TET2", "CT TET3")
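
For context, the loop in the question differs from this one in two ways that are easy to miss. The sketch below uses made-up numbers (not the poster's data) to show both: `median()` treats only its first argument as data, and `for (i in n)` visits a single value rather than `1..n`.

```r
# median()'s signature is median(x, na.rm = FALSE, ...): only the first
# argument is data, the second is matched to na.rm, the rest are ignored.
wrong <- median(1, 5, 9)      # median of the single value 1
right <- median(c(1, 5, 9))   # median of the three values

# for (i in 3) runs once with i = 3; seq_len(3) runs for i = 1, 2, 3.
n_wrong <- 0
for (i in 3) n_wrong <- n_wrong + 1
n_right <- 0
for (i in seq_len(3)) n_right <- n_right + 1

print(c(wrong = wrong, right = right, n_wrong = n_wrong, n_right = n_right))
```

This is why wrapping the three cells in `c()` and looping over `1:nrow(a1)` and `1:ncol(a1)` changes the result.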
    

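    【Discussion】:

    • Please post your code as reproducible code. You are not new here. Why would you make this kind of mistake?
    • Replacing the text with 'NA' is not the problem. I want this script to work on the matrices wherever the Undetermined values are and whatever the dimensions of the matrices are (though they all have the same dimensions).
    • @user2100721 Which part is not reproducible?
    • Remove the + and > prompts. Don't paste console output.
    【Solution 2】:

    You can use mapply and reshape the result into a matrix. Assuming your data are originally character matrices, which I inferred from the <NA> entries, a reproducible solution is: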

    dat <- mapply(function(...) median(as.numeric(c(...))), a1, a2, a3)
    # as.numeric() warns when it coerces the character "<NA>" to a numeric NA; the warning can be ignored here
    matrix(dat, nrow(a1), ncol(a1), dimnames = dimnames(a1))
    
    #            ACTIN     X18S TET1     TET2     TET3
    # Control 25.94344 22.42956   NA 34.06311 33.85860
    # Sample1 24.44776 20.07398   NA 32.31729 32.34107
    # Sample2 25.25487 19.68049   NA 33.76187 33.41289
    # Sample3 24.26332 18.14651   NA 32.78854 31.19895
    

    Data

    a1 <- structure(c("25.94344", "24.48504", "25.26587", "24.44148", "22.62984", 
    "20.04858", "19.68065", "18.14651", "<NA>", "<NA>", "28.086248", 
    "<NA>", "34.06311", "32.37173", "33.76187", "32.81143", "34.03476", 
    "32.34107", "33.41289", "31.22825"), .Dim = 4:5, .Dimnames = list(
        c("Control", "Sample1", "Sample2", "Sample3"), c("ACTIN", 
        "X18S", "TET1", "TET2", "TET3")))
    
    a2 <- structure(c("25.98070", "24.26377", "25.25487", "24.26332", "22.39388", 
    "20.07398", "19.68049", "18.10820", "<NA>", "27.23082", "27.214449", 
    "<NA>", "34.54892", "32.27775", "33.70534", "32.76979", "33.78150", 
    "32.34329", "33.48968", "31.19895"), .Dim = 4:5, .Dimnames = list(
        c("Control", "Sample1", "Sample2", "Sample3"), c("ACTIN", 
        "X18S", "TET1", "TET2", "TET3")))
    
    a3 <- structure(c("25.93740", "24.44776", "25.14833", "24.24300", "22.42956", 
    "20.09009", "19.53746", "18.33552", "30.020935", "28.328804", 
    "<NA>", "<NA>", "33.98415", "32.31729", "33.83607", "32.78854", 
    "33.85860", "32.29191", "33.39610", "31.14735"), .Dim = 4:5, .Dimnames = list(
        c("Control", "Sample1", "Sample2", "Sample3"), c("ACTIN", 
        "X18S", "TET1", "TET2", "TET3")))
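
As an alternative that does not depend on whether the inputs are matrices or data frames, the tables can be stacked into a 3-D array and reduced with `apply()`. This is a minimal sketch, not part of the original answer: `elementwise_median` is a hypothetical helper name, the toy values are made up, and it assumes every input holds only measurement columns (sample names as row names) with identical dimensions.

```r
# Hypothetical helper: elementwise median of equally sized tables.
elementwise_median <- function(..., na.rm = FALSE) {
  # Coerce a matrix or data frame (numeric or character) to a numeric matrix.
  to_numeric_matrix <- function(x) {
    m <- as.matrix(x)
    # Non-numeric strings such as "<NA>" or "Undetermined" become NA.
    out <- suppressWarnings(as.numeric(m))
    dim(out) <- dim(m)
    dimnames(out) <- dimnames(m)
    out
  }
  mats <- lapply(list(...), to_numeric_matrix)
  stopifnot(length(unique(lapply(mats, dim))) == 1L)

  # Stack the matrices along a third dimension, then take the median per cell.
  arr <- array(unlist(mats), dim = c(dim(mats[[1]]), length(mats)))
  res <- apply(arr, c(1, 2), median, na.rm = na.rm)
  dimnames(res) <- dimnames(mats[[1]])
  res
}

# Toy data (made up), stored as character like the matrices above.
dn <- list(c("r1", "r2"), c("x", "y"))
b1 <- matrix(c("1", "4", "<NA>", "7"), nrow = 2, dimnames = dn)
b2 <- matrix(c("2", "5", "3", "8"), nrow = 2, dimnames = dn)
b3 <- matrix(c("3", "6", "5", "9"), nrow = 2, dimnames = dn)

med_strict <- elementwise_median(b1, b2, b3)                # NA if any value is missing
med_lenient <- elementwise_median(b1, b2, b3, na.rm = TRUE) # median of the values present
print(med_strict)
print(med_lenient)
```

With `na.rm = FALSE` a cell is NA as soon as one of the measurements is missing, which matches the output shown above; `na.rm = TRUE` instead uses whatever measurements are available.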
    

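    【Discussion】:

    • I don't know what it gave me, but it is not the median: for example, the first value is 18.5, which is impossible.
    • Well, matrix(mapply(function(...) median(as.numeric(c(...))), a1, a2, a3), nrow = nrow(a1), ncol = ncol(a1)) works on @Patrick's data. Also, your data contain both characters and numbers, yet they are stored in a matrix, which accepts only one data type. It is also hard to reproduce your data from the format you provided; dput(a1) and so on is usually a good choice. Keeping spaces in column names is bad practice. All of this makes the question hard to reproduce, and people become reluctant to answer it.
    • Thanks, I don't know what happened, but I got different values: dat <- mapply(function(...) median(as.numeric(c(...))), a1, a2, a3) matrix(dat, nrow(a1), ncol(a1), dimnames = dimnames(a1)) ACTIN 18S TET1 TET2 TET3 Control 18.5 6.5 6.5 NA 6.5 Sample1 6.5 18.5 6.5 6.5 NA Sample2 NA 6.5 18.5 6.5 6.5 Sample3 6.5 NA 6.5 18.5 6.5
    • Are you sure your data are as you pasted and described them? Check class(a1) to see whether they are actually matrices rather than data frames.
    • If they are data frames, try this: do.call(cbind, Map(function(...) mapply(function(...) median(c(as.numeric(as.character(...)))), ...), a1, a2, a3)).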
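
The repeating pattern in the comment above is what one would expect if a1, a2 and a3 were data frames: `mapply()` walks over the elements of a matrix but over the columns of a data frame, so it returns one value per column, and `matrix()` then recycles those values to fill all cells. The sketch below shows this with made-up numbers. It also shows that `as.numeric()` on a factor returns level codes, which could explain values such as 18.5; that last part is an inference, not something confirmed in the thread.

```r
# Three small numeric matrices (made-up values) and their data frame versions.
m1 <- matrix(1:6, nrow = 2)
m2 <- m1 + 10
m3 <- m1 + 20
d1 <- as.data.frame(m1)
d2 <- as.data.frame(m2)
d3 <- as.data.frame(m3)

med_fun <- function(...) median(c(...))

per_cell <- mapply(med_fun, m1, m2, m3)  # one median per matrix cell
per_col <- mapply(med_fun, d1, d2, d3)   # one median per data frame column

print(length(per_cell))
print(length(per_col))

# as.numeric() on a factor gives the internal level codes, not the labels.
f <- factor(c("25.9", "24.4"))
codes <- as.numeric(f)
values <- as.numeric(as.character(f))
print(codes)
print(values)
```

Converting each data frame with `as.matrix()` before calling `mapply()`, or going through `as.character()` for factor columns, avoids both effects.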
    【Solution 3】:

    Assuming your datasets are in the form you posted before your edit:

    > a1
    #    Sample CT ACTIN   CT 18S      CT TET1  CT TET2  CT TET3
    #1: Control 25.94344 22.62984 Undetermined 34.06311 34.03476
    #2: Sample1 24.48504 20.04858 Undetermined 32.37173 32.34107
    #3: Sample2 25.26587 19.68065    28.086248 33.76187 33.41289
    #4: Sample3 24.44148 18.14651 Undetermined 32.81143 31.22825
    

    You can use mget() to retrieve the objects in your environment whose names match a[[:digit:]] and combine them with bind_rows():

    library(dplyr)
    dat <- bind_rows(mget(ls(pattern = "a[[:digit:]]")))
    

    Then use na_if() to replace "Undetermined" with NA, convert all columns except Sample to numeric, and use summarise_each() to compute the median():

    dat %>%
      na_if("Undetermined") %>%
      mutate_each(funs(as.numeric), -Sample) %>%
      group_by(Sample) %>%
      summarise_each(funs(median(., na.rm = TRUE)), -Sample)
    

    Which gives:

    # A tibble: 4 x 6
    #   Sample CT ACTIN   CT 18S  CT TET1  CT TET2  CT TET3
    #    <chr>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
    #1 Control 25.94344 22.42956 30.02094 34.06311 33.85860
    #2 Sample1 24.44776 20.07398 27.77981 32.31729 32.34107
    #3 Sample2 25.25487 19.68049 27.65035 33.76187 33.41289
    #4 Sample3 24.26332 18.14651       NA 32.78854 31.19895
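
Note that because of `na.rm = TRUE`, this result reports a median even where only one or two of the three measurements exist (for example CT TET1 for Control), unlike the NA the question asked for. Also, `mutate_each()`, `summarise_each()` and `funs()` are deprecated in current dplyr releases. A rough equivalent using `across()` (available from dplyr 1.0.0) might look like the sketch below; the toy tables and their values are made up, and `na_if()` is applied column by column rather than to the whole data frame.

```r
library(dplyr)

# Toy stand-ins for the three measurement tables (values are made up).
t1 <- data.frame(Sample = c("Control", "Sample1"),
                 TET1 = c("Undetermined", "28.0"),
                 TET2 = c("34.0", "32.0"),
                 stringsAsFactors = FALSE)
t2 <- data.frame(Sample = c("Control", "Sample1"),
                 TET1 = c("Undetermined", "27.0"),
                 TET2 = c("35.0", "33.0"),
                 stringsAsFactors = FALSE)
t3 <- data.frame(Sample = c("Control", "Sample1"),
                 TET1 = c("30.0", "Undetermined"),
                 TET2 = c("33.0", "31.0"),
                 stringsAsFactors = FALSE)

res <- bind_rows(t1, t2, t3) %>%
  # Replace the marker string with NA, then convert each measurement column.
  mutate(across(-Sample, ~ as.numeric(na_if(.x, "Undetermined")))) %>%
  group_by(Sample) %>%
  # Grouping columns are excluded from everything() inside summarise().
  summarise(across(everything(), ~ median(.x, na.rm = TRUE)))

print(res)
```

Dropping `na.rm = TRUE` from the last step would return NA for any sample and column with a missing measurement.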
    

    【Discussion】:
