【Question title】: Multiply data frame with weights from another data frame
【Posted】: 2020-02-05 14:32:40
【Question description】:

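I have a data frame df with expression values, and a data frame weights with weights. For each column in df, I want to multiply every row of df by the row of weights that has the same row name.

So for each column (sample) in df, you get the weighted values for each row (gene).

Please see my example output.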

df

Gene              MMRF_1021    MMRF_1024   MMRF_1029   MMRF_1030    MMRF_1031
ENSG00000007062   0.05374547   0.01258559   0.0000000   1.2985088   0.37618693
ENSG00000012124   0.13436368   0.27688288   0.2780448   0.7158432   0.03271195

weights

   Gene                   Pre.BI       Pre.BII       Immature     Naive         Memory       Plasmacell
   ENSG00000007062        0.006368928  0.000000e+00  0.000000000  0.0000000000  0.000000000  0.000000000
   ENSG00000012124        0.000000000  0.000000e+00  0.000000000  0.0000000000  0.000000000 -0.009728154

Expected output:

 Sample    Gene            Pre.BI            Pre.BII  Immature     Naive         Memory       Plasmacell
 MMRF_1021 ENSG00000007062 0.000342301       0        0            0             0             0
 MMRF_1021 ENSG00000012124 0                 0        0            0             0            -0.001307111
 MMRF_1024 ENSG00000007062 8.015672e-05      0        0            0             0             0
 MMRF_1024 ENSG00000012124 0                 0        0            0             0            -0.002693559
 .....

Input df:

structure(list(MMRF_1021 = c(0.0537454710193116, 0.134363677548279
), MMRF_1024 = c(0.0125855939107651, 0.276882875966623), MMRF_1029 = c(0, 
0.278044754955015), MMRF_1030 = c(1.29850876031527, 0.715843203834688
), MMRF_1031 = c(0.37618693249153, 0.032711952160723)), row.names = c("ENSG00000007062", 
"ENSG00000012124"), class = "data.frame")

Input weights:

structure(list(Pre.BI = c(0.006368928, 0), Pre.BII = c(0, 0), 
    Immature = c(0, 0), Naive = c(0, 0), Memory = c(0, 0), Plasmacell = c(0, 
    -0.009728154)), row.names = c("ENSG00000007062", "ENSG00000012124"
), class = "data.frame")

【Comments】:

  • Could you provide a reproducible example using dput()?
  • Added the input for df and weights.

Tags: r


【Solution 1】:

I think you may be looking for this:

library(tidyverse)

joinedDataframe <- df %>%
    rownames_to_column("gene") %>%
    gather("sample", "value", -gene) %>%
    left_join(weights %>%
                  rownames_to_column("gene")
              , by = "gene")

joinedDataframe %>%
    mutate(Pre.BI = Pre.BI * value
           , Pre.BII = Pre.BII * value
           , Immature = Immature * value
           , Naive = Naive * value
           , Memory = Memory * value
           , Plasmacell = Plasmacell * value) %>%
    select(-value)

              gene    sample       Pre.BI Pre.BII Immature Naive Memory    Plasmacell
1  ENSG00000007062 MMRF_1021 3.423010e-04       0        0     0      0  0.0000000000
2  ENSG00000012124 MMRF_1021 0.000000e+00       0        0     0      0 -0.0013071105
3  ENSG00000007062 MMRF_1024 8.015674e-05       0        0     0      0  0.0000000000
4  ENSG00000012124 MMRF_1024 0.000000e+00       0        0     0      0 -0.0026935593
5  ENSG00000007062 MMRF_1029 0.000000e+00       0        0     0      0  0.0000000000
6  ENSG00000012124 MMRF_1029 0.000000e+00       0        0     0      0 -0.0027048622
7  ENSG00000007062 MMRF_1030 8.270109e-03       0        0     0      0  0.0000000000
8  ENSG00000012124 MMRF_1030 0.000000e+00       0        0     0      0 -0.0069638329
9  ENSG00000007062 MMRF_1031 2.395907e-03       0        0     0      0  0.0000000000
10 ENSG00000012124 MMRF_1031 0.000000e+00       0        0     0      0 -0.0003182269
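
The six near-identical lines inside mutate() above can probably be collapsed into a single across() call. The following is only a minimal sketch, assuming dplyr 1.0 or later and tidyr 1.0 or later are installed. The two-sample, two-weight data frames are a cut-down copy of the question's data, and the names weight_cols and result are introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Cut-down copy of the question's data (two samples, two weight columns)
df <- data.frame(
  MMRF_1021 = c(0.0537454710193116, 0.134363677548279),
  MMRF_1024 = c(0.0125855939107651, 0.276882875966623),
  row.names = c("ENSG00000007062", "ENSG00000012124")
)
weights <- data.frame(
  Pre.BI     = c(0.006368928, 0),
  Plasmacell = c(0, -0.009728154),
  row.names  = c("ENSG00000007062", "ENSG00000012124")
)

# Hypothetical helper: the columns that should be scaled
weight_cols <- names(weights)

result <- df %>%
  rownames_to_column("gene") %>%
  # long format: one row per gene/sample pair
  pivot_longer(-gene, names_to = "sample", values_to = "value") %>%
  # attach the weights of the matching gene
  left_join(rownames_to_column(weights, "gene"), by = "gene") %>%
  # multiply every weight column by the expression value of that row
  mutate(across(all_of(weight_cols), ~ .x * value)) %>%
  select(-value)

print(result)
```

Because the weight columns are taken from names(weights), the same code should keep working if cell types are added to or removed from weights. Note that pivot_longer() orders rows by gene first, so the row order differs from the gather() output above.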

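【Comments】:

  • Yes, I would absolutely recommend it. It is a free online book and very well written: r4ds.had.co.nz
  • Actually, this is not what I wanted. You only have the "value" column, but I want a new value for each column in "weights". Please see my output.
  • Are you sure? Did you see that I updated my answer?
【Solution 2】: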

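Looking at your expected result, I think the following is what you are after. For example, Plasmacell for MMRF_1024 and ENSG00000012124 is -0.002693559 (0.27688288 * -0.009728154). To get this number, I converted both data frames to long format and then joined them. At that point you have two columns to multiply (gene_value and value). After that, I converted the data back to a wide-format data frame. Note that in the code below, the column named gene holds the sample IDs, while rowname holds the gene IDs.

library(tidyverse)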

rownames_to_column(df) %>% 
pivot_longer(cols = -rowname, names_to = "gene", values_to = "gene_value") -> temp1

rownames_to_column(weights) %>% 
pivot_longer(cols = -rowname, names_to = "variable", values_to = "value") -> temp2

left_join(temp1, temp2, by = "rowname") %>% 
mutate(answer = gene_value * value) %>% 
pivot_wider(id_cols = rowname:gene, names_from = "variable", values_from = "answer")

   rowname         gene         Pre.BI Pre.BII Immature Naive Memory Plasmacell
   <chr>           <chr>         <dbl>   <dbl>    <dbl> <dbl>  <dbl>      <dbl>
 1 ENSG00000007062 MMRF_1021 0.000342        0        0     0      0   0       
 2 ENSG00000007062 MMRF_1024 0.0000802       0        0     0      0   0       
 3 ENSG00000007062 MMRF_1029 0               0        0     0      0   0       
 4 ENSG00000007062 MMRF_1030 0.00827         0        0     0      0   0       
 5 ENSG00000007062 MMRF_1031 0.00240         0        0     0      0   0       
 6 ENSG00000012124 MMRF_1021 0               0        0     0      0  -0.00131 
 7 ENSG00000012124 MMRF_1024 0               0        0     0      0  -0.00269 
 8 ENSG00000012124 MMRF_1029 0               0        0     0      0  -0.00270 
 9 ENSG00000012124 MMRF_1030 0               0        0     0      0  -0.00696 
10 ENSG00000012124 MMRF_1031 0               0        0     0      0  -0.000318
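
As a sanity check on the arithmetic quoted in this answer, one gene can be worked through by hand in base R. This is only an illustrative sketch: it uses the df and weights values from the question, and the names gene, expr, w and weighted are introduced here.

```r
# Data from the question's dput() output
df <- structure(list(
  MMRF_1021 = c(0.0537454710193116, 0.134363677548279),
  MMRF_1024 = c(0.0125855939107651, 0.276882875966623),
  MMRF_1029 = c(0, 0.278044754955015),
  MMRF_1030 = c(1.29850876031527, 0.715843203834688),
  MMRF_1031 = c(0.37618693249153, 0.032711952160723)),
  row.names = c("ENSG00000007062", "ENSG00000012124"),
  class = "data.frame")
weights <- structure(list(
  Pre.BI = c(0.006368928, 0), Pre.BII = c(0, 0), Immature = c(0, 0),
  Naive = c(0, 0), Memory = c(0, 0), Plasmacell = c(0, -0.009728154)),
  row.names = c("ENSG00000007062", "ENSG00000012124"),
  class = "data.frame")

gene <- "ENSG00000012124"

# Expression of this gene in every sample, as a named numeric vector
expr <- unlist(df[gene, ])

# The single Plasmacell weight for this gene
w <- weights[gene, "Plasmacell"]

# Weighted value per sample; compare with the Plasmacell column
# of the ENSG00000012124 rows in the table above
weighted <- expr * w
print(weighted)
```

The MMRF_1024 entry of weighted is the product 0.27688288 * -0.009728154 mentioned in the explanation.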

【Comments】:

    【Solution 3】:

    Here is a base R solution:

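    # Align the rows of weights to the rows of df by gene ID,
    # then build one block of weighted values per sample and stack the blocks
    dfout <- do.call(rbind,
                     c(make.row.names = FALSE,
                       lapply(seq_len(ncol(df)),
                              function(k) cbind(Gene = rownames(df),
                                                Sample = names(df)[k],
                                                df[, k] * weights[match(rownames(df), rownames(weights)), ]))))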
    

    which gives

    > dfout
                  Gene    Sample       Pre.BI Pre.BII Immature Naive Memory    Plasmacell
    1  ENSG00000007062 MMRF_1021 3.423010e-04       0        0     0      0  0.0000000000
    2  ENSG00000012124 MMRF_1021 0.000000e+00       0        0     0      0 -0.0013071105
    3  ENSG00000007062 MMRF_1024 8.015674e-05       0        0     0      0  0.0000000000
    4  ENSG00000012124 MMRF_1024 0.000000e+00       0        0     0      0 -0.0026935593
    5  ENSG00000007062 MMRF_1029 0.000000e+00       0        0     0      0  0.0000000000
    6  ENSG00000012124 MMRF_1029 0.000000e+00       0        0     0      0 -0.0027048622
    7  ENSG00000007062 MMRF_1030 8.270109e-03       0        0     0      0  0.0000000000
    8  ENSG00000012124 MMRF_1030 0.000000e+00       0        0     0      0 -0.0069638329
    9  ENSG00000007062 MMRF_1031 2.395907e-03       0        0     0      0  0.0000000000
    10 ENSG00000012124 MMRF_1031 0.000000e+00       0        0     0      0 -0.0003182269
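
The base R approach depends on the rows of weights lining up with the rows of df. If the two data frames list genes in a different order, or do not contain exactly the same genes, indexing by row name is one way to make the alignment explicit. The following is a minimal sketch under that assumption: weights_shuffled, common, w and pieces are names introduced here, and the rows of weights are reversed on purpose to show that the match is by name rather than by position.

```r
# Data from the question's dput() output
df <- structure(list(
  MMRF_1021 = c(0.0537454710193116, 0.134363677548279),
  MMRF_1024 = c(0.0125855939107651, 0.276882875966623),
  MMRF_1029 = c(0, 0.278044754955015),
  MMRF_1030 = c(1.29850876031527, 0.715843203834688),
  MMRF_1031 = c(0.37618693249153, 0.032711952160723)),
  row.names = c("ENSG00000007062", "ENSG00000012124"),
  class = "data.frame")
weights <- structure(list(
  Pre.BI = c(0.006368928, 0), Pre.BII = c(0, 0), Immature = c(0, 0),
  Naive = c(0, 0), Memory = c(0, 0), Plasmacell = c(0, -0.009728154)),
  row.names = c("ENSG00000007062", "ENSG00000012124"),
  class = "data.frame")

# Reverse the row order of weights to simulate misaligned inputs
weights_shuffled <- weights[rev(rownames(weights)), ]

# Keep only genes present in both data frames, in the order used by df
common <- intersect(rownames(df), rownames(weights_shuffled))
w <- weights_shuffled[common, , drop = FALSE]

# One block per sample: the sample's expression vector scales every weight column
pieces <- lapply(names(df), function(s) {
  data.frame(Sample = s, Gene = common, df[common, s] * w,
             check.names = FALSE)
})

dfout <- do.call(rbind, pieces)
rownames(dfout) <- NULL  # drop the gene-derived row names left by rbind
print(dfout)
```

Genes that appear in only one of the two data frames are silently dropped by intersect(); whether that is the desired behaviour depends on the data.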
    

    【Comments】:
