【Question title】: Creating a column with factor variables conditional on multiple other columns?
【Posted】: 2021-07-14 10:54:45
【Question】:

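I have 4 columns called Amplification, CNV.Gain, Homozygous.Deletion.Frequency and Heterozygous.Deletion.Frequency. I want to create a new column that, if any of the values in these 4 columns is:

  • greater than or equal to 5 and less than or equal to 10, returns Low
  • greater than 10 and less than or equal to 20, returns Medium
  • greater than 20, returns High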
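The three rules above can be written as a small vectorised helper, which makes the boundary cases explicit (5 and 10 count as Low, 20 counts as Medium). This is only a sketch: the function name classify_value is hypothetical, and values below 5, which the rules do not cover, are assumed to map to NA.

```r
# Hypothetical helper: map numeric values to a level using the three rules
classify_value <- function(x) {
  ifelse(x >= 5 & x <= 10, "Low",
         ifelse(x > 10 & x <= 20, "Medium",
                ifelse(x > 20, "High", NA_character_)))  # below 5: not covered, NA
}

# Boundary check: gives NA, "Low", "Low", "Medium", "Medium", "High"
classify_value(c(3, 5, 10, 11, 20, 25))
```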

Here is an example of what the final table (long_fused) should look like:

CNV.Gain  Amplification  Homozygous.Deletion.Frequency  Heterozygous.Deletion.Frequency  Threshold
       3              5                             10                                0  Low
       0              0                             11                                8  Medium
       7             16                             25                                0  High

So far I have tried the following code. It does seem to fill in the Threshold column, but not correctly.

library(dplyr)
long_fused <- long_fused %>%
  mutate(Percent_sample_altered = case_when(
    (Amplification >= 5 & Amplification < 10 & CNV.Gain >= 5 & CNV.Gain < 10) |
      (CNV.Gain >= 5 & CNV.Gain <= 10 & Homozygous.Deletion.Frequency >= 5 & Homozygous.Deletion.Frequency <= 10) |
      (Heterozygous.Deletion.Frequency >= 5 & Heterozygous.Deletion.Frequency <= 10) ~ 'Low',
    (Amplification >= 10 & Amplification < 20) | (CNV.Gain >= 10 & CNV.Gain < 20) |
      (Homozygous.Deletion.Frequency >= 10 & Homozygous.Deletion.Frequency < 20) |
      (Heterozygous.Deletion.Frequency >= 10 & Heterozygous.Deletion.Frequency < 20) ~ 'Medium',
    Amplification > 20 | CNV.Gain > 20 | Homozygous.Deletion.Frequency > 20 |
      Heterozygous.Deletion.Frequency > 20 ~ 'High'))

As always, any help is appreciated!


Data in dput format:

long_fused <-
structure(list(CNV.Gain = c(3L, 0L, 7L), Amplification = c(5L, 
0L, 16L), Homozygous.Deletion.Frequency = c(10L, 11L, 25L), 
Heterozygous.Deletion.Frequency = c(0L, 8L, 0L), Threshold = 
c("Low", "Medium", "High")), class = "data.frame", 
row.names = c(NA, -3L))

【Discussion】:

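  • You say "if any of the values", but every row has values in more than one range. What is the rule for choosing which factor level to return in those cases?
  • Hi Rui, the factor level should be based on the highest value across the 4 columns, i.e. 10, 11 and 25 in rows 1, 2 and 3 respectively.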
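Following the rule in that reply, the value to classify is the row maximum across the four numeric columns. A minimal base-R sketch, rebuilding the example data from the dput output (value_cols and row_max are illustrative names, not from the question):

```r
# Example data from the question's dput output
long_fused <- data.frame(
  CNV.Gain = c(3L, 0L, 7L),
  Amplification = c(5L, 0L, 16L),
  Homozygous.Deletion.Frequency = c(10L, 11L, 25L),
  Heterozygous.Deletion.Frequency = c(0L, 8L, 0L),
  Threshold = c("Low", "Medium", "High")
)

# The four numeric columns the rule looks at
value_cols <- c("CNV.Gain", "Amplification",
                "Homozygous.Deletion.Frequency", "Heterozygous.Deletion.Frequency")

# do.call() passes each column as a separate argument to pmax(),
# which compares them element by element, giving one maximum per row
row_max <- do.call(pmax, long_fused[value_cols])
print(row_max)  # 10, 11, 25
```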

Tags: r dataframe dplyr conditional-formatting


【Solution 1】:

Here is an approach using rowwise() followed by the base R function cut():

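library(dplyr)

long_fused %>%
  rowwise() %>%
  # highest value across all columns except Threshold, computed row by row
  mutate(new = max(c_across(-Threshold))) %>%
  ungroup() %>%
  # bins (5,10] -> Low, (10,20] -> Medium, (20,Inf] -> High;
  # include.lowest = TRUE closes the first bin so that 5 is also Low
  mutate(new = cut(new, c(5, 10, 20, Inf), labels = c("Low", "Medium", "High"),
                   include.lowest = TRUE))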
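To see how cut() treats the boundary values, here is a small sketch on made-up test values (x, breaks, labs and the *_bins names are illustrative, not from the answer). With the default right = TRUE the bins are open on the left, so a row maximum of exactly 5 falls outside (5,10] and becomes NA; include.lowest = TRUE closes that first bin.

```r
breaks <- c(5, 10, 20, Inf)
labs <- c("Low", "Medium", "High")
x <- c(3, 5, 10, 11, 20, 25)

# Default right = TRUE: (5,10], (10,20], (20,Inf]
# gives NA, NA, "Low", "Medium", "Medium", "High"
default_bins <- as.character(cut(x, breaks, labels = labs))

# include.lowest = TRUE turns the first bin into [5,10]
# gives NA, "Low", "Low", "Medium", "Medium", "High"
closed_bins <- as.character(cut(x, breaks, labels = labs, include.lowest = TRUE))
```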

【Discussion】:

【Solution 2】:

Here is an alternative approach using case_when():

library(dplyr)

long_fused %>%
  # row-wise maximum of every column except Threshold;
  # if your data has no Threshold column, use do.call(pmax, .) instead
  mutate(max = do.call(pmax, select(., -Threshold)),
         Threshold = case_when(between(max, 5, 10) ~ 'Low',
                               max > 10 & max <= 20 ~ 'Medium',
                               max > 20 ~ 'High'))

#  CNV.Gain Amplification Homozygous.Deletion.Frequency
#1        3             5                            10
#2        0             0                            11
#3        7            16                            25

#  Heterozygous.Deletion.Frequency max Threshold
#1                               0  10       Low
#2                               8  11    Medium
#3                               0  25      High
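Another base-R option, sketched here on the same example data, is findInterval(), which (unlike cut()) does take a left.open argument. With left.open = TRUE the intervals are open on the left, and rightmost.closed = TRUE then closes the leftmost one, so a maximum of exactly 5 still maps to Low. The names value_cols, row_max, idx and level are illustrative, not from either answer.

```r
long_fused <- data.frame(
  CNV.Gain = c(3L, 0L, 7L),
  Amplification = c(5L, 0L, 16L),
  Homozygous.Deletion.Frequency = c(10L, 11L, 25L),
  Heterozygous.Deletion.Frequency = c(0L, 8L, 0L),
  Threshold = c("Low", "Medium", "High")
)

value_cols <- c("CNV.Gain", "Amplification",
                "Homozygous.Deletion.Frequency", "Heterozygous.Deletion.Frequency")
row_max <- do.call(pmax, long_fused[value_cols])

# Cut points 5, 10, 20: left.open = TRUE gives (5,10], (10,20] and above 20;
# rightmost.closed = TRUE closes the leftmost interval, giving [5,10].
# findInterval() returns 0 below 5, then 1, 2 or 3.
idx <- findInterval(row_max, c(5, 10, 20), left.open = TRUE, rightmost.closed = TRUE)

# Shift by one so that index 0 (below 5) maps to NA
long_fused$level <- c(NA, "Low", "Medium", "High")[idx + 1]
```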
    

【Discussion】:
