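Posted: 2021-07-14 10:54:45
Question:
I have 4 columns called Amplification, CNV.gain, Homozygous.Deletion.Frequency and Heterozygous.Deletion.Frequency. I want to create a new column that, if any of the values in these 4 columns is:
- greater than or equal to 5 and less than or equal to 10, returns Low
- greater than 10 and less than or equal to 20, returns Medium
- greater than 20, returns High
An example of the final table (long_fused) is shown below: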
| CNV.Gain | Amplification | Homozygous.Deletion.Frequency | Heterozygous.Deletion.Frequency | Threshold |
|---|---|---|---|---|
| 3 | 5 | 10 | 0 | Low |
| 0 | 0 | 11 | 8 | Medium |
| 7 | 16 | 25 | 0 | High |
So far I have tried the following code. It does seem to fill in the "Threshold" column, but it does so incorrectly.
library(dplyr)
long_fused <- long_fused %>%
  mutate(Percent_sample_altered = case_when(
    Amplification >= 5 & Amplification < 10 & CNV.gain >= 5 & CNV.gain < 10 |
      CNV.gain >= 5 & CNV.gain <= 10 & Homozygous.Deletion.Frequency >= 5 & Homozygous.Deletion.Frequency <= 10 |
      Heterozygous.Deletion.Frequency >= 5 & Heterozygous.Deletion.Frequency <= 10 ~ 'Low',
    Amplification >= 10 & Amplification < 20 |
      CNV.gain >= 10 & CNV.gain < 20 |
      Homozygous.Deletion.Frequency >= 10 & Homozygous.Deletion.Frequency < 20 |
      Heterozygous.Deletion.Frequency >= 10 & Heterozygous.Deletion.Frequency < 20 ~ 'Medium',
    Amplification > 20 | CNV.gain > 20 | Homozygous.Deletion.Frequency > 20 |
      Heterozygous.Deletion.Frequency > 20 ~ 'High'))
Any help is appreciated, as always!
Data in dput format:
long_fused <-
structure(list(CNV.Gain = c(3L, 0L, 7L), Amplification = c(5L,
0L, 16L), Homozygous.Deletion.Frequency = c(10L, 11L, 25L),
Heterozygous.Deletion.Frequency = c(0L, 8L, 0L), Threshold =
c("Low", "Medium", "High")), class = "data.frame",
row.names = c(NA, -3L))
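Comments:
- You say "if any of the values", but every row has values in more than one range. What is the rule for choosing which factor level to return in those cases?
- Hi Rui, the factor level should be returned based on the highest value across the 4 columns, i.e. 10, 11 and 25 in rows 1, 2 and 3 respectively.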
Tags: r dataframe dplyr conditional-formatting
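
Going by the clarification in the comments (the level follows the largest of the four values in each row), one possible approach is to take the row-wise maximum with `pmax()` and classify that single number. This is only a sketch under a few assumptions: the column is spelled `CNV.Gain` as in the dput data (R is case sensitive, and the attempted code uses `CNV.gain`), `max_freq` is a hypothetical helper column that is dropped at the end, and rows whose maximum is below 5 get `NA` because the question does not say what they should return.

```r
library(dplyr)

# Sample data from the question, without the expected Threshold column
long_fused <- data.frame(
  CNV.Gain = c(3L, 0L, 7L),
  Amplification = c(5L, 0L, 16L),
  Homozygous.Deletion.Frequency = c(10L, 11L, 25L),
  Heterozygous.Deletion.Frequency = c(0L, 8L, 0L)
)

long_fused <- long_fused %>%
  mutate(
    # Row-wise maximum of the four frequency columns (hypothetical helper column)
    max_freq = pmax(CNV.Gain, Amplification,
                    Homozygous.Deletion.Frequency,
                    Heterozygous.Deletion.Frequency),
    # case_when() uses the first matching condition, so test the highest band first
    Threshold = case_when(
      max_freq > 20 ~ "High",
      max_freq > 10 ~ "Medium",
      max_freq >= 5 ~ "Low",
      TRUE ~ NA_character_  # assumption: maximum below 5 is left as NA
    )
  ) %>%
  select(-max_freq)

print(long_fused)
```

If the real data contains missing values, `pmax()` returns `NA` for those rows unless `na.rm = TRUE` is passed, so that choice would need to be made explicitly.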