Normalization on specific rows of a dataframe in R: answers

【Question title】: Normalization on specific rows of a dataframe in R
【Posted】: 2018-11-16 09:21:07
【Question description】:

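I have a data frame with many columns and rows. I would like to do the following:

Take all rows that do not contain the text "phos" in the "id" column.
Normalize these rows (e.g. median centering) across all intensity columns, i.e. the columns whose names contain the text "int_sam".
Then use the normalization factors/values computed above to normalize the rows that DO contain the text "phos" in the "id" column, in a column-wise (sample-wise) manner.

Thank you very much in advance. I do not have much experience with R, and I am not a statistician either, so a simple explanation with R code would be very helpful. Thanks again.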

int_sam_1 = c("2421432", "24242424", "NA", "4684757849", "NA", "10485040", "NA", 
          "6849400", "40300", "NA", "NA", "NA", "556456466", "4646456466", "246464266", "4564242646")
int_sam_2 = c("NA", "5342353", "14532556", "43566", "46367367", "768769769", "797899", "NA", "NA", "NA", 
          "686899", "7898979", "678568", "NA", "68886", "488")
int_sam_3 = c("11351", "NA", "NA", "NA", "1354151345", "1351351354", "314534", "1535", "3145354", "4353455", 
          "324535", "3543445", "34535", "34535534", "NA", "NA")
id = c("phos", "acet phos", "acet", "acet", "acet", "acet meth phos", "phos", "phos", "phos", "phos", "acet", 
   "meth", "meth phos", "phos", "meth phos", "phos")
df = cbind.data.frame(int_sam_1, int_sam_2, int_sam_3, id)

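【Question discussion】:

Do you want to normalize each column?
Thanks. Yes. For the rows that do not contain "phos" it can be global, meaning mean/median centering over all samples.

Tags: r dataframe normalization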

【Solution 1】:

Try the following.

Your data (as numeric vectors with real NA values, rather than quoted strings):

int_sam_1 = c(2421432, 24242424, NA, 4684757849, NA, 10485040, NA, 
              6849400, 40300, NA, NA, NA, 556456466, 4646456466, 246464266, 4564242646)
int_sam_2 = c(NA, 5342353, 14532556, 43566, 46367367, 768769769, 797899, NA, NA, NA, 
              686899, 7898979, 678568, NA, 68886, 488)
int_sam_3 = c(11351, NA, NA, NA, 1354151345, 1351351354, 314534, 1535, 3145354, 4353455, 
              324535, 3543445, 34535, 34535534, NA, NA)
id = c("phos", "acet phos", "acet", "acet", "acet", "acet meth phos", "phos", "phos", "phos", "phos", "acet", 
       "meth", "meth phos", "phos", "meth phos", "phos")
df = cbind.data.frame(int_sam_1, int_sam_2, int_sam_3, id)
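
The vectors in the question are quoted strings (including the text "NA"), while the data above are plain numbers. If your real data arrive as text, they need converting before any median can be computed. A minimal sketch, assuming the missing values are spelled exactly "NA"; `to_numeric` is a hypothetical helper name, not part of the original answer:

```r
# Hypothetical helper: turn a character (or factor) column into numbers,
# mapping the literal text "NA" to a real missing value first.
to_numeric <- function(x) {
  x <- as.character(x)   # also handles factor columns
  x[x == "NA"] <- NA     # the string "NA" becomes a true NA
  as.numeric(x)
}

# Small illustration using three values from the question
raw <- c("2421432", "NA", "40300")
converted <- to_numeric(raw)
```

To apply it to every intensity column of a data frame, something like `df[int_cols] <- lapply(df[int_cols], to_numeric)` should work, where `int_cols` holds the names of the "int_sam" columns.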

Subset the rows whose id does not contain phos and compute the global median over all three intensity columns:

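library(dplyr)  # provides %>%, filter() and mutate()
df.sub <- df %>% filter(!grepl("phos", id))
# pool the values of columns 1-3 and take a single median, ignoring NA
df.median <- median(as.vector(as.matrix(df.sub[, 1:3])), na.rm = TRUE)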

Subtract the global median from every value in columns 1-3 for the rows whose id contains phos; the other rows are left unchanged:

df <- df %>% 
  mutate(int_sam_1 = ifelse(grepl('phos', id), int_sam_1 - df.median, int_sam_1)) %>% 
  mutate(int_sam_2 = ifelse(grepl('phos', id), int_sam_2 - df.median, int_sam_2)) %>%
  mutate(int_sam_3 = ifelse(grepl('phos', id), int_sam_3 - df.median, int_sam_3))
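
The code above subtracts one global median from every sample. The question also asks for column-wise (sample-wise) factors, so here is a minimal base R sketch of that variant. It is an assumption about what was wanted rather than part of the original answer, and the names `int_cols`, `is_phos`, `col_medians` and `df_norm` are made up for illustration:

```r
# Same example data as in the answer (numeric, with real NA values)
int_sam_1 <- c(2421432, 24242424, NA, 4684757849, NA, 10485040, NA,
               6849400, 40300, NA, NA, NA, 556456466, 4646456466, 246464266, 4564242646)
int_sam_2 <- c(NA, 5342353, 14532556, 43566, 46367367, 768769769, 797899, NA, NA, NA,
               686899, 7898979, 678568, NA, 68886, 488)
int_sam_3 <- c(11351, NA, NA, NA, 1354151345, 1351351354, 314534, 1535, 3145354, 4353455,
               324535, 3543445, 34535, 34535534, NA, NA)
id <- c("phos", "acet phos", "acet", "acet", "acet", "acet meth phos", "phos", "phos",
        "phos", "phos", "acet", "meth", "meth phos", "phos", "meth phos", "phos")
df <- data.frame(int_sam_1, int_sam_2, int_sam_3, id)

# Intensity columns are found by name instead of by position
int_cols <- grep("^int_sam", names(df), value = TRUE)
# TRUE for rows whose id contains "phos"
is_phos <- grepl("phos", df$id)

# One factor per sample: the median of that column over the non-phos rows
col_medians <- sapply(df[!is_phos, int_cols], median, na.rm = TRUE)

# Subtract each column's own factor from the phos rows only
df_norm <- df
for (col in int_cols) {
  df_norm[is_phos, col] <- df[is_phos, col] - col_medians[[col]]
}
```

With this data only one non-phos row has a value in int_sam_1, so that column's factor rests on a single measurement. On real data it may be worth checking how many non-missing values each median is based on before trusting it.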

【Discussion】:

Sorry, you only need everything from the rows without phos. I have edited filter(grepl("phos",id)) to filter(!grepl("phos",id)) accordingly.