【Question title】: How to apply a function on every row of a data frame?
【Posted】: 2019-10-25 23:47:02
【Question】:

I want to use the metap package to combine several p-values.

My data frame has 3 p-value columns:

    > dput(head(tt))
structure(list(RS = c("rs2089177", "rs4360974", "rs6502526", 
"rs8069906", "rs9905280", "rs4313843"), G = c(0.9986, 0.9738, 
0.9744, 0.7184, 0.7205, 0.9804), E = c(0.7153, 0.7838, 0.7839, 
0.4918, 0.4861, 0.8522), B = c(0.604716, 0.430228, 0.42916, 0.521452, 
0.465758, 0.474313)), class = c("data.table", "data.frame"), row.names  = c(NA, 
-6L), .internal.selfref = <pointer: 0x10200eee0>)

and a second data frame holding the corresponding weight for each p-value in the tt data frame:

   > dput(head(df))
structure(list(wg = c(40.6324993078201, 40.6324993078201, 40.6324993078201, 
 40.6324993078201, 40.6324993078201, 40.6324993078201), we = c(35.3977400408557, 
35.3977400408557, 35.3977400408557, 35.3977400408557, 35.3977400408557, 
35.3977400408557), wb = c(580.643608420863, 580.643608420863, 
580.643608420863, 580.643608420863, 580.643608420863, 580.643608420863
), RS = c("rs2089177", "rs4360974", "rs6502526", "rs8069906", 
"rs9905280", "rs4313843")), row.names = c(NA, 6L), class = "data.frame")

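The RS column is the same in df and tt.

How can I use the sumz() function to create a new data frame that looks just like tt, except that it has an extra column, say "META", holding the meta p-value computed for each row?

Here is an example of what the p-value should be for the first row: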

 > sumz(c(0.9986,0.7153,0.604716), weights = c(40.6325,35.39774,580.6436), na.action = na.fail)
p =  0.6940048

This is the function I am referring to: https://www.rdocumentation.org/packages/metap/versions/1.1/topics/sumz

I tried merging the two data frames and applying a function to each row:

> head(q)
       ID         P         G       E       wb      wg       we
1:  rs1029830 0.0979931 0.0054060 0.39160 580.6436 40.6325 35.39774
2:  rs1029832 0.1501820 0.0028140 0.39320 580.6436 40.6325 35.39774
3: rs11078374 0.1701250 0.0009805 0.49730 580.6436 40.6325 35.39774
4:  rs1124961 0.1710150 0.7252000 0.05737 580.6436 40.6325 35.39774
5:  rs1135237 0.1493650 0.6851000 0.06354 580.6436 40.6325 35.39774
6: rs11867934 0.0757972 0.0006140 0.00327 580.6436 40.6325 35.39774


helper <- function(x) {
   p <- sumz(x[2:4], weights = x[5:7])$p
   p
}

q$META <- apply(q, MARGIN = 1, helper)

But I get this error:

 Error in sumz(x[2:4], weights = x[5:7]) : 
  Must have at least two valid p values 

【Comments】:

  • Please provide your sample data via dput(head(tt)) and so on, not just printed output.
  • I just did. Thank you very much for the suggestion!

Tags: r


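【Answer 1】:

First, since you say that RS is the same in both, it seems prudent to ask "how sure are we that the rows always line up correctly?" Being defensive, I would say "not 100%", so I join/merge the two tables on RS to guarantee that the rows are matched correctly.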

quux <- tt[df, on="RS"]
quux
#           RS      G      E        B      wg       we       wb
# 1: rs2089177 0.9986 0.7153 0.604716 40.6325 35.39774 580.6436
# 2: rs4360974 0.9738 0.7838 0.430228 40.6325 35.39774 580.6436
# 3: rs6502526 0.9744 0.7839 0.429160 40.6325 35.39774 580.6436
# 4: rs8069906 0.7184 0.4918 0.521452 40.6325 35.39774 580.6436
# 5: rs9905280 0.7205 0.4861 0.465758 40.6325 35.39774 580.6436
# 6: rs4313843 0.9804 0.8522 0.474313 40.6325 35.39774 580.6436

From here, it is just a matter of going row by row and passing that row's p-value columns and weight columns to sumz:

quux$META <- sapply(seq_len(nrow(quux)), function(rn) {
  unlist(sumz(as.matrix(quux[,.(G,E,B)])[rn,], weights = as.vector(quux[,.(wg,we,wb)])[rn,],
              na.action=na.fail)["p"])
})
quux
#           RS      G      E        B      wg       we       wb      META
# 1: rs2089177 0.9986 0.7153 0.604716 40.6325 35.39774 580.6436 0.9863582
# 2: rs4360974 0.9738 0.7838 0.430228 40.6325 35.39774 580.6436 0.9294546
# 3: rs6502526 0.9744 0.7839 0.429160 40.6325 35.39774 580.6436 0.9300445
# 4: rs8069906 0.7184 0.4918 0.521452 40.6325 35.39774 580.6436 0.6379392
# 5: rs9905280 0.7205 0.4861 0.465758 40.6325 35.39774 580.6436 0.6055061
# 6: rs4313843 0.9804 0.8522 0.474313 40.6325 35.39774 580.6436 0.9605584
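
A quick cross-check of the first row can be done in base R, without metap. This is only a sketch: stouffer() is a hypothetical helper written for this check, and it assumes that sumz computes the weighted Stouffer statistic z = sum(w * qnorm(1 - p)) / sqrt(sum(w^2)). The weights are the rounded values printed in the question.

```r
# Hypothetical helper (not part of metap): weighted Stouffer z-method,
# assumed here to be the formula behind sumz(). Equal weights by default.
stouffer <- function(p, w = rep(1, length(p))) {
  z <- sum(w * qnorm(p, lower.tail = FALSE)) / sqrt(sum(w^2))
  pnorm(z, lower.tail = FALSE)
}

p <- c(0.9986, 0.7153, 0.604716)      # G, E, B for rs2089177
w <- c(40.6325, 35.39774, 580.6436)   # wg, we, wb (rounded)

weighted   <- stouffer(p, w)  # combined p-value using the weights
unweighted <- stouffer(p)     # combined p-value with equal weights

print(c(weighted = weighted, unweighted = unweighted))
```

Under that assumption, the weighted result comes out at about 0.694, which is the value the question expects, while the unweighted result is about 0.986, which is what the META column above shows for the first row. That suggests the weights may not be reaching sumz in the sapply() version, possibly because the weights argument there is a one-row data.table rather than a numeric vector; this is an inference from the arithmetic, not something confirmed in the thread.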

Or, in a more data.table-centric way:

mysumz <- function(x, w) sumz(unlist(x), weights = unlist(w), na.action = na.fail)[["p"]]
quux[, META := mysumz(.(G,E,B), .(wg,we,wb)), by = seq_len(nrow(quux))]

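(Borrowed from https://stackoverflow.com/a/36802640.) The wrapper function is needed because each call to mysumz receives a list for both x and w, whereas sumz expects vectors. If you want to verify this, call debugonce(mysumz) first, then run quux[, META := ...] and inspect x and w to see how it works.

【Comments】:

  • Hi, please check my example; this solution does not give the correct result for sumz(c(0.9986,0.7153,0.604716), weights = c(40.6325,35.39774,580.6436), na.action = na.fail)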
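
As for the error in the question, a likely cause is that apply() first converts the data frame with as.matrix(), and a character column such as ID or RS turns every cell of the matrix into a string, so sumz no longer receives numeric p-values. The sketch below illustrates this in base R on the first two rows of the merged table. stouffer_w() is a hypothetical stand-in for sumz(...)$p that assumes the weighted Stouffer formula; with metap installed, sumz() could be called in its place.

```r
# Hypothetical stand-in for metap::sumz(p, weights = w)$p, assuming the
# weighted Stouffer formula; swap in sumz() if metap is available.
stouffer_w <- function(p, w) {
  z <- sum(w * qnorm(p, lower.tail = FALSE)) / sqrt(sum(w^2))
  pnorm(z, lower.tail = FALSE)
}

# First two rows of the merged table, with the rounded weights as printed
quux <- data.frame(
  RS = c("rs2089177", "rs4360974"),
  G = c(0.9986, 0.9738), E = c(0.7153, 0.7838), B = c(0.604716, 0.430228),
  wg = 40.6325, we = 35.39774, wb = 580.6436,
  stringsAsFactors = FALSE
)

# apply() goes through as.matrix(); with the character RS column present,
# each row arrives in the function as a character vector.
row_types <- apply(quux, 1, function(x) class(x))

# Restricting to the numeric columns keeps each row numeric.
num <- as.matrix(quux[, c("G", "E", "B", "wg", "we", "wb")])
quux$META <- apply(num, 1, function(x) stouffer_w(x[1:3], x[4:6]))
print(quux)
```

Selecting the numeric columns by name, rather than by position as in x[2:4] and x[5:7], also avoids silently pairing a p-value with the wrong weight if the column order changes after a merge.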