【Question title】: Area under curve for several sub groups
【Posted】: 2021-01-21 09:12:57
【Question】:

I want to plot the value x against day and calculate the area under the curve for each of the 3 groups (a, b, c) in my dataset.

I tried this:

df %>%
  arrange(soil, daysincubated4) %>%
  group_by(soil) %>%
  summarise(areaundercurve = sum(diff(day)*rollmean(totalbvocs,2)))

Here is my dataset:

df <- structure(list(daysincubated4 = c(24, 24, 24, 24, 24, 24, 24, 
24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 
24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 
24, 24, 24, 24, 24, 24, 24, 24, 24, 66, 66, 66, 66, 66, 66, 66, 
66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 
66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 
66, 66, 66, 66, 66, 66, 66, 66, 66, 81, 81, 81, 81, 81, 81, 81, 
81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 
81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 81, 
81, 81, 81, 81, 81, 81, 81, 81, 94, 94, 94, 94, 94, 94, 94, 94, 
94, 94, 94, 94, 94, 94, 94, 94, 94, 94, 94, 94, 94, 94, 94, 94, 
94, 94, 94, 94, 94, 94, 94, 94, 94, 94, 94, 94, 94, 94, 94, 94, 
94, 94, 94, 94, 81), totalbvocs = c(0.16, 9.29, 0.03, 2.63, 0.14, 
6.05, 340.03, 0.03, 3.89, 6.67, 1.89, 5.95, 1.89, 1.42, 0.35, 
0.2, 0.01, 0.48, 0.4, 3.9, 0.15, 0.02, 0.37, 1.95, 0.04, 3.74, 
0.25, 3.27, 0.18, 0.01, 2.44, 0.64, 0.63, 0.23, 0.03, 0.02, 26.92, 
0.02, 0.62, 0.74, 0.17, 1.63, 5.98, 0.23, 1.37, 13.9, 0.37, 0.08, 
0.73, 0.02, 0.13, 0.02, 2.63, 0.05, 2.07, 0.29, 0.01, 0.06, 1.03, 
1.16, 0.04, 0.07, 0.04, 0.02, 0.01, 0.04, 0.01, 0.01, 0.05, 0.01, 
0.03, 0.01, 0.01, 0.02, 0.02, 0.01, 0.07, 0, 0.72, 0.14, 0, 0.02, 
0, 0, 0.75, 0.06, 0.03, 0.11, 0.01, 0.16, 0.06, 0.04, 0.05, 1.68, 
0.1, 0.06, 0.2, 0, 4.69, 0, 0.15, 0, 0.6, 0.01, 0, 0.05, 0.33, 
2.06, 0.04, 0.01, 0, 0.84, 0, 0.01, 0.01, 0, 0.01, 0.01, 0.01, 
0, 0.01, 0, 0, 0.15, 0.01, 0, 0.46, 0, 0, 0, 0, 0.89, 0.01, 0, 
0.07, 0, 0.03, 0.39, 0.04, 0.04, 87.18, 0.09, 0.06, 0.21, 0.03, 
0.07, 0, 0.04, 0.01, 0.06, 0.24, 0.11, 0.01, 0.15, 0, 0.03, 0.02, 
0.01, 0.01, 0, 0.08, 0.25, 0.01, 0.03, 0.01, 0, 0, 0, 0.12, 7.09, 
0.04, 0.01, 0.03, 0, 0.01, 0, 0, 0.29, 0, 0.07, 0.05, 0.35, 0.02, 
0.02, 1.76, 0.08, 0.18, 0.01), soil = c("6", "12", "18", "2", 
"39", "1", "14", "4", "9", "16", "10", "28", "33", "8", "31", 
"92", "25", "23", "20", "83", "66", "19", "27", "22", "95", "26", 
"21", "69", "30", "113", "15", "100", "38", "24", "110", "102", 
"34", "37", "7", "36", "17", "13", "29", "32", "90", "5", "3", 
"35", "31", "6", "12", "18", "2", "39", "1", "14", "4", "9", 
"16", "10", "28", "33", "8", "92", "25", "23", "20", "83", "66", 
"19", "27", "22", "95", "26", "21", "69", "30", "113", "15", 
"100", "38", "24", "110", "102", "34", "37", "7", "36", "17", 
"13", "29", "32", "90", "5", "3", "35", "31", "6", "12", "18", 
"2", "39", "1", "14", "4", "9", "16", "10", "28", "33", "8", 
"92", "25", "23", "20", "83", "66", "19", "27", "22", "95", "26", 
"21", "69", "30", "113", "15", "100", "38", "110", "102", "34", 
"37", "7", "36", "17", "13", "29", "32", "90", "5", "3", "35", 
"31", "6", "12", "18", "2", "39", "4", "9", "16", "10", "28", 
"33", "8", "92", "25", "23", "20", "83", "66", "19", "27", "22", 
"95", "26", "21", "69", "30", "113", "15", "100", "38", "24", 
"110", "102", "34", "37", "7", "36", "17", "13", "29", "5", "3", 
"35", "24")), row.names = c(NA, -188L), class = "data.frame")

Any help is much appreciated!

【Comments】:

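  • If the problem is just doing the calculation per subgroup, you can use dplyr::group_by(group) and then mutate() to get the result for each group separately.
  • Thanks, do you mean something like this: df <- df %>% group_by(group) %>% mutate(sum(diff(df$day[id])*rollmean(df$x[id],2))? I can't get that to work.
  • You don't need df$ inside the mutate() call: everything already operates on df, which is passed as the first argument to mutate() via %>%. See @Ronak Shah's answer below.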

Tags: r


【Answer 1】:

You can perform the calculation for each group:

library(dplyr)
library(zoo)

df %>%
  arrange(group, day) %>%
  group_by(group) %>%
  summarise(areaundercurve = sum(diff(day)*rollmean(x,2)))

#   group areaundercurve
#  <chr>          <dbl>
#1 a               1658
#2 b               1023
#3 c               1297
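
The data posted in the question uses the column names soil, daysincubated4 and totalbvocs rather than group, day and x, so the names passed to arrange() and summarise() need to match the data. As a minimal sketch of the same per-group trapezoid calculation with those names, the following uses only base R (so it runs without extra packages) on a small made-up table. The values and the helper name auc_one are invented for illustration and are not taken from the question.

```r
# Made-up data with the question's column names (values are illustrative only)
toy <- data.frame(
  soil           = c("1", "1", "1", "2", "2", "2"),
  daysincubated4 = c(24, 66, 81, 81, 24, 66),  # deliberately unsorted for soil "2"
  totalbvocs     = c(6, 2, 0, 1, 3, 1)
)

# Hypothetical helper: trapezoidal area for one group's rows, after sorting by day
auc_one <- function(d) {
  d <- d[order(d$daysincubated4), ]
  y <- d$totalbvocs
  sum(diff(d$daysincubated4) * (head(y, -1) + tail(y, -1)) / 2)
}

# One value per soil, returned as a named numeric vector
auc_by_soil <- sapply(split(toy, toy$soil), auc_one)
auc_by_soil
```

Note that a group with a single measurement gives an area of 0 here, because diff() of a single value is empty.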

【Comments】:

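  • Thanks a lot @Ronak Shah. I can't get it to work with my actual data: all the AUCs are zero. What is the 2 in (x,2)? Should it be something else if the data looks different?
  • 2 is the size of the rolling window. rollmean(1:10, 2) gives the mean of 1 and 2, then 2 and 3, then 3 and 4, and so on. If you use rollmean(1:10, 3), it returns the mean of 1, 2 and 3, then 2, 3 and 4, and so on.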
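
To make the window size concrete, here is a small sketch (an illustration added here, not part of the original answer) of why a window of 2 yields the trapezoidal rule. The vectors day and x hold made-up values. The pairwise means are computed in base R, with an optional check against zoo::rollmean() if zoo happens to be installed.

```r
# Made-up example values (not from the question's data)
day <- c(0, 1, 3, 6)
x   <- c(2, 4, 4, 10)

# Mean of each pair of neighbouring values; this is what rollmean(x, 2) returns
pair_means <- (head(x, -1) + tail(x, -1)) / 2

# Width of each interval times its mean height, summed: the trapezoidal rule
auc <- sum(diff(day) * pair_means)
auc

# Optional check against zoo, only run if the package is available
if (requireNamespace("zoo", quietly = TRUE)) {
  stopifnot(isTRUE(all.equal(pair_means, zoo::rollmean(x, 2))))
}
```

With a window of 3 the result has n - 2 elements, so it no longer lines up with the n - 1 intervals from diff(day). For a trapezoidal area the window should therefore stay at 2, whatever the data looks like.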

【Answer 2】:

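There are many better ways to write this code, but I would use the AUC function from the DescTools package.

Here it is solved with a loop over the subgroups. Note that I assume the variable you want on the x-axis of the function (as in f(x)) is the day variable, since time usually goes on the x-axis. Otherwise, just swap the two!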

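library(DescTools)

# Print the AUC for each group in turn.
# print() is called directly so that the %>% pipe is not needed.
for (i in unique(df$group)) {
  print(AUC(x = df[df$group == i, "day"],
            y = df[df$group == i, "x"]))
}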


[1] 1658
[1] 1023
[1] 1297
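
As one possible tidier variant of the loop above (a sketch, not the answerer's code), the per-group results can be collected into a named vector instead of being printed. The helper trapezoid_auc() is a hypothetical base-R stand-in so that the example runs without extra packages; DescTools::AUC(x = ..., y = ...) could be called in its place. The data frame df_toy is made up. Columns are extracted with $ so that the inputs are plain vectors even if the data is a tibble, where df[rows, "day"] would return a one-column tibble rather than a vector.

```r
# Made-up data using the answer's column names (group, day, x)
df_toy <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  day   = c(1, 2, 4, 1, 3, 4),
  x     = c(0, 2, 2, 5, 1, 3)
)

# Hypothetical helper: trapezoidal rule on points sorted by their x position
trapezoid_auc <- function(x, y) {
  o <- order(x)
  x <- x[o]
  y <- y[o]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

groups <- unique(df_toy$group)

# vapply() checks that every group yields exactly one number
auc <- vapply(
  groups,
  function(g) {
    rows <- df_toy$group == g
    trapezoid_auc(x = df_toy$day[rows], y = df_toy$x[rows])
  },
  numeric(1)
)
auc
```

The result is named by group, so a single value can be looked up with, for example, auc["a"].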

【Comments】:
