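[Posted]: 2021-09-17 19:22:51
[Question]:
I'm new here and I really need your help: I want to count the occurrences of a factor (1 column, 2 levels) and the number of rows (observations) for 4 categories in a data frame, and output the result in another summary data frame.
Let me explain: I ran an experiment in which I measured the shell temperature of animals (limpets) placed on artificial mussel beds. I want to know, for each time point, infestation level, and replicate, how many of my animals were exposed or sheltered (position).
This is what my first data frame (infauna) looks like: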
> dput(infauna)
structure(list(date = c("14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021",
"14/04/2021", "14/04/2021", "14/04/2021", "14/04/2021"), day.type = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "sunny", class = "factor"),
time = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("0",
"30", "60", "90"), class = "factor"), real.time = c("10:55",
"10:55", "10:55", "10:55", "10:55", "10:55", "10:55", "10:55",
"10:55", "10:55", "10:55", "10:55", "10:55", "10:55", "10:55",
"10:55", "10:55", "10:55", "10:55", "10:55", "10:55", "10:55",
"10:55", "10:55", "10:55", "10:55", "10:55", "10:55", "10:55",
"10:55", "10:55", "10:55", "10:55", "10:55", "10:55", "10:55",
"10:55", "10:55", "10:55", "10:55", "11:25", "11:25", "11:25",
"11:25", "11:25", "11:25", "11:25", "11:25", "11:25", "11:25",
"11:25", "11:25", "11:25", "11:25", "11:25", "11:25", "11:25",
"11:25", "11:25", "11:25", "11:25", "11:25", "11:25", "11:25",
"11:25", "11:25", "11:25", "11:25", "11:55", "11:55", "11:55",
"11:55", "11:55", "11:55", "11:55", "11:55", "11:55", "11:55",
"11:55", "11:55", "11:55", "11:55", "11:55", "11:55", "11:55",
"11:55", "11:55", "11:55", "11:55", "12:25", "12:25", "12:25",
"12:25", "12:25", "12:25", "12:25", "12:25", "12:25", "12:25",
"12:25", "12:25", "12:25", "12:25", "12:25", "12:25", "12:25",
"12:25", "12:25", "12:25", "12:25", "12:25", "12:25"), infauna = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L), .Label = "Scutellastra granularis", class = "factor"),
infestation = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("clean",
"infested"), class = "factor"), replicate = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1", "2", "3"), class = "factor"),
specimen = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L,
5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L,
2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
6L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L), shell.temp = c(23.5,
24.1, 24, 23.8, 23.9, 23.1, 22.7, 23.3, 24.3, 24.4, 23.6,
24.2, 23.2, 25, 26.5, 25, 25.4, 24.3, 25.9, 22.6, 25, 26.4,
25, 24.8, 25.7, 24.1, 24.6, 24.9, 24, 25, 23.4, 24.2, 24.5,
25.3, 26.2, 26.6, 25.8, 25.7, 25.8, 25.3, 32.3, 30.2, 31.4,
32.4, 29.8, 33.8, 36, 36.4, 35.2, 37.5, 33.9, 30.4, 29.7,
35.8, 32.9, 28.2, 27.7, 35.3, 36.1, 35.8, 34.8, 33.2, 32.7,
28.1, 31.4, 31, 37, 30.6, 36.3, 37.1, 35.7, 34.3, 38.6, 36.4,
38, 33, 29.7, 32.2, 36.2, 38.3, 39, 37.1, 33.7, 35.3, 36.5,
30.1, 38.2, 36.2, 33.7, 36.2, 37.6, 38.6, 39.4, 34.2, 42.4,
39, 40.2, 34.6, 33, 32.7, 32.5, 42, 40.8, 38.1, 35.1, 38.8,
31.5, 37.3, 37.4, 36.1, 37.5, 40.1), position = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L,
2L, 1L, 1L, 2L, 1L, 1L), .Label = c("exposed", "sheltered"
), class = "factor"), no.software = c("M1", "M2", "M3", "M4",
"M5", "M6", "M7", "M1", "M2", "M3", "M4", "M5", "M6", "M1",
"M2", "M3", "M4", "M5", "M6", "M7", "M1", "M2", "M3", "M4",
"M5", "M6", "M7", "M1", "M2", "M3", "M4", "M5", "M6", "M1",
"M2", "M3", "M4", "M5", "M6", "M7", "M1", "M2", "M3", "M4",
"M5", "M6", "M1", "M2", "M3", "M4", "M5", "M1", "M2", "M3",
"M4", "M5", "M6", "M1", "M2", "M3", "M4", "M1", "M2", "M3",
"M4", "M1", "M2", "M3", "M1", "M2", "M3", "M4", "M1", "M2",
"M3", "M1", "M2", "M3", "M4", "M1", "M2", "M3", "M1", "M2",
"M3", "M4", "M1", "M2", "M3", "M1", "M2", "M3", "M4", "M5",
"M1", "M2", "M3", "M1", "M2", "M3", "M4", "M1", "M2", "M3",
"M1", "M2", "M3", "M4", "M5", "M1", "M2", "M3")), row.names = c(NA,
-112L), class = "data.frame")
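This is part of an ongoing scientific study, so please do not share this dataset too widely :)
I started by creating a data frame that summarises mean shell temperature, standard deviation, and number of animals for each time point, infestation level, replicate, and position, using the following code (thanks to the comments):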
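library(dplyr)

# Mean and standard deviation of shell temperature per group.
# .drop = FALSE asks dplyr to keep empty groups for factor columns.
infauna.mean <- infauna %>%
  group_by(time, infestation, replicate, position, .drop = FALSE) %>%
  summarise(shell.mean = mean(shell.temp, na.rm = TRUE),
            shell.sd = sd(shell.temp, na.rm = TRUE))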
This gives me the following output in a new data frame (infauna.mean):
> dput(infauna.mean)
structure(list(time = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 60L,
60L, 60L, 60L, 60L, 60L, 60L, 60L, 60L, 60L, 60L, 90L, 90L, 90L,
90L, 90L, 90L, 90L, 90L, 90L, 90L), infestation = c("clean",
"clean", "clean", "clean", "clean", "infested", "infested", "infested",
"infested", "infested", "infested", "clean", "clean", "clean",
"clean", "clean", "infested", "infested", "infested", "infested",
"infested", "infested", "clean", "clean", "clean", "clean", "clean",
"infested", "infested", "infested", "infested", "infested", "infested",
"clean", "clean", "clean", "clean", "infested", "infested", "infested",
"infested", "infested", "infested"), replicate = c(1L, 2L, 2L,
3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 3L, 3L, 1L, 1L, 2L,
2L, 3L, 3L, 1L, 1L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 2L,
3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L), position = c("exposed", "exposed",
"sheltered", "exposed", "sheltered", "exposed", "sheltered",
"exposed", "sheltered", "exposed", "sheltered", "exposed", "sheltered",
"exposed", "exposed", "sheltered", "exposed", "sheltered", "exposed",
"sheltered", "exposed", "sheltered", "exposed", "sheltered",
"exposed", "exposed", "sheltered", "exposed", "sheltered", "exposed",
"sheltered", "exposed", "sheltered", "exposed", "exposed", "exposed",
"sheltered", "exposed", "sheltered", "exposed", "sheltered",
"exposed", "sheltered"), shell.mean = c(23.8333333333333, 25.38,
24.35, 25.9, 25.3, 23.7333333333333, 22.7, 25.35, 22.6, 24.36,
24.2, 36.275, 33.9, 35.5, 37, 30.8, 31.85, 31.55, 34.35, 29,
32.4333333333333, 28.1, 38.3, 36.4, 38.1333333333333, 37.2, 33.7,
36.7, 35, 34.6, 30.95, 35.9, 31.9, 40.5333333333333, 40.3, 38.8,
36.1, 37.95, 34.2, 34.6, 32.7333333333333, 37.8333333333333,
33.3), shell.sd = c(0.53166405433005, 0.664830805543786, 0.353553390593274,
0.447213595499958, NA, 0.372379734500505, NA, 0.771362431027075,
NA, 0.66558245169175, NA, 0.956991814663705, NA, 0.57154760664941,
NA, 0.282842712474618, 0.636396103067892, 1.88591268797542, 2.05060966544099,
1.26227308191743, 0.929157324317759, NA, 0.42426406871193, NA,
0.960902353693304, 1.4142135623731, NA, 0.565685424949241, 0.989949493661171,
2.26274169979695, 1.76776695296637, 0.848528137423859, 2.54558441227157,
1.72433562085034, 1.99749843554382, 1.83847763108502, NA, 1.37961347243832,
NA, NA, 0.251661147842358, 0.838649708360608, 2.54558441227157
)), row.names = c(NA, -43L), groups = structure(list(time = c(0L,
0L, 0L, 0L, 0L, 0L, 30L, 30L, 30L, 30L, 30L, 30L, 60L, 60L, 60L,
60L, 60L, 60L, 90L, 90L, 90L, 90L, 90L, 90L), infestation = c("clean",
"clean", "clean", "infested", "infested", "infested", "clean",
"clean", "clean", "infested", "infested", "infested", "clean",
"clean", "clean", "infested", "infested", "infested", "clean",
"clean", "clean", "infested", "infested", "infested"), replicate = c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L), .rows = structure(list(1L, 2:3,
4:5, 6:7, 8:9, 10:11, 12:13, 14L, 15:16, 17:18, 19:20, 21:22,
23:24, 25L, 26:27, 28:29, 30:31, 32:33, 34L, 35L, 36:37,
38:39, 40:41, 42:43), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -24L), class = c("tbl_df",
"tbl", "data.frame"), .drop = FALSE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
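I would like to add two more columns to this data frame, containing the following information:
- nb.position: the number of exposed or sheltered animals in each group, that is, the number of occurrences of "exposed" or "sheltered" in the "position" column for a given group (e.g. t=0, infestation=clean, replicate=1, position=exposed, nb.position = 6)
- nb.visible: the total number of visible animals in each group, that is, the number of rows (observations) for a given group (e.g. t=0, infestation=clean, replicate=1, position=exposed/sheltered, nb.visible = 6)
I tried several pieces of code gleaned here and there, without success. Here is a dummy version of the expected output, which I built by hand in Excel: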
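The two columns described above could be computed along these lines. This is only a minimal sketch on a small hypothetical data frame (`toy`, invented here for illustration; it just reuses the column names of `infauna`), and it assumes dplyr 1.0.0 or later for the `.groups` argument: `n()` counts rows per time/infestation/replicate/position group, and a second grouping without `position` sums those counts to get the total per replicate.

```r
library(dplyr)

# Hypothetical toy data reusing the column names of `infauna`
# (the values are invented for illustration only)
toy <- data.frame(
  time        = factor(c("0", "0", "0", "0", "0", "30", "30", "30")),
  infestation = factor(c("clean", "clean", "clean", "infested", "infested",
                         "clean", "clean", "infested")),
  replicate   = factor(rep("1", 8)),
  position    = factor(c("exposed", "exposed", "sheltered", "exposed",
                         "exposed", "exposed", "sheltered", "sheltered"),
                       levels = c("exposed", "sheltered")),
  shell.temp  = c(23.5, 24.1, 22.7, 23.3, 24.3, 36.0, 33.9, 31.5)
)

toy.summary <- toy %>%
  # one row per time / infestation / replicate / position combination
  group_by(time, infestation, replicate, position) %>%
  summarise(
    shell.mean  = mean(shell.temp, na.rm = TRUE),
    shell.sd    = sd(shell.temp, na.rm = TRUE),
    nb.position = n(),            # rows (animals) in this position
    .groups     = "drop"          # requires dplyr >= 1.0.0
  ) %>%
  # regroup without `position` to total exposed + sheltered animals
  group_by(time, infestation, replicate) %>%
  mutate(nb.visible = sum(nb.position)) %>%
  ungroup()

print(toy.summary)
```

If only the exposed animals are wanted, as in the hand-made table, adding `filter(position == "exposed")` at the end of the pipeline should keep just those rows while retaining the `nb.visible` totals.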
> dput(infauna.mean)
structure(list(time = c(0L, 30L, 60L, 90L, 0L, 30L, 60L, 90L,
0L, 30L, 60L, 90L, 0L, 30L, 60L, 90L, 0L, 30L, 60L, 90L, 0L,
30L, 60L, 90L), infestation = c("clean", "clean", "clean", "clean",
"infested", "infested", "infested", "infested", "clean", "clean",
"clean", "clean", "infested", "infested", "infested", "infested",
"clean", "clean", "clean", "clean", "infested", "infested", "infested",
"infested"), replicate = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
position = c("exposed", "exposed", "exposed", "exposed",
"exposed", "exposed", "exposed", "exposed", "exposed", "exposed",
"exposed", "exposed", "exposed", "exposed", "exposed", "exposed",
"exposed", "exposed", "exposed", "exposed", "exposed", "exposed",
"exposed", "exposed"), shell.mean = c(23.8333, 36.275, 38.3,
40.5333, 23.7333, 31.85, 36.7, 37.95, 25.38, 35.5, 38.1333,
40.3, 25.35, 34.35, 34.6, 34.6, 25.9, 37, 37.2, 38.8, 24.36,
32.4333, 35.9, 37.8333), shell.sd = c("0,5317", "0,9570",
"0,4243", "1,7243", "0,3724", "0,6364", "0,5657", "1,3796",
"0,6648", "0,5715", "0,9609", "1,9975", "0,7714", "2,0506",
"2,2627", "na", "0,4472", "na", "1,4142", "1,8385", "0,6656",
"0,9292", "0,8485", "0,8386"), nb.position = c(6L, 4L, 2L,
3L, 6L, 2L, 2L, 4L, 5L, 4L, 3L, 3L, 6L, 2L, 2L, 1L, 6L, 1L,
2L, 2L, 5L, 3L, 2L, 3L), x.position = c(0.75, 0.5, 0.25,
0.38, 0.75, 0.25, 0.25, 0.5, 0.63, 0.5, 0.38, 0.38, 0.75,
0.25, 0.25, 0.13, 0.75, 0.13, 0.25, 0.25, 0.63, 0.38, 0.25,
0.38), nb.visible = c(6L, 5L, 3L, 3L, 7L, 6L, 4L, 5L, 7L,
4L, 3L, 3L, 7L, 6L, 4L, 4L, 7L, 3L, 3L, 3L, 6L, 4L, 4L, 5L
), x.visible = c(0.75, 0.63, 0.38, 0.38, 0.88, 0.75, 0.5,
0.63, 0.88, 0.5, 0.38, 0.38, 0.88, 0.75, 0.5, 0.5, 0.88,
0.38, 0.38, 0.38, 0.75, 0.5, 0.5, 0.63)), class = "data.frame", row.names = c(NA,
-24L))
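The "x.position" and "x.visible" columns are just percentages I computed for plotting purposes.
Please note that I am a PhD student and unfortunately have no support for R. My supervisor and colleagues use Excel and Statistica, but I have 8 more datasets like this one and I don't want to do all the counting by hand (I don't want to be a caveman TT).
If you need more details, please don't hesitate to ask, and thank you very much for your time! Any help from the community will be acknowledged in my thesis and in my future publications ;)
[Comments]: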
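- Please share your data by running dput(infauna) and including the result in a code block. Also, stating your desired output more concisely will help you get good answers.
- @AndyEggers Thanks for the advice!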