【Posted】: 2021-09-13 14:26:10
【Question】:
Currently, I have a wide data frame like this.
> dput(head(data))
structure(list(Host.H = c("Human", "Human", "Human", "Human",
"Human", "Human"), Seq_ID.H = c(">H3-USA", ">H3-USA", ">H3-USA",
">H3-USA", ">H3-USA", ">H3-USA"), Start = c(1L, 121L, 161L, 401L,
721L, 1081L), End = c(160L, 240L, 280L, 520L, 1040L, 1280L),
Strand.H = c("Forward", "Forward", "Forward", "Forward",
"Forward", "Forward"), P.H = c(0.995962, 0.985782, 0.997249,
0.983122, 0.998574, 0.993674), Locus_ID.H = c("id \"locus1\"",
"id \"locus5\"", "id \"locus7\"", "id \"locus8\"", "id \"locus10\"",
"id \"locus12\""), Host.I = c(NA, "Intermediate", NA, NA,
NA, NA), Seq_ID.I = c(NA, ">I3-MM-CHA", NA, NA, NA, NA),
Strand.I = c(NA, "Forward", NA, NA, NA, NA), P.I = c(NA,
0.988441, NA, NA, NA, NA), Locus_ID.I = c(NA, "id \"locus5\"",
NA, NA, NA, NA), Host.B = c(NA, "Bat", "Bat", "Bat", "Bat",
NA), Seq_ID.B = c(NA, ">B2-RS-CHA", ">B2-RS-CHA", ">B2-RS-CHA",
">B2-RS-CHA", NA), Strand.B = c(NA, "Forward", "Forward",
"Forward", "Forward", NA), P.B = c(NA, 0.987457, 0.997273,
0.975433, 0.998187, NA), Locus_ID.B = c(NA, "id \"locus7\"",
"id \"locus9\"", "id \"locus10\"", "id \"locus11\"", NA),
Host.C = c(NA, "Consensus", "Consensus", "Consensus", "Consensus",
NA), Seq_ID.C = c(NA, ">I3-MM-CHA", ">I3-MM-CHA", ">I3-MM-CHA",
">I3-MM-CHA", NA), Strand.C = c(NA, "Forward", "Forward",
"Forward", "Forward", NA), P.C = c(NA, 0.98647, 0.997287,
0.981532, 0.998712, NA), Locus_ID.C = c(NA, "id \"locus7\"",
"id \"locus9\"", "id \"locus10\"", "id \"locus12\"", NA),
Type = c("Unique", "Conserved", "Shared", "Shared", "Shared",
"Unique")), row.names = c(NA, 6L), class = "data.frame")
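I have been looking for a way to tidy this data. To do so, I need to gather all the columns that carry a suffix (.H, .I, .B and .C) into single columns, as follows: Host, Seq_ID, Start, End, Strand, P, Locus_ID and Type. Note that the value of Type must be carried over from the original row each new row comes from. The expected output below illustrates this.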
Host         Seq_ID      Start End Strand  P    Locus_ID      Type
Human        >H3-USA     1     160 Forward 0.99 id "locus1"   Unique
Human        >H3-USA     121   240 Forward 0.98 id "locus5"   Conserved
Intermediate >I3-MM-CHA  121   240 Forward 0.98 id "locus5"   Conserved
Bat          >B2-RS-CHA  121   240 Forward 0.98 id "locus7"   Conserved
Consensus    >I3-MM-CHA  121   240 Forward 0.98 id "locus7"   Conserved
Human        >H3-USA     161   280 Forward 0.99 id "locus7"   Shared
and so on...
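After that, I would like to summarize whether the Start and End positions are equal, provided the Host is the same.
I have tried the pivot_longer function but could not get it to work, so I would appreciate your help.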
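The reshaping described in the question could be sketched roughly as follows. This is only a minimal illustration, assuming the tidyr and dplyr packages are available; `toy` is a cut-down copy of the first three rows of the dput() output, and the helper column name `set` is an arbitrary choice, not something from the original data. The special `".value"` entry in `names_to` tells `pivot_longer()` to use the part of each name before the dot as the output column name, so Host.H, Host.I, Host.B and Host.C all land in a single Host column.

```r
library(dplyr)
library(tidyr)

# Toy data: the first three rows of the dput() output in the question.
toy <- data.frame(
  Host.H     = c("Human", "Human", "Human"),
  Seq_ID.H   = c(">H3-USA", ">H3-USA", ">H3-USA"),
  Start      = c(1L, 121L, 161L),
  End        = c(160L, 240L, 280L),
  Strand.H   = c("Forward", "Forward", "Forward"),
  P.H        = c(0.995962, 0.985782, 0.997249),
  Locus_ID.H = c('id "locus1"', 'id "locus5"', 'id "locus7"'),
  Host.I     = c(NA, "Intermediate", NA),
  Seq_ID.I   = c(NA, ">I3-MM-CHA", NA),
  Strand.I   = c(NA, "Forward", NA),
  P.I        = c(NA, 0.988441, NA),
  Locus_ID.I = c(NA, 'id "locus5"', NA),
  Host.B     = c(NA, "Bat", "Bat"),
  Seq_ID.B   = c(NA, ">B2-RS-CHA", ">B2-RS-CHA"),
  Strand.B   = c(NA, "Forward", "Forward"),
  P.B        = c(NA, 0.987457, 0.997273),
  Locus_ID.B = c(NA, 'id "locus7"', 'id "locus9"'),
  Host.C     = c(NA, "Consensus", "Consensus"),
  Seq_ID.C   = c(NA, ">I3-MM-CHA", ">I3-MM-CHA"),
  Strand.C   = c(NA, "Forward", "Forward"),
  P.C        = c(NA, 0.98647, 0.997287),
  Locus_ID.C = c(NA, 'id "locus7"', 'id "locus9"'),
  Type       = c("Unique", "Conserved", "Shared"),
  stringsAsFactors = FALSE
)

long <- toy %>%
  pivot_longer(
    cols      = matches("\\.[HIBC]$"),  # only the suffixed columns
    names_to  = c(".value", "set"),     # "Host.H" -> column Host, set "H"
    names_sep = "\\."
  ) %>%
  filter(!is.na(Host)) %>%              # drop the empty I/B/C blocks
  select(Host, Seq_ID, Start, End, Strand, P, Locus_ID, Type)

print(long)
```

Start, End and Type are not matched by the `cols` pattern, so they are repeated for every new row that comes from the same original row, which is the behaviour the expected output asks for.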
【Comments】:
-
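Hi, and welcome! When asking on Stack Overflow, please use dput() to provide reproducible data.
-
Hi, I agree with your suggestion, but in this case the dput() output does not look good because the data frame is too wide and too large.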
-
Why not just dput(head(data))?
-
Please see my edited post.
-
It means taking the first row of each group, where a group is any set of rows with the same Host and Type whose Start and End positions are repeated.
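One way to read the rule in the comment above is "keep the first row for each combination of Host, Type, Start and End". A minimal sketch under that assumption, using dplyr, is shown here. The `long` data frame is hypothetical: it only mimics the long format the question asks for, and its last row is an invented duplicate added so that there is something to collapse.

```r
library(dplyr)

# Hypothetical long-format rows; the final Bat row is an invented duplicate
# of the same Host/Type/Start/End combination, for illustration only.
long <- data.frame(
  Host     = c("Human", "Human", "Bat", "Bat"),
  Start    = c(1L, 121L, 121L, 121L),
  End      = c(160L, 240L, 240L, 240L),
  Type     = c("Unique", "Conserved", "Conserved", "Conserved"),
  Locus_ID = c('id "locus1"', 'id "locus5"', 'id "locus7"', 'id "locus7b"'),
  stringsAsFactors = FALSE
)

# Keep only the first row of every Host/Type/Start/End group.
first_per_group <- long %>%
  distinct(Host, Type, Start, End, .keep_all = TRUE)

# Count how many rows share each combination, to see where repeats occur.
dup_counts <- long %>%
  count(Host, Type, Start, End, name = "n_rows")

print(first_per_group)
print(dup_counts)
```

If repeats should be detected within Host only, regardless of Type, dropping `Type` from both calls would give that variant.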