【Posted】:2019-05-01 19:44:56
【Question】:
I have data like this.
df <- structure(list(structureId = c("1JDN", "1DP4", "1XS5", "1SW1",
"1P99", "1IXH"), structureTitle = c("Crystal Structure of Hormone Receptor",
"DIMERIZED HORMONE BINDING DOMAIN OF THE ATRIAL NATRIURETIC PEPTIDE RECEPTOR",
"The Crystal Structure of Lipoprotein Tp32 from Treponema pallidum",
"Crystal structure of ProX from Archeoglobus fulgidus in complex with proline betaine",
"1.7A crystal structure of protein PG110 from Staphylococcus aureus",
"PHOSPHATE-BINDING PROTEIN (PBP) COMPLEXED WITH PHOSPHATE"),
chainId = c("A", "A", "A", "A", "A", "A"), ligandId = c("BMA,CL,FUC,MAN,NAG,NDG",
"CL,NAG,SO4", "MET", "MSE,PBE,ZN", "GLY,MET", "PO4"), ligandName = c("BETA-D-MANNOSE,CHLORIDE ION,ALPHA-L-FUCOSE,ALPHA-D-MANNOSE,N-ACETYL-D-GLUCOSAMINE,2-(ACETYLAMINO)-2-DEOXY-A-D-GLUCOPYRANOSE",
"CHLORIDE ION,N-ACETYL-D-GLUCOSAMINE,SULFATE ION", "METHIONINE",
"SELENOMETHIONINE,1,1-DIMETHYL-PROLINIUM,ZINC ION", "GLYCINE,METHIONINE",
"PHOSPHATE ION")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
I want to split the values of ligandId and ligandName into separate rows, so that each row has exactly one ligandId and one ligandName.
I tried separate_rows, but it does not handle my two columns well.
df %>% separate_rows(ligandId, ligandName, sep = ",")
But I get this error:
> df %>% separate_rows(ligandId, ligandName, sep = ",")
Error: All nested columns must have the same number of elements.
Call `rlang::last_error()` to see a backtrace
> rlang::last_error()
<error>
message: All nested columns must have the same number of elements.
class: `rlang_error`
backtrace:
1. tidyr::separate_rows(., ligandId, ligandName, sep = ",")
10. tidyr:::unnest.data.frame(data, !!!syms(vars), .drop = FALSE)
12. tidyr::separate_rows(., ligandId, ligandName, sep = ",")
Call `rlang::last_trace()` to see the full backtrace
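The error message says that, for at least one row, the two columns do not split into the same number of pieces. A minimal base-R sketch for finding such rows is shown here; it rebuilds a reduced copy of the data (structureTitle and chainId are left out for brevity, and the names n_id, n_name and mismatch are only illustrative).

```r
# Reduced copy of the question's data (structureTitle and chainId omitted).
df <- data.frame(
  structureId = c("1JDN", "1DP4", "1XS5", "1SW1", "1P99", "1IXH"),
  ligandId = c("BMA,CL,FUC,MAN,NAG,NDG", "CL,NAG,SO4", "MET",
               "MSE,PBE,ZN", "GLY,MET", "PO4"),
  ligandName = c(
    "BETA-D-MANNOSE,CHLORIDE ION,ALPHA-L-FUCOSE,ALPHA-D-MANNOSE,N-ACETYL-D-GLUCOSAMINE,2-(ACETYLAMINO)-2-DEOXY-A-D-GLUCOPYRANOSE",
    "CHLORIDE ION,N-ACETYL-D-GLUCOSAMINE,SULFATE ION",
    "METHIONINE",
    "SELENOMETHIONINE,1,1-DIMETHYL-PROLINIUM,ZINC ION",
    "GLYCINE,METHIONINE",
    "PHOSPHATE ION"),
  stringsAsFactors = FALSE
)

# Number of pieces each cell yields when split on every literal comma.
n_id   <- lengths(strsplit(df$ligandId, ",", fixed = TRUE))
n_name <- lengths(strsplit(df$ligandName, ",", fixed = TRUE))

# Keep only the rows where the two counts disagree.
mismatch <- data.frame(structureId = df$structureId, n_id, n_name,
                       stringsAsFactors = FALSE)[n_id != n_name, ]
print(mismatch)
```

Any row listed in mismatch has a comma in one column that is not a list separator, so a plain sep = "," cannot pair the two columns up.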
I also tried the answers in "Split comma-separated strings in a column into separate rows", but without success.
In the end I want something like this:
1JDN A BMA BETA-D-MANNOSE
1JDN A CL CHLORIDE ION
1JDN A FUC ALPHA-L-FUCOSE
1JDN A MAN ALPHA-D-MANNOSE
1JDN A NAG N-ACETYL-D-GLUCOSAMINE
1JDN A NDG 2-(ACETYLAMINO)-2-DEOXY-A-D-GLUCOPYRANOSE
...
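【Comments】:
- Please check `separate` and `gather` from `tidyverse`.
- @Sonny I tried `separate_rows`, but it did not handle multiple columns. @camille Sorry, I saw that post, but I still get an error.
- If you tried all 4 answers in that post and none of them worked, you should update your question with information explaining why a different approach is needed.
- What is the `str` of your data? There seem to be lists in there.
- In row 4, `ligandId` has 2 commas and `ligandName` has 3. That is probably because of the chemical notation; in that case you need to find a different delimiter to split the rows on.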
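One way to act on the last comment is sketched below in base R. It rests on an assumption that is not stated anywhere in the thread: a comma sitting between two digits (as in "1,1-DIMETHYL-PROLINIUM") is part of a chemical name, and every other comma is a list separator. That heuristic fits the six rows shown but may not hold for other ligand names. The data is a reduced copy (structureTitle omitted), and the names sep, id_parts, name_parts and long are illustrative.

```r
# Reduced copy of the question's data (structureTitle omitted).
df <- data.frame(
  structureId = c("1JDN", "1DP4", "1XS5", "1SW1", "1P99", "1IXH"),
  chainId = rep("A", 6),
  ligandId = c("BMA,CL,FUC,MAN,NAG,NDG", "CL,NAG,SO4", "MET",
               "MSE,PBE,ZN", "GLY,MET", "PO4"),
  ligandName = c(
    "BETA-D-MANNOSE,CHLORIDE ION,ALPHA-L-FUCOSE,ALPHA-D-MANNOSE,N-ACETYL-D-GLUCOSAMINE,2-(ACETYLAMINO)-2-DEOXY-A-D-GLUCOPYRANOSE",
    "CHLORIDE ION,N-ACETYL-D-GLUCOSAMINE,SULFATE ION",
    "METHIONINE",
    "SELENOMETHIONINE,1,1-DIMETHYL-PROLINIUM,ZINC ION",
    "GLYCINE,METHIONINE",
    "PHOSPHATE ION"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: split on a comma unless it has a digit on both sides.
sep <- "(?<![0-9]),|,(?![0-9])"
id_parts   <- strsplit(df$ligandId, sep, perl = TRUE)
name_parts <- strsplit(df$ligandName, sep, perl = TRUE)

# Fail early if any row still yields different counts in the two columns.
stopifnot(lengths(id_parts) == lengths(name_parts))

# Repeat the key columns once per ligand and flatten the split pieces.
long <- data.frame(
  structureId = rep(df$structureId, lengths(id_parts)),
  chainId     = rep(df$chainId, lengths(id_parts)),
  ligandId    = unlist(id_parts),
  ligandName  = unlist(name_parts),
  stringsAsFactors = FALSE
)
print(long)
```

The stopifnot check is there on purpose: if a future row contains a name whose internal comma is not between digits, the script stops instead of silently pairing ids with the wrong names.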
Tags: r split dplyr row tidyverse