【发布时间】:2015-09-24 21:41:16
【问题描述】:
可能重复Here
我有一个包含两列的数据框。我想删除括号中的字符串并将其添加为新列。数据框如下所示。
structure(list(ID = 1:12, Gene.Name = structure(c(3L, 11L, 9L,
5L, 1L, 8L, 2L, 4L, 6L, 12L, 10L, 7L), .Label = c(" ATP synt, H+ tran, O subunit (oligomycin sensitivity conferring protein) (ATP5O), mRNA",
" heterogeneous nuclear ribonucleoprotein F (HNRPF), mRNA", " NADH (ubiquinone) 1 alpha subcomplex, 4 (9kD, MLRQ) (NDUFA4), mRNA",
" ribosomal protein L34 (RPL34), transcript variant 1, mRNA",
" ribosomal protein S11 (RPS11), mRNA", "ATP synthase, H+ tran, mitochondrial F0, subunit c (subunit 9) isoform 3 (ATP5G3), mRNA",
"clone MGC:10120 IMAGE:3900723, mRNA, complete cds", "cytidine monophosphate N-acetylneuraminic acid synthetase (CMAS), mRNA",
"farnesyl-diphosphate farnesyltransferase 1 (FDFT1), mRNA", "homeobox protein from AL590526 (LOC84528), mRNA",
"mitochondrial S33 (MRPS33), transcript variant 1, nuclear gene, mRNA",
"ribosomal protein S15a (RPS15A), mRNA"), class = "factor")), .Names = c("ID",
"Gene.Name"), row.names = c(NA, -12L), class = "data.frame")
如果未找到括号中的字符串,则将该行留空。这里我有两个案例
1) 获取括号中的所有字符串并添加为以,分隔的新列
2) 括号中的最后一个字符串并添加为新列
我尝试了df$Symbol <- sapply(df, function(x) sub("\\).*", "", sub(".*\\(", "", x))) 之类的方法,但没有给出适当的输出
案例 1 输出
ID Gene.Name Symbol
1 NADH (ubiquinone) 1 alpha subcomplex, 4 (9kD, MLRQ) (NDUFA4), mRNA ubiquinone, (9kD, MLRQ),NDUFA4
2 mitochondrial S33 (MRPS33), transcript variant 1, nuclear gene, mRNA MRPS33
3 farnesyl-diphosphate farnesyltransferase 1 (FDFT1), mRNA FDFT1
4 ribosomal protein S11 (RPS11), mRNA RPS11
5 ATP synt, H+ tran, O subunit (oligomycin sensitivity conferring protein) (ATP5O), mRNA oligomycin sensitivity conferring protein,ATP5O
6 cytidine monophosphate N-acetylneuraminic acid synthetase (CMAS), mRNA CMAS
7 heterogeneous nuclear ribonucleoprotein F (HNRPF), mRNA HNRPF
8 ribosomal protein L34 (RPL34), transcript variant 1, mRNA RPL34
9 ATP synthase, H+ tran, mitochondrial F0, subunit c (subunit 9) isoform 3 (ATP5G3), mRNA subunit 9,ATP5G3
10 ribosomal protein S15a (RPS15A), mRNA RPS15A
11 homeobox protein from AL590526 (LOC84528), mRNA LOC84528
12 clone MGC:10120 IMAGE:3900723, mRNA, complete cds NA
案例 2 输出
ID Gene.Name Symbol
1 NADH (ubiquinone) 1 alpha subcomplex, 4 (9kD, MLRQ) (NDUFA4), mRNA NDUFA4
2 mitochondrial S33 (MRPS33), transcript variant 1, nuclear gene, mRNA MRPS33
3 farnesyl-diphosphate farnesyltransferase 1 (FDFT1), mRNA FDFT1
4 ribosomal protein S11 (RPS11), mRNA RPS11
5 ATP synt, H+ tran, O subunit (oligomycin sensitivity conferring protein) (ATP5O), mRNA ATP5O
6 cytidine monophosphate N-acetylneuraminic acid synthetase (CMAS), mRNA CMAS
7 heterogeneous nuclear ribonucleoprotein F (HNRPF), mRNA HNRPF
8 ribosomal protein L34 (RPL34), transcript variant 1, mRNA RPL34
9 ATP synthase, H+ tran, mitochondrial F0, subunit c (subunit 9) isoform 3 (ATP5G3), mRNA ATP5G3
10 ribosomal protein S15a (RPS15A), mRNA RPS15A
11 homeobox protein from AL590526 (LOC84528), mRNA LOC84528
12 clone MGC:10120 IMAGE:3900723, mRNA, complete cds <NA>
【问题讨论】:
-
@akrun 可能是,但我在这里提到了两个案例,只是想显示案例 2 的输出。