Posted: 2021-10-04 19:50:41
Question:
I have a table like this:
| Sequence | Modification | Modified.Sequence |
|---|---|---|
| ABCDEF | Acetyl (Protein N-term),Oxidation (M),Methyl (KR) | AB(Acetyl (Protein N-term))CD(Oxidation (M))EF(Methyl (KR)) |
| ABCDEFGH | Oxidation (M) | ABCDEF(Oxidation (M))GH |
| DEFGH | Acetyl (Protein N-term), Methyl (KR) | ABC(Acetyl (Protein N-term))DEF(Methyl (KR))GH |
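I need exactly one modification per row. To get there, I have to repeat each sequence N times, where N is the number of modifications on that sequence, and remove the other modifications from each row's Modified.Sequence so that only that row's own modification remains.
This is the expected output: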
| Sequence | Modification | Modified.Sequence |
|---|---|---|
| ABCDEF | Acetyl (Protein N-term) | AB(Acetyl (Protein N-term))CDEF |
| ABCDEF | Oxidation (M) | ABCD(Oxidation (M))EF |
| ABCDEF | Methyl (KR) | ABCDEF(Methyl (KR)) |
| ABCDEFGH | Oxidation (M) | ABCDEF(Oxidation (M))GH |
| DEFGH | Acetyl (Protein N-term) | ABC(Acetyl (Protein N-term))DEFGH |
| DEFGH | Methyl (KR) | ABCDEF(Methyl (KR))GH |
```r
# Reprex: one row per peptide, modifications still comma-separated
df = data.frame(
  Sequence = c('ABCDEF','ABCDEFGH','DEFGH'),
  Modification = c('Acetyl (Protein N-term),Oxidation (M),Methyl (KR)','Oxidation (M)','Acetyl (Protein N-term), Methyl (KR)'),
  Modified.Sequence = c('AB(Acetyl (Protein N-term))CD(Oxidation (M))EF(Methyl (KR))','ABCDEF(Oxidation (M))GH',
                        'ABC(Acetyl (Protein N-term))DEF(Methyl (KR))GH')
)
```
There can be more modifications per sequence than in this reprex.
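One way to read the "subtraction" step: for the k-th modification of a row, keep only the k-th bracketed annotation in Modified.Sequence and delete all the others. Below is a minimal base-R sketch of that single step. `keep_one_mod` is a hypothetical helper name, and the regex assumes annotations nest at most one level deep (as in `(Oxidation (M))`); deeper nesting would need a different pattern.

```r
# Hypothetical helper (not from the question): keep only the k-th
# bracketed modification in a modified sequence and strip the rest.
# Assumes annotations nest at most one level, e.g. "(Oxidation (M))".
keep_one_mod <- function(modseq, k) {
  # One annotation: "(" + (plain chars or one inner "(...)")* + ")"
  pat <- "\\((?:[^()]|\\([^()]*\\))*\\)"
  m <- gregexpr(pat, modseq, perl = TRUE)
  mods <- regmatches(modseq, m)[[1]]
  mods[-k] <- ""                       # blank out every annotation except the k-th
  regmatches(modseq, m) <- list(mods)  # write the edited annotations back in place
  modseq
}

x <- "AB(Acetyl (Protein N-term))CD(Oxidation (M))EF(Methyl (KR))"
vapply(1:3, function(k) keep_one_mod(x, k), character(1))
```

Because only the annotations are replaced, the plain residues between them (`AB`, `CD`, `EF`) stay where they are.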
Discussion:
-
I'm trying, but it won't let me add the edit...
-
@akrun Sort of solved. It throws an error if I don't put ``` ``` around the table.
-
The subtraction part is unclear. For the first part you can use
`library(tidyr); library(dplyr); df %>% separate_rows(Modification, Modified.Sequence, sep = ",\\s*|(?<=\\))(?=[A-Z]+\\()")`
-
Thanks @akrun. Is the subtraction part clear now?
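Putting both parts together, a base-R sketch that produces the expected table. It rests on two assumptions: modification names contain no commas, and the comma-separated list in `Modification` is in the same order as the annotations in `Modified.Sequence`. `keep_one_mod` is a hypothetical helper, and the reprex uses `'ABCDEF(Oxidation (M))GH'` for the second row, matching the table.

```r
# Hypothetical helper: keep only the k-th bracketed annotation.
# Assumes annotations nest at most one level, e.g. "(Oxidation (M))".
keep_one_mod <- function(modseq, k) {
  pat <- "\\((?:[^()]|\\([^()]*\\))*\\)"
  m <- gregexpr(pat, modseq, perl = TRUE)
  mods <- regmatches(modseq, m)[[1]]
  mods[-k] <- ""                       # drop all annotations but the k-th
  regmatches(modseq, m) <- list(mods)
  modseq
}

df <- data.frame(
  Sequence = c('ABCDEF', 'ABCDEFGH', 'DEFGH'),
  Modification = c('Acetyl (Protein N-term),Oxidation (M),Methyl (KR)', 'Oxidation (M)',
                   'Acetyl (Protein N-term), Methyl (KR)'),
  Modified.Sequence = c('AB(Acetyl (Protein N-term))CD(Oxidation (M))EF(Methyl (KR))',
                        'ABCDEF(Oxidation (M))GH',
                        'ABC(Acetyl (Protein N-term))DEF(Methyl (KR))GH'),
  stringsAsFactors = FALSE
)

mods <- strsplit(df$Modification, ",\\s*")  # split the modification list
n    <- lengths(mods)                        # N modifications per row
idx  <- rep(seq_len(nrow(df)), n)            # repeat each row N times
k    <- sequence(n)                          # 1..N within each original row

res <- data.frame(
  Sequence = df$Sequence[idx],
  Modification = unlist(mods),
  Modified.Sequence = mapply(keep_one_mod, df$Modified.Sequence[idx], k,
                             USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
res
```

`sequence(n)` numbers the copies of each row 1..N, which is what lets each copy keep a different annotation; the number of rows therefore grows with however many modifications a sequence carries.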