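[Posted]: 2017-07-07 18:45:34
[Question]:
I want to subset only the rows that contain a given substring, and then remove that substring from them. I can do the first part, but I can't figure out how to remove the substring.
Here is an example: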
library(Biostrings)
myseq <- DNAStringSet(c("CCCATGAAAGATCGGAAGAGCACACGTCTGAACCCATGAA", "CCCATGAACATAGATCC", "CCCGTACAGATCACGTG"))
names(myseq) <- letters[1:3]
myseq
A DNAStringSet instance of length 3
width seq names
[1] 40 CCCATGAAAGATCGGAAGAGCACACGTCTGAACCCATGAA a
[2] 17 CCCATGAACATAGATCC b
[3] 17 CCCGTACAGATCACGTG c
The sequence I want to remove is AGATCGGAAGAGCACACGTCTGAA, which occurs in the first row.
matchPattern("AGATCGGAAGAGCACACGTCTGAA", myseq[[1]])
Views on a 40-letter DNAString subject
subject: CCCATGAAAGATCGGAAGAGCACACGTCTGAACCCATGAA
views:
start end width
[1] 9 32 24 [AGATCGGAAGAGCACACGTCTGAA]
To subset, I do the following:
pat <- vmatchPattern("AGATCGGAAGAGCACACGTCTGAA", myseq)
myseq[ lapply(lapply(pat, isEmpty), function(x) x == FALSE) ]
A DNAStringSet instance of length 3
width seq names
[1] 40 CCCATGAAAGATCGGAAGAGCACACGTCTGAACCCATGAA a
[2] 0 b
[3] 0 c
The output should be:
A DNAStringSet instance of length 3
width seq names
[1] 11 CCCCCCATGAA a
[2] 0 b
[3] 0 c
[Discussion]:
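One possible approach, offered as a minimal sketch rather than the definitive Biostrings way: count the matches with vcountPattern() to pick the rows that contain the substring, then delete the literal substring with base R's gsub() on the character form and rebuild the DNAStringSet. The variable names adapter, hits and trimmed are made up for this illustration, and the sketch assumes an exact (no-mismatch) match is what is wanted.

```r
library(Biostrings)

# Same data as in the question; `adapter` is a name chosen for this sketch.
adapter <- "AGATCGGAAGAGCACACGTCTGAA"
myseq <- DNAStringSet(c("CCCATGAAAGATCGGAAGAGCACACGTCTGAACCCATGAA",
                        "CCCATGAACATAGATCC",
                        "CCCGTACAGATCACGTG"))
names(myseq) <- letters[1:3]

# Step 1: keep only the sequences that contain the adapter at least once.
hits <- myseq[vcountPattern(adapter, myseq) > 0]

# Step 2: delete every literal occurrence of the adapter (fixed = TRUE turns
# off regex interpretation), then convert back to a DNAStringSet.
trimmed <- DNAStringSet(gsub(adapter, "", as.character(hits), fixed = TRUE))

# Reassign the names explicitly instead of relying on gsub() to carry them.
names(trimmed) <- names(hits)
trimmed
```

With the example data only sequence a survives the subset, and removing positions 9 to 32 leaves the 8 letters on either side of the match, CCCATGAACCCATGAA (width 16). That differs from the 11-letter string given as the expected output in the question, so the expected output may be worth rechecking.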
Tags: r regex bioinformatics fasta bioconductor