【Posted】: 2016-01-05 19:08:58
【Problem description】:
I have a data frame like this:
df = read.table(text="REF Alt S00001 S00002 S00003 S00004 S00005
TAAGAAG TAAG TAAGAAG/TAAGAAG TAAGAAG/TAAG TAAG/TAAG TAAGAAG/TAAGAAG TAAGAAG/TAAGAAG
T TG T/T -/- TG/TG T/T T/T
CAAAA CAAA CAAAA/CAAAA CAAAA/CAAA CAAAA/CAAAA -/- CAAAA/CAAAA
TTGT TTGTGT TTGT/TTGT TTGT/TTGT TTGT/TTGT TTGTGT/TTGTGT TTGT/TTGTGT
GTTT GTTTTT GTTT/GTTTTT GTTT/GTTT GTTT/GTTT GTTT/GTTT GTTTTT/GTTTTT", header=TRUE, stringsAsFactors=FALSE)
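I want to replace each of the "/"-separated elements in the sample columns with "D" or "I", depending on the lengths of the strings in the "REF" and "Alt" columns of the same row. An element that matches the longer of the two strings is replaced with "I"; otherwise it is replaced with "D". A "-" is left unchanged. So the expected result is: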
REF Alt S00001 S00002 S00003 S00004 S00005
TAAGAAG TAAG I/I I/D D/D I/I I/I
T TG D/D -/- I/I D/D D/D
CAAAA CAAA I/I I/D I/I -/- I/I
TTGT TTGTGT D/D D/D D/D I/I D/I
GTTT GTTTTT D/I D/D D/D D/D I/I
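
For reference, the rule described above could be sketched in base R roughly as follows. This is only an illustration under some assumptions: the helper names `recode_genotype` and `recode_indels` are made up for this sketch, every column other than `REF` and `Alt` is treated as a sample column, each allele is assumed to equal either `REF` or `Alt` exactly (anything else that is not "-" becomes "D"), and when `REF` and `Alt` have the same length `REF` is arbitrarily treated as the longer one.

```r
# Hypothetical helper: recode one genotype string such as "TAAGAAG/TAAG"
# against the REF and Alt strings of its row.
recode_genotype <- function(gt, ref, alt) {
  # The longer of REF/Alt is the one that maps to "I" (ties go to REF: an assumption).
  longest <- if (nchar(ref) >= nchar(alt)) ref else alt
  alleles <- strsplit(gt, "/", fixed = TRUE)[[1]]
  # "-" stays as it is; the longest string becomes "I"; everything else becomes "D".
  codes <- ifelse(alleles == "-", "-", ifelse(alleles == longest, "I", "D"))
  paste(codes, collapse = "/")
}

# Hypothetical wrapper: apply the helper row by row to every sample column.
recode_indels <- function(df) {
  sample_cols <- setdiff(names(df), c("REF", "Alt"))
  out <- df
  for (col in sample_cols) {
    out[[col]] <- mapply(recode_genotype, df[[col]], df$REF, df$Alt,
                         USE.NAMES = FALSE)
  }
  out
}
```

Calling `recode_indels(df)` on the data frame above would return a data frame with the same `REF` and `Alt` columns and the sample columns recoded. This version avoids regular expressions and compares whole alleles after splitting on "/", which keeps the "-" case explicit.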
【Discussion】:
Tags: regex r dna-sequence