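【Posted】:2014-12-30 01:49:41
【Question】:
I have two fastq files and I need only the fastq records that they share. However, my script fails when it writes the two output files that should contain only the matching records. I am using set() to keep memory usage down. Can anyone help me fix this? Here is the code: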
from Bio.SeqIO.QualityIO import FastqGeneralIterator

infileR1 = open('R1.fastq', 'r')
infileR2 = open('R2.fastq', 'r')
output1 = open('matchedR1.fastq', 'w')
output2 = open('matchedR2.fastq', 'w')

all_names1 = set()
for line in infileR1:
    if line[0:11] == '@GWZHISEQ01':
        read_name = line.split()[0]
        all_names1.add(read_name)

all_names2 = set()
for line in infileR2:
    if line[0:11] == '@GWZHISEQ01':
        read_name = line.split()[0]
        all_names2.add(read_name)

shared_names = set()
for item in all_names1:
    if item in all_names2:
        shared_names.add(item)

#printing out the files:
for title, seq, qual in FastqGeneralIterator(infileR1):
    if title in new:
        output1.write("%s\n%s\n+\n%s\n" % (title, seq, qual))

for title, seq, qual in FastqGeneralIterator(infileR2):
    if title in shared_names:
        output2.write("%s\n%s\n+\n%s\n" % (title, seq, qual))

infileR1.close()
infileR2.close()
output1.close()
output2.close()
【Discussion】:
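A few things in the script above look like they would stop any records from being written. The first `for line in infileR1` / `infileR2` loops read each handle to the end, so the later `FastqGeneralIterator` calls start at end-of-file and yield nothing unless the files are reopened or rewound with `seek(0)`. The name `new` is never defined. The sets hold only the first word of each header with its leading `@`, while `FastqGeneralIterator` yields the whole header line without the `@`, so `title in shared_names` cannot match. The output lines also drop the `@`. Below is a minimal sketch of one way to address these, not a definitive fix. It uses only the standard library: `read_fastq` is a hypothetical stand-in that yields the same `(title, seq, qual)` tuples as `FastqGeneralIterator`, and `read_ids`, `write_shared`, and `write_matched` are likewise made-up helper names. It assumes plain four-line records and that both mates share the first word of the header.

```python
def read_fastq(handle):
    """Yield (title, seq, qual) tuples; title has no leading '@'.

    Assumption: every record is exactly four lines (no wrapped sequences).
    """
    while True:
        header = handle.readline()
        if not header:
            return                      # end of file
        if not header.strip():
            continue                    # skip stray blank lines
        seq = handle.readline().rstrip("\r\n")
        handle.readline()               # the '+' separator line
        qual = handle.readline().rstrip("\r\n")
        yield header.rstrip("\r\n")[1:], seq, qual


def read_ids(path):
    """Collect the first word of every header in a FASTQ file."""
    with open(path) as handle:
        return {title.split()[0] for title, _, _ in read_fastq(handle)}


def write_shared(in_path, out_path, shared):
    """Copy records whose ID is in `shared`; return how many were written."""
    kept = 0
    # The input is opened afresh here, so reading starts at the first record.
    with open(in_path) as src, open(out_path, "w") as dst:
        for title, seq, qual in read_fastq(src):
            if title.split()[0] in shared:
                # Put the '@' back so the output is valid FASTQ.
                dst.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
                kept += 1
    return kept


def write_matched(r1, r2, out1, out2):
    """Write only the records whose IDs occur in both input files."""
    shared = read_ids(r1) & read_ids(r2)    # set intersection
    return write_shared(r1, out1, shared), write_shared(r2, out2, shared)
```

Calling `write_matched('R1.fastq', 'R2.fastq', 'matchedR1.fastq', 'matchedR2.fastq')` would then return the number of records written to each output. If the headers use the older `/1` and `/2` suffixes rather than a space-separated second field, the ID extraction would need to strip those suffixes first.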
Tags: python parsing for-loop conditional biopython