Biopython: export the protein fragment from PDB to a FASTA file

【Question title】: Biopython: export the protein fragment from PDB to a FASTA file
【Posted】: 2020-05-22 23:49:58
【Question description】:

I am writing a fragment of a PDB protein sequence to FASTA format, as shown below.

from Bio.SeqIO import PdbIO, FastaIO

def get_fasta(pdb_file, fasta_file, transfer_ids=None):
    fasta_writer = FastaIO.FastaWriter(fasta_file)
    fasta_writer.write_header()
    for rec in PdbIO.PdbSeqresIterator(pdb_file):
        if len(rec.seq) == 0:
            continue
        if transfer_ids is not None and rec.id not in transfer_ids:
            continue
        print(rec.id, rec.seq, len(rec.seq))
        fasta_writer.write_record(rec)

get_fasta(open('pdb1tup.ent'), open('1tup.fasta', 'w'), transfer_ids=['1TUP:B'])
get_fasta(open('pdb1olg.ent'), open('1olg.fasta', 'w'), transfer_ids=['1OLG:B'])
get_fasta(open('pdb1ycq.ent'), open('1ycq.fasta', 'w'), transfer_ids=['1YCQ:B'])

It gives the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-9-8ecf92753ac9> in <module>
     12         fasta_writer.write_record(rec)
     13 
---> 14 get_fasta(open('pdb1tup.ent'), open('1tup.fasta', 'w'), transfer_ids=['1TUP:B'])
     15 get_fasta(open('pdb1olg.ent'), open('1olg.fasta', 'w'), transfer_ids=['1OLG:B'])
     16 get_fasta(open('pdb1ycq.ent'), open('1ycq.fasta', 'w'), transfer_ids=['1YCQ:B'])

<ipython-input-9-8ecf92753ac9> in get_fasta(pdb_file, fasta_file, transfer_ids)
     10             continue
     11         print(rec.id, rec.seq, len(rec.seq))
---> 12         fasta_writer.write_record(rec)
     13 
     14 get_fasta(open('pdb1tup.ent'), open('1tup.fasta', 'w'), transfer_ids=['1TUP:B'])

~/anaconda3/envs/bioinformatics/lib/python3.7/site-packages/Bio/SeqIO/FastaIO.py in write_record(self, record)
    303     def write_record(self, record):
    304         """Write a single Fasta record to the file."""
--> 305         assert self._header_written
    306         assert not self._footer_written
    307         self._record_written = True

AttributeError: 'FastaWriter' object has no attribute '_header_written'

I searched around and checked this, this, and this, but could not solve the problem. The complete code is here; the problem is in the last cell.

Edit: I am using

conda version : 4.8.3
conda-build version : 3.18.11
python version : 3.7.6.final.0
biopython version : 1.77.dev0

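【Comments】:

Interesting problem. I can't reproduce it on my machine. The code runs fine with Python 3.6.9 and biopython==1.76. Looking at Biopython's source code, I can't see how the field _header_written could _not_ exist. Which Biopython version are you using?
@LydiavanDyke I am using biopython==1.77dev0. The problem with 1.76 is that the SwissProt feature table format changed, and support for it was updated in 1.77dev0.
I see. My best guess for now: a bug was introduced between 1.76 and 1.77dev. I suggest you try to reproduce the error with 1.76. If it disappears in the older version, I suggest filing a bug report with Biopython.
@LydiavanDyke Thank you, you were right: it works fine in Biopython 1.76. I have reported the issue on GitHub.
Thanks for reporting the bug. Good luck with your project, and happy coding :)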
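Since the thread turns on which Biopython release is installed, a quick version check can help when comparing environments. The following is only a minimal sketch: the helper name `installed_version` is hypothetical, and it assumes Python 3.8 or later, where `importlib.metadata` is part of the standard library (on Python 3.7, as used in the question, `Bio.__version__` reports the same information).

```python
from importlib import metadata


def installed_version(package="biopython"):
    """Return the installed version string of a package, or None if absent.

    Hypothetical helper for illustration; not part of Biopython.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed Biopython version, or None if it is not installed.
print(installed_version())
```

Running this in each conda environment makes it easier to confirm whether the code is actually executing against 1.76 or a 1.77 development build.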

Tags: python biopython

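【Solution 1】:

I can't speak to fasta_writer, since I don't use it, but you can store the sequences you want as strings in a list or dict and then write them to a FASTA file by hand: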

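# seq_list / seq_dict hold the sequences as plain strings (e.g. str(rec.seq))
## with list: the position in the list is used as the record name
data = '>' + '\n>'.join(f'{i}\n{seq}' for i, seq in enumerate(seq_list)) + '\n'
## or with dict: keys are record names (Python 3 uses items(); iteritems() is Python 2 only)
data = '>' + '\n>'.join(f'{name}\n{seq}' for name, seq in seq_dict.items()) + '\n'

with open('path/to/my-fasta-file.fasta', 'wt') as f:
    f.write(data)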

(The trailing newline at the end of data is only needed if this is part of a larger loop that writes batches of seq_list to the same FASTA file.)
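The manual approach above can be wrapped in a small function. This is a sketch under stated assumptions rather than a replacement for Biopython's writer: the name `format_fasta` and the `width` parameter are made up for illustration, and the 60-character line width is just a common convention for FASTA files.

```python
def format_fasta(named_seqs, width=60):
    """Return FASTA text for an iterable of (name, sequence) pairs.

    Hypothetical helper: each sequence is split into lines of at most
    `width` characters, and the text always ends with a newline so that
    several batches can be appended to the same file.
    """
    lines = []
    for name, seq in named_seqs:
        seq = str(seq)  # also accepts objects such as Bio.Seq.Seq
        lines.append(f">{name}")
        # Slice the sequence into fixed-width chunks.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Works with a dict (via items()) or a list of pairs.
    print(format_fasta({"example_chain": "ACDEFGHIKLMNPQRSTVWY"}.items()), end="")
```

Because the function returns a string, the result can be written with `f.write(...)` exactly as in the answer above.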

【Comments】:

Thanks. This turned out to be a bug, and the Biopython team has fixed it on GitHub. The new version works with both options.

【Solution 2】:

You can do this with Biopython:

from Bio import SeqIO
pdbfile = '2tbv.pdb'
with open(pdbfile) as handle:
    sequence = next(SeqIO.parse(handle, "pdb-atom"))
with open("2tbv.fasta", "w") as output_handle:
    SeqIO.write(sequence, output_handle, "fasta")
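Note that `next(...)` in the snippet above keeps only the first chain, and that the "pdb-atom" format reads residues from the ATOM coordinates, whereas the question iterates over the SEQRES records and filters by IDs such as '1TUP:B'. A rough sketch of that filtering through the high-level `SeqIO.parse` and `SeqIO.write` functions might look like the following. The function names `select_records` and `pdb_to_fasta` are hypothetical, and whether this path also avoids the 1.77.dev0 error has not been verified here.

```python
def select_records(records, transfer_ids=None):
    """Yield non-empty records whose id is in transfer_ids (all if None)."""
    for rec in records:
        if len(rec.seq) == 0:
            continue
        if transfer_ids is not None and rec.id not in transfer_ids:
            continue
        yield rec


def pdb_to_fasta(pdb_path, fasta_path, transfer_ids=None):
    """Write selected SEQRES chains of a PDB file to FASTA; return the count."""
    # Imported here so select_records can be used without Biopython installed.
    from Bio import SeqIO

    # Context managers close both files, unlike bare open() calls.
    with open(pdb_path) as pdb_handle, open(fasta_path, "w") as fasta_handle:
        records = SeqIO.parse(pdb_handle, "pdb-seqres")
        # The generator is consumed inside the with block, while the input is open.
        return SeqIO.write(select_records(records, transfer_ids), fasta_handle, "fasta")


# Example call, using a file name from the question:
# pdb_to_fasta("pdb1tup.ent", "1tup.fasta", transfer_ids=["1TUP:B"])
```

Passing file paths instead of open handles keeps the opening and closing of files in one place, which the original `get_fasta` calls left to the caller.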

【Comments】:

Thanks. This turned out to be a bug, and the Biopython team has fixed it on GitHub. The new version works with both options.