[Posted]: 2021-01-03 20:36:30
[Question]:
The following code analyzes the amino acid composition of FASTA sequences (.faa files):
from Bio import SeqIO
from Bio.SeqUtils.ProtParam import ProteinAnalysis
import fastaparser  # imported in the original script but never used
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option('display.max_columns', None)

filename = input("Please enter the full path of the amino acid sequence file!: ")
pH_input = input("At which pH should the analysis be conducted? ")
flexibility_ynu = input("Do you wish a flexibility analysis?\n (1) Yes\n (0) No\n")

# Default to pH 7.4 on empty input; charge_at_pH() needs a number, not a string.
if pH_input == "":
    pH = 7.4
else:
    pH = float(pH_input)

with open(filename + "_analysis.txt", "w+") as f:
    for record in SeqIO.parse(filename, 'fasta'):
        X = ProteinAnalysis(str(record.seq))
        print("ANALYSIS OF", record, "\n ----------- \n -----------", file=f)

        # Absolute counts. The plot file name does not depend on the record,
        # so every iteration overwrites the plot of the previous record.
        pd_count_amino_acids = pd.DataFrame(X.count_amino_acids(), index=[1])
        print("number of amino acids: \n", pd_count_amino_acids, file=f)
        plt_acc = pd_count_amino_acids.plot.bar()
        plt.savefig(filename + "_count_amino_acids_plot.pdf")

        # Relative amounts, with the same overwriting behavior as above.
        pd_get_amino_acids_percent = pd.DataFrame(X.get_amino_acids_percent(), index=[1])
        print("\n percentage of amino acids: \n", pd_get_amino_acids_percent, file=f)
        plt_acp = pd_get_amino_acids_percent.plot.bar()
        plt.savefig(filename + "_amino_acids_percent_plot.pdf")

        print("\n molecular weight: {:.2f}".format(X.molecular_weight()), file=f)
        print("\n aromaticity: {:.2f}".format(X.aromaticity()), file=f)
        print("\n instability index: {:.2f}".format(X.instability_index()), file=f)
        if flexibility_ynu == "1":
            print("\n flexibility: ", X.flexibility(), file=f)
        print("\n IEP: ", X.isoelectric_point(), file=f)
        print("therefore its charge at pH = ", pH, " is {:.2f}".format(X.charge_at_pH(pH)), file=f)
        print("secondary structure fraction: (Helix, Turn, Sheet): ", X.secondary_structure_fraction(), "\n\n\n", file=f)

print("done")
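I now want to plot bar charts of the absolute and relative amino acid counts, but with a separate plot for each FASTA ID.
For example, NC_011544 contains 5 IDs, so I want to generate 10 distinct plots (2 per ID: one for the absolute numbers and one for the relative numbers).
Is there a way to do this?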
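One possible approach, given as a minimal sketch rather than a definitive answer: derive the output file name from each record's ID and draw every chart on its own figure. The helper names `safe_name`, `plot_paths`, and `plot_per_record`, the `_count.pdf` / `_percent.pdf` suffixes, and the choice of the non-interactive `Agg` backend are all assumptions made for illustration. The analysis calls are the same `ProteinAnalysis` methods used in the script above.

```python
import re


def safe_name(record_id):
    """Make a FASTA ID usable in a file name (hypothetical helper).

    IDs such as 'gi|212671454|ref|YP_002308464.1|' contain '|', which is
    awkward or invalid in file names, so such characters become '_'.
    """
    return re.sub(r"[^A-Za-z0-9._-]+", "_", record_id).strip("_")


def plot_paths(fasta_path, record_id):
    """Return the (absolute, relative) PDF paths for one record."""
    stem = f"{fasta_path}_{safe_name(record_id)}"
    return f"{stem}_count.pdf", f"{stem}_percent.pdf"


def plot_per_record(fasta_path):
    """Write two bar-chart PDFs for every record and return their paths."""
    # Imported here so the two helpers above can be used (and tested)
    # without Biopython or matplotlib being installed.
    import matplotlib
    matplotlib.use("Agg")  # assumption: files only, no GUI window needed
    import matplotlib.pyplot as plt
    from Bio import SeqIO
    from Bio.SeqUtils.ProtParam import ProteinAnalysis

    written = []
    for record in SeqIO.parse(fasta_path, "fasta"):
        analysis = ProteinAnalysis(str(record.seq))
        datasets = (
            analysis.count_amino_acids(),        # absolute numbers
            analysis.get_amino_acids_percent(),  # relative numbers
        )
        for data, out_path in zip(datasets, plot_paths(fasta_path, record.id)):
            fig, ax = plt.subplots()  # a fresh figure for each chart
            ax.bar(list(data.keys()), list(data.values()))
            ax.set_title(record.id)
            ax.set_xlabel("amino acid")
            fig.savefig(out_path)
            plt.close(fig)  # free the figure so they do not pile up
            written.append(out_path)
    return written
```

Calling `plot_per_record("NC_011544.faa")` on the five-record file shown below should then write ten PDFs, two per ID. Closing each figure after saving keeps memory use flat when a file contains many records.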
NC_011544.faa
>gi|212671454|ref|YP_002308464.1| replicase [Hosta virus X]
MARLREVFSSFTEPNLKTIVQQETYKLAKAELKTIQTYNPYAQTKDAADLLEDLGINTNPHAVTAHTHAA
AKSIENDLYGITSHYLPKTPITFLFMKRGKLQFFKRGPQHNDLFFYTTHEPKDVIRYQSEDQTADMFRVP
TSTGFIGDTLHFLSLKYLHRLFLKNPNLNTLYATMVLPPEAMYRMASIYPEIYQIQYQEDGFLYIPGGHG
GAAYFHTYDTLTWLRVGQFQAKEFTAHLPKVGDKGANHLFIIQRADLKTPKYRTFVPRRKWVTLPNIFLP
STQANHLFIIQRADLKTPKYRTFVPRRKWVTSNIFLPKHTNARKPILKQTMMQLFLYEKSVKEITFRDVF
AKIRQLIQTKDLEQFDPDELVRLANYVMHTSKLLEKDPYELIEGQGKLQDLVNPIKTWVSEKWQNWFGWK
DYTRLIRALKWVDVDLVLRVMNTRSTPTGIQTSELLPDEAGPPKSKKKRGGKKIPSPEPSRNCRSKSKRT
RGNRAQREKEPHRRKLRWQKENFQRVTVQVHQAPKGDPSPLARFSQSLKELPRRSQPRRLSKFQDFLMSS
TQTRFQIPSSLNRRAGHWRPKQQGTPPTTQEAGTEGPPTTQPGKPTASSPRAAPQPTANAETMEKGSQAS
SATTRGRDPVTDRTREQAPTNLTPEEEALPWKHWLKQLKAVGFKGNETQMDGDGTSISPIEQIKSCPGKP
KSVSKEILETLRSGHAPNFWKPDASRARAYTSDIKNRRTGAAVHMAPQAWKETMDFIAENAERTLHILRH
PWRRRFREEQMSSRDAHKFHFLFDETLVVCPTNELRRDWIDKLPLSEPGSVLTFERALMNPAKGTVIFDD
YTKLPAGFIEAYSICQPNVELVILTGDAKQASHHESNDNAMIAGLDPAAFEFSKFCRYYLNATHRNPRNL
ANALGIYSEKPGNLKVTFTNHLLPEMHILVPSLLKKATLEELGHKCSTYAGCQGVTLSKVQIYLDSNTTL
CSNEVLYTALSRAVEQINFVNSGPFNGPFWAKLEATPYLKTFLRLTREEKINEITPEEPKPKEPEPPKTH
FPVETSAHLYSSITEEMPEKHAREIYNKTHGHTNCVQTDEPLVQMFAHQQAKDEALFWETIEARLRITTS
EANVQELNEKRDIGDLLFHAYHKAMGLPKDPIPFENDLWETCAQEVQQTYLSKPINLIKNGEKRQGPDFD
KNAIMLFLKSQWVKKMEKLGAPTIKPGQTIASFHQITVMLYGTMARYMRRIRDRFCPKHILINCEKTPTQ
ISDFVKAQWDFSDFAYANDFTAFDQSQDGAMLQFEIIKAKFHNIPEDIILGYMDIKTNAKIFLGTLAIMR
LTGEGPTFDANTECNIAYTHLRFNVPENVAQVYAGDDSALSKVCPEKDSFKQFADRLTLKSKPQVFPQTQ
GAWAEFCGLLITPRGIIKDPVKLHASWVLATKLGTLQQIKCVNSYGEDLKLSYDLGDHLQELLSESQCRT
HQVTVRELVKFAGKVEKHQAEIRSVANGNIRQLPFFY
>gi|212671455|ref|YP_002308465.1| 26 kDa protein [Hosta virus X]
MATFASFLSSTRPDFERTNTPLTKPLVIHAVAGAGKTTLLRDFLRANPLTNAQTLGTPDCPTLDGAYIRP
FSGPVANLVNILDEYTAHRHGSWDVLIADPLQHYERAKLPHYICKRSHRLCPATARLLRKLGLDIHSYRE
DESEISFSDIFSGQLEGTVLPLTPLCKDLLERHSCPFKCPSEFIGEQDDIITVVSEIPLSKHPDKTALYR
ALTRHTRRLNVLAPPPYPTP
>gi|212671456|ref|YP_002308466.1| 13 kDa protein [Hosta virus X]
MSSPHRLTPPPNYTPVLLAVVIGVGLAVVTNQLTRSTLPHVGDNIHSLPHGGNYKDGTKSVIYRGPAPFQ
RSHSTAPPFNAVLLLTFAIWFLSCRTRRAAIGIHVCHTCSQTREQQ
>gi|212671457|ref|YP_002308467.1| 8 kDa protein [Hosta virus X]
MQSFCSHLRSGSFPVVLGALLLAFTCATLVLRLGNNNSNNCLIYVDGARAFLEGNCAGISAEVVAALRPH
SHAG
>gi|212671458|ref|YP_002308468.1| coat protein [Hosta virus X]
MASDAPTPPAAPSPVTFTAPTQEQLTSLALPIISTRLPSPDVLNQISVKWQELGVPTASISSTAIALCMA
CYHSGSSGSTLIPGLAPGTTVNYTSLAAAVKSLATLREFARYFAPIIWNYAIEHKIPPANWAAMGYKENT
KYAAFDTFDSILNPAALQPTGGLIRQPTEEELLAHQANSALHIFDSLRNDFASTDGRVTRGHITSNVNSL
NYLPAPEGSS
[Discussion]:
Tags: python pandas matplotlib biopython fasta