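Posted: 2021-05-12 19:29:26
Question:
I am parsing tab-separated data text files and get a "list index out of range" error. The script works for some files but not for others. I need your help debugging this script.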
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import sys
import os
import re
from collections import OrderedDict
from numpy import unique
def main():
    if len(sys.argv) < 2:
        print("usage: python3 {} <bacmat_out_table> > output".format(sys.argv[0]))
        sys.exit(1)
    bacmat_out = os.path.abspath(sys.argv[1])
    class_sum = OrderedDict()
    with open(bacmat_out) as fh:
        for line in fh:
            if re.search(r"^\s*$|^Query", line):
                continue
            elif len(line) == 0:
                break
            else:
                fields = line.strip().split("\t")
                compounds = fields[6]
                if re.search(r'\[.*\]', compounds):
                    compounds_class = re.findall(r'\[class:\s?(.+?)\]', compounds)
                    compounds_class = list(unique(compounds_class))
                    if len(compounds_class) > 0:
                        for i in compounds_class:
                            class_sum.setdefault(i, 0)
                            class_sum[i] += 1
                else:
                    compounds = compounds.strip('"')
                    compounds = compounds.strip("'")
                    compounds = compounds.strip()
                    class_sum.setdefault(compounds, 0)
                    class_sum[compounds] += 1
    print("Class\tCount")
    for key in sorted(class_sum.keys()):
        print(key, class_sum[key], sep="\t")
if __name__ == '__main__':
main()
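One way to stop the loop from crashing is to check the field count before indexing, and to skip (and report) rows that are too short. This is a minimal sketch assuming tab-separated input with the Compounds column at index 6, as in the header of the working file; `count_classes`, `COMPOUNDS_COL` and `SKIP_RE` are hypothetical names, and `collections.Counter` plus a `set` stand in for the `OrderedDict` and `numpy.unique` used in the script above.

```python
import re
from collections import Counter

# Assumption: tab-separated input whose Compounds column sits at index 6,
# as in the header of the file that works.
COMPOUNDS_COL = 6
CLASS_RE = re.compile(r"\[class:\s?(.+?)\]")
SKIP_RE = re.compile(r"^\s*$|^Query")  # blank lines and the header row


def count_classes(lines):
    """Return (class_sum, skipped): a Counter of compound classes and a list of
    (line_number, field_count) for rows too short to contain a Compounds field."""
    class_sum = Counter()
    skipped = []
    for lineno, line in enumerate(lines, start=1):
        if SKIP_RE.search(line):
            continue
        # Remove only the line ending so empty leading/trailing fields keep their positions.
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= COMPOUNDS_COL:
            skipped.append((lineno, len(fields)))
            continue
        compounds = fields[COMPOUNDS_COL]
        if re.search(r"\[.*\]", compounds):
            # Count each distinct class once per row, like numpy.unique in the original.
            class_sum.update(set(CLASS_RE.findall(compounds)))
        else:
            class_sum[compounds.strip('"').strip("'").strip()] += 1
    return class_sum, skipped
```

Inside the existing `with open(bacmat_out) as fh:` block this would be called as `class_sum, skipped = count_classes(fh)`; the skipped line numbers can then be printed to `sys.stderr` and the counts printed with `sorted(class_sum)` as before. Skipping only avoids the crash: rows that are split or reordered in the other file are still not counted and need to be fixed in the input.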
File that works:
Query Subject Gene Description Organism Location Compounds Percent identity Match length E-value Score per length
BAC0001|abeM|tr|Q5FAM9|Q5FAM9_ACIBA gi|445995506|ref|WP_000073361.1| abeM "H-coupled multidrug efflux pump. Confers resistance to Antibiotics such as quinolones and aminoglycosides and antibacterial biocides such as dyes, QACs. " Acinetobacter baumannii Chromosome "4,6-diamidino-2-phenylindole (DAPI) [class: Diamidine], Triclosan [class: Phenolic compounds], Acriflavine [class: Acridine], Hoechst 33342 [class: Bisbenzimide], Rhodamine 6G [class: Xanthene], Ethidium Bromide [class: Phenanthridine], Tetraphenylphosphonium (TPP) [class: Quaternary Ammonium Compounds (QACs)]" 100.0 448 1.3e-243 1.87857142857143
BAC0002|abeS|tr|Q2FD83|Q2FD83_ACIBA gi|446043276|ref|WP_000121131.1| abeS "Disinfectant resistance protein abeS. It can confer resistance to antibiotics such as erythromycin, novomycin, amikacin, ciprofloxacin, norfloxacin, tetracycline, trimethoporin and dyes, QACs etc. " Acinetobacter calcoaceticus/baumannii complex Chromosome "Benzylkonium Chloride (BAC) [class: Quaternary Ammonium Compounds (QACs)], Ethidium Bromide [class: Phenanthridine], Acriflavine [class: Acridine], Chlorhexidine [class: Biguanides], Pyronin Y [class: Xanthene], Rhodamine 6G [class: Xanthene], Methyl Viologen [class: Paraquat], Tetraphenylphosphonium (TPP) [class: Quaternary Ammonium Compounds (QACs)], 4,6-diamidino-2-phenylindole (DAPI) [class: Diamindine], Acridine Orange [class: Acridine], Sodium Dodecyl Sulfate (SDS) [class: Organo-sulfate], Sodium Deoxycholate (SDC) [class: Acid], Crystal Violet [class: Triarylmethane], Cetrimide (CTM) [class: Quaternary Ammonium Compounds (QACs)], Cetylpyridinium Chloride (CPC) [class: Quaternary Ammonium Compounds (QACs)], Dequalinium [class: Quaternary Ammonium Compounds (QACs)]" 100.0 109 9.5e-52 1.85504587155963
BAC0003|acn|tr|O53166|O53166_MYCTU gi|489995855|ref|WP_003898889.1| acn "Aconitate hydratase, Acn" Mycobacterium Chromosome Iron (Fe) 100.0 943 0.0e+00 2.03467656415695
BAC0004|acr3|tr|B5LX01|B5LX01_CAMJU gi|488947840|ref|WP_002858915.1| acr3 "Arsenical-resistance membrane transporter; part of the an arsenic (ars) four-gene operon, containing genes encoding a putative membrane permease (ArsP), a transcriptional repressor (ArsR), an arsenate reductase (ArsC) and an arsenical-resistance membrane transporter (Acr3)" Campylobacter Chromosome Arsenic (As) 100.0 347 4.2e-178 1.7971181556196
BAC0005|acrA|sp|P0AE06|ACRA_ECOLI gi|481023858|ref|WP_001295324.1| acrA "AcrAB is a drug efflux protein with a broad substrate specificity. It can confer resistant to ampicillin, chloramphenicol as well. It requires TolC outer memberane protein to function and form the AcrAB-TolC efflux operon. AcrAB-TolC is a drug efflux protein complex with broad substrate specificity that uses the proton motive force to export substrates." Proteobacteria Chromosome "Acriflavine [class: Acridine], Phenol [class: Phenolic compounds], Triclosan [class: Phenolic compounds], p-xylene [class: Aromatic hydrocarbons], Cyclohexane [class: Cycloalkane], Pentane [class: Alkane]" 100.0 397 4.5e-216 1.88916876574307
BAC0006|acrB|sp|P31224|ACRB_ECOLI gi|447055213|ref|WP_001132469.1| acrB "AcrAB is a drug efflux protein with a broad substrate specificity. It can confer resistant to ampicillin, chloramphenicol as well.It requires TolC outer memberane protein to function and form the AcrAB-TolC efflux operon. AcrAB-TolC is a drug efflux protein complex with broad substrate specificity that uses the proton motive force to export substrates." Enterobacteriaceae Chromosome "Acriflavine [class: Acridine], Phenol [class: Phenolic compounds], Triclosan [class: Phenolic compounds], p-xylene [class: Aromatic hydrocarbons], Cyclohexane [class: Cycloalkane], Pentane [class: Alkane]" 100.0 1049 0.0e+00 1.89733079122974
BAC0007|acrC|tr|Q1LMP2|Q1LMP2_RALME gi|499835702|ref|WP_011516436.1| acrC Cation/multidrug efflux system outer membrane porin arcC. Cupriavidus metallidurans Chromosome Acriflavine [class: Acridine] 100.0 486 2.8e-268 1.90061728395062
BAC0563|acrD|tr|Q8ZN77|Q8ZN77_SALTY gi|447185822|ref|WP_001263078.1| acrD Acriflavine resistance protein D; participates in the efflux of aminoglycosides. It confers resistance to a variety of these substances. It contributes to copper and zinc resistance in Salmonella. Salmonella enterica Chromosome "Copper (Cu), Zinc (Zn)" 100.0 1037 0.0e+00 1.90781099324976
File that does not work:
Query Subject Gene Description Organism Location Compounds Percent identity Match length E-value Score per length
ERZ1645190.265-NODE-265-length-2544-cov-3.002812_2 gi|1083034424|gb|OGD35356.1| copB Copper (Cu) Candidatus Atribacteria bacterium RBG_16_35_8 copper-translocating P-type ATPase, partial
80.7 135 2.40e-65 1.56296296296296
ERZ1645190.6825-NODE-6825-length-778-cov-1.752420_2 gi|1133586191|gb|APW63482.1| actP Copper (Cu), Sodium acetate [class: Acetate] Paludisphaera borealis Copper-transporting P-type ATPase
81.4 161 8.72e-78 1.5527950310559
ERZ1645190.14825-NODE-14825-length-656-cov-1.279534_1 gi|1084819878|gb|OGQ54449.1| arrA Arsenic (As) Deltaproteobacteria bacterium RIFCSPLOWO2_02_56_12 dehydrogenase
90.5 63 1.54e-32 1.98412698412698
ERZ1645190.15611-NODE-15611-length-649-cov-1.912458_1 gi|1082733223|gb|OGA52347.1| arrA Arsenic (As) Betaproteobacteria bacterium RIFCSPLOWO2_12_FULL_62_13 dehydrogenase
85.6 216 2.42e-131 1.81018518518519
Running the script produces the following error:
python bacmet_class_summary.py test_bacmet.table > 1.txt
Traceback (most recent call last):
  File "bacmet_class_summary.py", line 52, in <module>
    main()
  File "bacmet_class_summary.py", line 33, in main
    compounds = fields[6]
IndexError: list index out of range
This is the error I get when I try the second sample file.
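To see which lines trigger the error, one quick check is to list every non-blank line that has fewer than seven tab-separated fields, since the script reads `fields[6]` and therefore needs at least seven. This is a minimal sketch assuming the file is tab-delimited; `short_rows` is a hypothetical helper name. In the non-working sample as pasted, each record appears to be split across two physical lines; if those line breaks really exist in the file, both halves would come out short of seven fields.

```python
def short_rows(lines, min_fields=7):
    """Yield (line_number, field_count, text) for each non-blank line that has
    fewer than `min_fields` tab-separated fields."""
    for lineno, line in enumerate(lines, start=1):
        # Split exactly as the original script does: strip, then split on tabs.
        text = line.strip()
        if not text:
            continue  # blank lines are skipped by the original script as well
        n_fields = len(text.split("\t"))
        if n_fields < min_fields:
            yield lineno, n_fields, text
```

For example, `for lineno, n, text in short_rows(open("test_bacmet.table")): print(lineno, n, text[:60])` would print the line number, field count and start of each offending line.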
Comments:
-
Hi, and welcome to StackOverflow. Please provide samples of the working and non-working files. Also try to provide a minimal reproducible example! :)
-
@JakubSzlaur Thank you very much for your input. I'm new to StackOverflow. How do I attach my files?
-
You don't have to attach them. Just include 2 or 3 lines from the working file and from the non-working file in your question :)
-
@JakubSzlaur I have copied in the contents of both files, the one that works with the script and the one that doesn't.
-
I said just 2 or 3 lines from each file. Please edit your question.
Tags: python python-3.x python-requests