【Question title】: Merging files that are partially matched in name in Python
【Posted】: 2020-04-17 19:11:14
【Question】:

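I have a large number of files that I need to merge based on the matching parts of their names. For each ATGC008.COGXXXX number, I have to merge any one of the 5 files whose name contains Streptococcus_salivarius with any one of the 10 files whose name contains Streptococcus_thermophilus, and write each merged pair to a new file. The directory containing the files and their naming scheme looks like this, and an individual file looks like this.

If any solution requires it, I can easily convert the files to .fasta format.

I initially took a more manual approach to combining the files in Python, copying and pasting their names into the script, but with 872 combinations to complete, this quickly became very tedious. So I tried to automate the process by reading directly from the directory that contains the files, and produced this: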

import os
os.chdir("F:\PostGrad_Research\Programming_Files\Dumped_Files\Phylip_OG")
strainsToMerge = ['Streptococcus_thermophilus', 'Streptococcus_salivarius']
              #a list fo the strains that you want to access
for cogNumber in range(maxCogNumber):
    for i in range(2):
            filename = open('ATGC008.COG'+str(cogNumber)+ '.phy','w') #construct the filename to access the file
    infile = open(filename,'r')          #open the file
    sequences = infile.read()            #read the file
    subSequences = re.split('\w+',sequences)     #split the file with the header
    firstSequence[i] = subSequences[i].strip()   #extract the first sequence and make sure you've got rid of the whitespace at the start and end
    firstSequence[i] = re.sub('[\r\n]','',firstSequence[i])

outfile = open('ATGC008.COG'+str(cogNumber)+ '.phy','w')
    outfile.write('2 '+str(len(firstSequence[0])+'\n'))
    for i in range(2):
            outfile.write(firstSequence[i]+'\n'))
    outfile.close()

I get the following error:

Traceback (most recent call last):
File "F:\PostGrad_Research\Programming_Files\Merge_Fasta.py", line 8, in <module>
for cogNumber in range(max.cogNumber):
AttributeError: 'builtin_function_or_method' object has no attribute 'cogNumber'

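Any and all help or advice would be greatly appreciated.

【Comments】:

  • You need to put cogNumber in quotes, like str('cogNumber'). Otherwise Python treats it as a variable.
  • outfile.write(firstSequence[i]+'\n')) ---> please correct this; there is an extra closing parenthesis.
  • @DiwakarSHARMA, thank you for pointing out this error, I am a bit embarrassed.
  • Hi @Joshua, thanks for looking at this for me. I did as you suggested and now get the error: NameError: name 'maxcogNumber' is not defined. Do you have any other ideas? Thanks.
  • @James You refer to 'maxcogNumber' in the for loop, but you do not seem to have assigned a value to it.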
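
The NameError discussed in these comments can be reproduced in a few lines. The sketch below is only an illustration: the value 3 and the snake_case names are assumptions chosen for the example, not values from the question. It shows that the loop bound has to be assigned before the loop uses it, and that Python names are case-sensitive, so a differently spelled name counts as undefined. It also zero-pads the number with `str.zfill` so the generated names have a four-digit suffix.

```python
# Assumed small upper bound, for illustration only
max_cog_number = 3

# Build zero-padded file names: 1 -> 'ATGC008.COG0001.phy'
names = []
for cog_number in range(1, max_cog_number + 1):
    names.append("ATGC008.COG" + str(cog_number).zfill(4) + ".phy")

# A name with different spelling or capitalisation was never assigned,
# so using it raises NameError at the point of use.
try:
    range(maxcognumber)
except NameError as exc:
    error_message = str(exc)
```

Assigning the bound once, with exactly the spelling the loop uses, is enough to make the NameError go away.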

Tags: python python-3.x merge


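【Solution 1】:
    import os
    import re

    # Raw string, so the backslashes in the Windows path are not read as escapes
    os.chdir(r"F:\PostGrad_Research\Programming_Files\Dumped_Files\Phylip_OG")
    # A list of the strains that you want to access
    strainsToMerge = ['Streptococcus_thermophilus', 'Streptococcus_salivarius']
    maxCogNumber = 1092    # must be assigned before the loop uses it
    for cogNumber in range(1, maxCogNumber + 1):
        # Zero-pad the COG number to four digits (1 -> '0001', 1092 -> '1092')
        a = ''
        if 1 <= cogNumber < 10:
            a = '000' + str(cogNumber)
        elif 10 <= cogNumber < 100:
            a = '00' + str(cogNumber)
        elif 100 <= cogNumber < 1000:
            a = '0' + str(cogNumber)
        else:
            a = str(cogNumber)
        firstSequence = ['', '']
        for i in range(2):
            # Construct the filename as a string (opening it with 'w' here would empty the file)
            filename = 'ATGC008.COG' + a + '.phy'
            infile = open(filename, 'r')     # open the file
            sequences = infile.read()        # read the file
            infile.close()
            # Split the file at the header. NOTE: r'\w+' splits on every word,
            # so adapt the pattern to the actual header format of your files.
            subSequences = re.split(r'\w+', sequences)
            # Extract the sequence and strip whitespace and line breaks
            firstSequence[i] = subSequences[i].strip()
            firstSequence[i] = re.sub('[\r\n]', '', firstSequence[i])

        # NOTE: this is the same name as the input file, so the input is overwritten;
        # use a different output name if the original must be kept.
        outfile = open('ATGC008.COG' + a + '.phy', 'w')
        outfile.write('2 ' + str(len(firstSequence[0])) + '\n')
        for i in range(2):
            outfile.write(firstSequence[i] + '\n')
        outfile.close()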
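
The question asks for one Streptococcus_salivarius file and one Streptococcus_thermophilus file to be merged per COG number, which means the files have to be selected by strain name as well as by number. The following is a minimal sketch of one way to do that, not a tested solution for the real data. It rests on several assumptions that the thread does not confirm: file names look like `ATGC008.COG0001.<strain name and id>.phy`, each file holds a PHYLIP header line followed by a single whitespace-separated `name sequence` record, and the first matching file (sorted by name) is an acceptable choice for "any one of". The function names and the `.merged.phy` suffix are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

STRAINS = ["Streptococcus_salivarius", "Streptococcus_thermophilus"]
# Assumed file-name prefix: 'ATGC008.COG' followed by four digits
COG_PATTERN = re.compile(r"(ATGC008\.COG\d{4})")


def group_by_cog(directory):
    """Map each COG prefix to {strain: first matching file, sorted by name}."""
    groups = defaultdict(dict)
    for path in sorted(Path(directory).glob("*.phy")):
        match = COG_PATTERN.search(path.name)
        if match is None:
            continue
        for strain in STRAINS:
            if strain in path.name:
                # setdefault keeps the first file seen for this strain
                groups[match.group(1)].setdefault(strain, path)
    return groups


def read_single_record(path):
    """Return (name, sequence), assuming a header line plus one record."""
    lines = path.read_text().splitlines()[1:]  # skip the 'ntaxa nchars' header
    tokens = " ".join(lines).split()
    return tokens[0], "".join(tokens[1:])


def merge_cogs(directory, out_directory):
    """Write one two-sequence file per COG that has both strains."""
    out_dir = Path(out_directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for cog, files in sorted(group_by_cog(directory).items()):
        if len(files) < len(STRAINS):
            continue  # skip COGs that lack one of the strains
        records = [read_single_record(files[strain]) for strain in STRAINS]
        out_path = out_dir / f"{cog}.merged.phy"
        with out_path.open("w") as out:
            # New header: number of sequences, then length of the first sequence
            out.write(f"{len(records)} {len(records[0][1])}\n")
            for name, sequence in records:
                out.write(f"{name} {sequence}\n")
        written.append(out_path)
    return written
```

Calling `merge_cogs("Phylip_OG", "Phylip_OG_merged")` would then write the merged files to a separate directory and leave the inputs untouched. If the real file names or the record layout differ, `COG_PATTERN` and `read_single_record` are the two places to adapt.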

【Comments】:
