【Posted】: 2023-04-03 17:32:02
【Question】:
I have 3 files:
File 1:
chrM 6423 5
chrM 6432 4
chrM 7575 1
chrM 7670 1
chrM 7933 1
chrM 7984 1
chrM 8123 1
chrM 9944 1
chrM 10434 1
chrM 10998 13
chrM 10999 19
chrM 11024 17
chrM 11025 29
chrM 11117 21
chrM 11118 42
chr1 197095350 2
chr1 197103061 1
chr1 197103582 1
chr1 197103615 1
chr1 197103810 3
chr1 197103885 2
chr1 197104256 1
chr1 197107467 4
chr1 197107480 5
chr1 197107498 6
chr1 197107528 10
chr1 197107805 1
chr1 197107806 1
chr1 197107813 1
chr1 197107814 1
chr1 197107839 1
chr1 197107840 1
chr1 197107855 1
chr1 197107856 1
chr1 197107877 1
chr1 197107878 1
chr1 197111511 1
chr1 197120122 1
chr1 197125503 1
chr1 197126978 1
chr1 197127070 1
chr1 197127084 1
chr1 197129731 2
chr1 197129758 2
chr1 197129765 1
chr1 197167632 2
chr1 197167652 2
chr1 197167668 2
chr1 197167682 2
chr1 197181417 1
chr1 197181973 3
chr1 197181975 3
chr1 197192150 0
File 2:
chrM 6423 5
chrM 6432 4
chrM 6582 1
chrM 6640 1
chrM 6643 1
chrM 7140 1
chrM 10998 7
chrM 10999 8
chrM 11024 10
chrM 11025 13
chrM 11117 12
chrM 11118 33
chr1 197095157 2
chr1 197095185 2
chr1 197098860 1
chr1 197105061 1
chr1 197107422 1
chr1 197107436 1
chr1 197107467 3
chr1 197107480 4
chr1 197107498 3
chr1 197107528 4
chr1 197107805 2
chr1 197107813 2
chr1 197107839 1
chr1 197108557 1
chr1 197108591 1
chr1 197108596 1
chr1 197108617 1
chr1 197108651 1
chr1 197139308 1
chr1 197139335 1
chr1 197143403 1
chr1 197143442 1
chr1 197145546 1
chr1 197148715 1
chr1 197148723 1
chr1 197148731 1
chr1 197148761 1
chr1 197153190 1
chr1 197166831 1
chr1 197166847 2
chr1 197166922 2
chr1 197166950 1
chr1 197166954 1
chr1 197167041 1
chr1 197167778 1
chr1 197167791 1
chr1 197167834 1
chr1 197167857 2
chr1 197167860 2
chr1 197167865 1
chr1 197167867 1
chr1 197167871 1
chr1 197167935 2
chr1 197167946 2
chr1 197167948 2
chr1 197167951 2
chr1 197167974 1
chr1 197167980 1
chr1 197168142 1
chr1 197168163 1
chr1 197168195 1
chr1 197168210 1
chr1 197169548 1
chr1 197169580 1
chr1 197169609 1
chr1 197183318 1
chr1 197183404 1
chr1 197184910 1
chr1 197184937 1
chr1 197186368 1
chr1 197191991 1
chr1 197192031 1
chr1 197192047 1
chr1 197192097 1
chr1 197192106 1
chr1 197192125 1
chr1 197192150 1
File 3:
chrM 6423 2
chrM 6432 1
chrM 6766 1
chrM 6785 1
chrM 10075 1
chrM 10084 1
chrM 10998 7
chrM 10999 8
chrM 11024 7
chrM 11025 14
chrM 11117 8
chr1 197095943 1
chr1 197096144 1
chr1 197104061 1
chr1 197104257 1
chr1 197107805 2
chr1 197122470 1
chr1 197123085 1
chr1 197123093 1
chr1 197126978 1
chr1 197142562 1
chr1 197157076 1
chr1 197157101 2
chr1 197162035 4
chr1 197167431 1
chr1 197167470 1
chr1 197167535 1
chr1 197167652 1
chr1 197167668 1
chr1 197167682 1
chr1 197167715 1
chr1 197167734 1
chr1 197167755 1
chr1 197168107 2
chr1 197168113 2
chr1 197172198 1
chr1 197172211 1
chr1 197172221 1
chr1 197172271 1
chr1 197175787 1
chr1 197175806 1
chr1 197175822 1
chr1 197192150 0
The resulting file should look like this:
6423 chrM 2 5 5
6432 chrM 1 4 4
6582 chrM 1
197093370 chr1 1
197093385 chr1 1
197094791 chr1 1
197094813 chr1 1
197094855 chr1 1
197094857 chr1 1
197095157 chr1 2
197095185 chr1 2
197095350 chr1 2
197095943 chr1 1
197096
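My code now mostly works, but there is a problem in the while loop: after many records have been merged, near the end of the merged file it stops writing (the last thing written is 197096 ....) and fails with the error Traceback (most recent call last): File "", line 4, in IndexError: list index out of range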
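One way this kind of IndexError can arise is reading past the end of a file: readline() returns an empty string at end of file, split() then returns an empty list, and indexing parts[0] fails. The sketch below only illustrates that behaviour; io.StringIO stands in for a real coverage file, and parse_record is a hypothetical helper name, not something from the original code.

```python
import io

# Stand-in for one coverage file: a single record, then end of file.
handle = io.StringIO("chrM 6423 5\n")

first = handle.readline().split()   # a normal record, split into three fields
second = handle.readline().split()  # readline() returns '' at EOF, so this is an empty list

def parse_record(line):
    """Return (chrom, position, coverage), or None for an empty line such as EOF."""
    parts = line.split()
    if not parts:
        return None  # nothing to index, so signal "no more records" instead
    return parts[0], int(parts[1]), parts[2]
```

Checking for the empty list before indexing lets the caller notice that a file is exhausted instead of crashing.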
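I think this error is related to the while loop, but I don't know why it happens. I have also been changing my code, as shown below.
Here is the problem: you can clearly see in the result file that, once a record has been read from a single file, the code no longer picks up the values common to all files. In this case it also does not output 7575, which should come after 7140.
I have several very large files that I want to read line by line and merge wherever they share the same value in column 2. My logic is to collect the column-2 values from all files into a list and find the minimum among them. The record with the minimum value is written to a new file, together with the column-3 values (kept in mycover) of every file that has that minimum. I then keep track of the files that were just used in new_myfile[], so that the next line can be read from them, and delete the records that have already been written.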
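The selection step described above can be sketched on its own. This is only an illustration under the assumption that each file's current record has already been parsed into a (chrom, position, coverage) tuple; select_minimum is a hypothetical name, and positions are compared without looking at the chromosome, as in the description.

```python
def select_minimum(records):
    """records: list of (chrom, position, coverage), one per open file.

    Returns the indices that hold the smallest position and the merged,
    tab-separated output line for that position.
    """
    minval = min(pos for _, pos, _ in records)
    ind = [i for i, (_, pos, _) in enumerate(records) if pos == minval]
    chrom = records[ind[0]][0]
    coverages = [records[i][2] for i in ind]
    line = "\t".join([str(minval), chrom] + coverages)
    return ind, line

# First record of each of the three sample files shown above.
current = [("chrM", 6423, "5"), ("chrM", 6423, "5"), ("chrM", 6423, "2")]
ind, line = select_minimum(current)
```

The order of the coverage values in the output line follows the order of the input files, so it depends on the order in which the files were opened.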
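I hope this is enough to understand. I don't know how to repeat the process until all files have reached their end, so that every record from every file is read. My code is as follows: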
import sys
import glob
import errno
path = '*Sorted_Coverage.txt'
filenames = glob.glob(path)
files = [open(i, "r") for i in filenames]
p=1
mylist=[]
mychr=[]
mycover=[]
new_mychr=[]
new_mycover=[]
new_mylist=[]
myfile=[]
new_myfile=[]
ab=""
g=1
result_f = open('MERGING_water_onlyselected.txt', 'a')
for j in files:
    line = j.readline()
    parts = line.split()
    mychr.append(parts[0])
    mycover.append(parts[2])
    mylist.append(parts[1])
    myfile.append(j)
mylist=map(int,mylist)
minval = min(mylist)
ind = [i for i, v in enumerate(mylist) if v == minval]
not_ind = [i for i, v in enumerate(mylist) if v != minval]
w=""
j=0
for j in xrange(len(ind)): # writing records to file with minimum value
    if(j==0):
        ab = (str(mylist[ind[j]])+'\t'+mychr[ind[j]]+'\t'+mycover[ind[j]])
    else:
        ab=ab+'\t'+mycover[ind[j]]
#smallest written on file
result_f.writelines(ab+'\n')
ab=""
for i in ind:
    new_myfile.append(myfile[i])
#removing the records by index which have been used from mylists .
for i in sorted(ind, reverse=True):
    del mylist[i]
    del mycover[i]
    del mychr[i]
    del myfile[i]
#how to iterate the following code from all records of all files till the end of each file
while(True):
    for i in xrange(len(new_myfile)):
        print len(new_myfile)
        myfile.append(new_myfile[i])
        line = new_myfile[i].readline()
        parts = line.split()
        mychr.append(parts[0])
        mycover.append(parts[2])
        mylist.append(parts[1])
    new_myfile=[]
    mylist=map(int, mylist)
    minval = min(mylist)
    print minval
    print("list values:")
    print mylist
    ind = [i for i, v in enumerate(mylist) if v == minval]
    not_ind = [i for i, v in enumerate(mylist) if v != minval]
    k=0
    ab=""
    for j in xrange(len(ind)): # writing records to file with minimum value
        if(j==0):
            ab = (str(mylist[ind[j]])+'\t'+str(mychr[ind[j]])+'\t'+str(mycover[ind[j]]))
            k=k+1
        else:
            ab=ab+'\t'+str(mycover[ind[j]])
            k=k+1
    #smallest written on file
    result_f.writelines(ab+'\n')
    ab=""
    for i in ind:
        new_myfile.append(myfile[i])
    #removing the records by index which have been used from mylists .
    for i in sorted(ind, reverse=True):
        del mylist[i]
        del mycover[i]
        del mychr[i]
        del myfile[i]
result_f.close()
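I have been looking for a solution for many days and still cannot find one. I don't know whether this code can be improved further, since I am still quite new to Python.
I would be very grateful if someone could help.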
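【Comments】:
-
Could you clarify which part of the code you need help with?
-
#how to iterate the following code from all records of all files till the end of each file: I want to add a loop that repeats the code after that comment again and again until every record from every file has been read and merged into MERGING_TEST3.txt. Here is an example to make clearer what I am trying to do:
-
File 1: chrM 135 18 chrM 229 2 chrM 230 18 chrM 263 3 File 2: chrM 134 1 chrM 135 11 chrM 229 1 chrM 230 15 File 3: chrM 134 2 chrM 2 9 chr 3M When I merge these 3 files, the result will be: 134 chrM 1 2 135 chrM 18 11 8
-
I hope that makes it clearer.
-
I also want the merge to continue when any one of the files ends, that is, to repeat the given code until all files have been read to the end.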
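A minimal sketch of a merge loop that keeps going after individual files run out. It is not the original code: read_record and merge_sorted are hypothetical names, io.StringIO stands in for the real files, and it assumes every file is sorted in ascending order by column 2 and contains no blank lines. Positions are compared without regard to the chromosome name, as in the question. Only files 1 and 2 from the comment example are used, because the data for file 3 is garbled there.

```python
import io

def read_record(handle):
    """Read one record; return None once the file is exhausted."""
    parts = handle.readline().split()
    if not parts:
        return None
    return parts[0], int(parts[1]), parts[2]

def merge_sorted(handles, out):
    # One slot per input file: the record waiting to be written,
    # or None once that file has reached its end.
    current = [read_record(h) for h in handles]
    while any(rec is not None for rec in current):
        minval = min(rec[1] for rec in current if rec is not None)
        ind = [i for i, rec in enumerate(current)
               if rec is not None and rec[1] == minval]
        fields = [str(minval), current[ind[0]][0]] + [current[i][2] for i in ind]
        out.write("\t".join(fields) + "\n")
        # Advance only the files whose record was just written.
        for i in ind:
            current[i] = read_record(handles[i])

# Files 1 and 2 from the comment example.
file1 = io.StringIO("chrM 135 18\nchrM 229 2\nchrM 230 18\nchrM 263 3\n")
file2 = io.StringIO("chrM 134 1\nchrM 135 11\nchrM 229 1\nchrM 230 15\n")
merged = io.StringIO()
merge_sorted([file1, file2], merged)
```

Keeping one slot per file, instead of deleting and re-appending list entries, means an exhausted file simply stays None and the loop ends when every slot is None. The standard library's heapq.merge performs a similar k-way merge of already sorted inputs and may be worth considering for many files.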
Tags: python