Python reading files in a directory: answers

【Question title】: Python reading files in a directory
【Posted】: 2012-07-31 10:18:29
【Question】:

I have a .csv file with 3000 rows of data in 2 columns, like this:

uc007ayl.1  ENSMUSG00000041439
uc009mkn.1  ENSMUSG00000031708
uc009mkn.1  ENSMUSG00000035491

In another folder I have graphs with names like these:

uc007csg.1_nt_counts.txt
uc007gjg.1_nt_counts.txt

Note that the names of these graphs have the same format as my first column, followed by a _nt_counts.txt suffix.

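I am trying to use Python to identify the rows that have a graph and write the name from the second column of those rows to a new .txt file.

This is my code: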

import csv
with open("C:/*my dir*/UCSC to Ensembl.csv", "r") as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print row[0]

But that is as far as I have got, and now I am stuck.

【Question discussion】:

Tags: python python-2.7

【Solution 1】:

You're almost there:

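import csv
import os.path

# Python 2: the csv module expects files opened in binary mode ("rb")
with open("C:/*my dir*/UCSC to Ensembl.csv", "rb") as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        # build the expected graph path from the ID in the first column
        graph_filename = os.path.join("C:/folder", row[0] + "_nt_counts.txt")
        if os.path.exists(graph_filename):
            print(row[1])  # a graph exists, so output the second column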

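Note that the repeated calls to os.path.exists may slow the process down, especially if the directory lies on a remote filesystem and does not contain significantly more files than the CSV file has rows. You may want to use os.listdir instead: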

import csv
import os

# list the graph folder once; membership tests on a set are cheap
graphs = set(os.listdir("C:/graph folder"))
with open("C:/*my dir*/UCSC to Ensembl.csv", "rb") as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        if row[0] + "_nt_counts.txt" in graphs:
            print(row[1])

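【Discussion】:

I tried your code, but I get an error saying 'import sitecustomize' failed; use -v for traceback. @phihag
I think I found the mistake: I forgot to put .png after .txt, because those files are graphs.
On Python 2.x the file should be opened in 'rb' mode. You don't need to check each file; a single os.listdir() is enough.
@J.F.Sebastian You're right about "rb", fixed. os.listdir is only faster if the number of entries in the CSV file is not significantly smaller than the number of files in the target directory. Added an alternative solution.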
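One of the comments above says the graph files actually end in .png after .txt. In that case every hard-coded "_nt_counts.txt" has to change. Below is a minimal sketch that makes the suffix a parameter. It assumes names such as uc007csg.1_nt_counts.txt.png, which is a guess based on that comment. It collects the ID prefixes once, which is the same idea Solution 5 uses. graph_prefixes is a hypothetical helper name, and the code targets Python 3.

```python
def graph_prefixes(names, suffix):
    """Return the ID prefixes of the file names that end with `suffix`.

    Hypothetical helper: pass it os.listdir(graph_folder) and the real suffix,
    e.g. "_nt_counts.txt.png" if the graphs are PNG files (an assumption).
    """
    if not suffix:
        # name[:-0] would be "", so an empty suffix would silently match nothing useful
        raise ValueError("suffix must be non-empty")
    return {name[:-len(suffix)] for name in names if name.endswith(suffix)}
```

A typical call is graph_prefixes(os.listdir("C:/graph folder"), "_nt_counts.txt.png"). After that, each CSV row only needs the test row[0] in prefixes.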

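【Solution 2】:

First, check whether print row[0] really gives you the correct file identifier.

Second, join the path of the graph folder with row[0] plus the _nt_counts.txt suffix. Then check with os.path.exists(path) whether this full path exists, i.e. whether the file really exists (see http://docs.python.org/library/os.path.html#os.path.exists).

If it exists, you can write row[1] (the second column) to a new file with f2.write("%s\n" % row[1]). Of course, you first have to open f2 for writing.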
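Solution 2's three steps can be sketched in Python 3 as follows. One assumption is worth flagging. The sample rows in the question look whitespace-separated, even though the question's code passes delimiter=','. If so, row[0] would hold the whole line, which is exactly what step 1 is meant to catch. The hypothetical helper parse_row therefore accepts either a comma or whitespace between the two columns, assuming the IDs themselves contain neither. write_ids_with_graphs is also a hypothetical name.

```python
import os


def parse_row(line):
    """Split one line into (ucsc_id, ensembl_id), tolerating ',' or whitespace."""
    fields = line.replace(",", " ").split()
    if len(fields) != 2:
        raise ValueError("unexpected row: %r" % line)
    return fields[0], fields[1]


def write_ids_with_graphs(csv_path, graph_dir, out_path, suffix="_nt_counts.txt"):
    """Write the second column of every row whose graph file exists."""
    # open f2 for writing before the loop starts
    with open(csv_path) as f, open(out_path, "w") as f2:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            ucsc_id, ensembl_id = parse_row(line)  # step 1: a clean identifier
            path = os.path.join(graph_dir, ucsc_id + suffix)  # step 2: full path
            if os.path.exists(path):
                f2.write("%s\n" % ensembl_id)  # step 3: second column, one per line
```

Usage would look like write_ids_with_graphs("C:/*my dir*/UCSC to Ensembl.csv", "C:/folder", "result.txt"), with the placeholder paths taken from the question.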

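【Solution 3】:

Well, the next step is to check whether the file exists. There are several ways to do that, but I like the EAFP ("easier to ask for forgiveness than permission") approach.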

import os

try:
    with open(os.path.join(the_dir, row[0] + "_nt_counts.txt")) as f:
        pass
except IOError:
    print('Oops no file')

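the_dir is the directory that contains the graph files.

【Discussion】:

EAFP is not a good idea in this case. There is a fundamental difference between a file existing and actually opening it. Opening a file often triggers it to actually be read ahead, requires file handle management (and locking on Windows), and requires that the current user can read the file.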
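If you like the EAFP style but want to avoid opening the file, the same try/except pattern also works with os.stat. os.stat only reads the file's metadata, so it never reads the file's contents. On POSIX systems it needs no read permission on the file itself. This is a minimal Python 3 sketch; graph_exists is a hypothetical helper name. os.path.isfile performs essentially the same check.

```python
import os
import stat


def graph_exists(path):
    """EAFP existence check based on os.stat instead of open()."""
    try:
        st = os.stat(path)
    except OSError:  # missing file, missing folder, bad path component, ...
        return False
    # a directory that happens to have the expected name does not count
    return stat.S_ISREG(st.st_mode)
```

Inside the loop this becomes if graph_exists(os.path.join(the_dir, row[0] + "_nt_counts.txt")): followed by whatever you do with row[1].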

【Solution 4】:

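with open('result.txt', 'w') as result:
    for line in open('C:/*my dir*/UCSC to Ensembl.csv', 'r'):
        line = line.rstrip('\r\n').split(',')  # drop the line ending so line[1] is clean
        try:
            # probe for the graph; note that this handle is never closed explicitly
            open('/path/to/dir/' + line[0] + '_nt_counts.txt', 'r')
        except IOError:  # catch only "cannot open", not every possible error
            continue
        else:
            result.write(line[1] + '\n')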

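【Discussion】:

This implementation may leak file handles on some Python implementations, and it requires the current user to be able to read the file.

【Solution 5】: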

import csv
import os

# get prefixes of all graphs in another directory
suff = '_nt_counts.txt'
graphs = set(fn[:-len(suff)] for fn in os.listdir('another dir') if fn.endswith(suff))

with open(r'c:\path to\file.csv', 'rb') as f:
    # extract 2nd column if the 1st one is a known graph prefix
    names = (row[1] for row in csv.reader(f, delimiter='\t') if row[0] in graphs)
    # write one name per line
    with open('output.txt', 'w') as output_file:
        for name in names:
            print >>output_file, name
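The code above is Python 2 ('rb' mode and print >>output_file). A Python 3 port is sketched below under that assumption. It opens the CSV in text mode with newline="", as the csv module documentation recommends, and uses print(..., file=...). It also skips rows with fewer than two fields, such as blank lines. The function name and its parameters are hypothetical. The tab delimiter is kept from the original answer.

```python
import csv
import os


def write_known_graph_names(csv_path, graph_dir, out_path,
                            suffix="_nt_counts.txt", delimiter="\t"):
    """Python 3 port of the answer above (hypothetical name and parameters)."""
    # get prefixes of all graphs in graph_dir
    graphs = {fn[:-len(suffix)] for fn in os.listdir(graph_dir) if fn.endswith(suffix)}
    # Python 3: text mode with newline="" replaces Python 2's "rb"
    with open(csv_path, newline="") as f, open(out_path, "w") as output_file:
        # extract the 2nd column if the 1st one is a known graph prefix
        names = (row[1] for row in csv.reader(f, delimiter=delimiter)
                 if len(row) >= 2 and row[0] in graphs)
        # write one name per line; print(..., file=) replaces print >>output_file
        for name in names:
            print(name, file=output_file)
```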
