IOError when decompressing a gzip file: answers

【Question title】: IOError when decompressing gzip file
【Posted】: 2015-09-01 12:23:30
【Question】:

I am trying to download and decompress a gzip file, and then convert the resulting TSV file into CSV, which is easier to parse. I am trying to collect the data from the "Download Table" link at this URL. I use the same idea as in this post, but I get the error IOError: Not a gzipped file on the line outfile.write(decompressedFile.read()). My code is as follows:

import os
import urllib2 
import gzip
import StringIO

baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?"
filename = "D:\Sidney\irt_euryld_d.tsv.gz" #Edited after heinst's comment below
outFilePath = filename[:-3]

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())

compressedFile.seek(0)

decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb') 

with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())

#Now have to deal with tsv file
import csv

with open(outFilePath,'rb') as tsvin, open('ECB.csv', 'wb') as csvout:
    tsvin = csv.reader(tsvin, delimiter='\t')
    csvout = csv.writer(csvout) #Converting output into CSV Format

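【Comments】:

You should probably print the first few bytes of compressedFile and check that it actually looks like a gzip file. Several things could be going on here. One possibility is that the server is returning an error page because your download request is missing a request parameter or a cookie, or because it doesn't like the user agent. As a side note, I strongly recommend using the Requests package (docs.python-requests.org/en/latest) instead of urllib2.
Use a raw string for Windows paths: filename = r"D:\Sidney\irt_euryld_d.tsv.gz". It won't make any difference here; it is just a general remark about safety.
@cdarke Thanks. However, I still get the error.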
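
To act on the first comment: a gzip stream always starts with the two bytes 0x1f 0x8b, so a quick look at the start of the download tells an HTML error page apart from real gzip data. Here is a minimal Python 3 sketch of that check. The helper name looks_like_gzip and both sample payloads are hypothetical stand-ins for response.read(), not part of the original code.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(payload):
    """Return True if the payload starts with the gzip magic number."""
    return payload[:2] == GZIP_MAGIC


# Hypothetical payloads standing in for response.read()
html_error_page = b"<!DOCTYPE html><html><body>Error</body></html>"

buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    gz.write(b"geo\ttime\nDE\t2015\n")
real_gzip = buffer.getvalue()

print(looks_like_gzip(html_error_page))  # False
print(looks_like_gzip(real_gzip))        # True
print(html_error_page[:20])              # peek at what the server really sent
```

If the check fails, printing the first few dozen bytes usually shows straight away whether the server answered with HTML.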

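Tags: python

【Answer 1】:

Basically, you are requesting the wrong file. If you inspect the response in your code, you will see that you get back an HTML error page. You are appending your own local path to the URL, which produces an invalid URL.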

import os
import urllib2 
import gzip
import StringIO

baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file="
filename = "data/irt_euryld_d.tsv.gz" #Edited after heinst's comment below
outFilePath = filename.split('/')[1][:-3]
response = urllib2.urlopen(baseURL + filename)
print response
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())

compressedFile.seek(0)

decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb') 

with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())

#Now have to deal with tsv file
import csv

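# Python 2's csv module expects files opened in binary mode
with open(outFilePath, 'rb') as tsvin, open('ECB.csv', 'wb') as csvout:
    reader = csv.reader(tsvin, delimiter='\t')
    writer = csv.writer(csvout)  # Converting output into CSV format
    writer.writerows(reader)     # copy every TSV row into the CSV file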
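
For readers on Python 3, where urllib2 and the StringIO module no longer exist, the same decompress-and-convert steps might look roughly like the sketch below. The function name gzip_tsv_to_csv, the UTF-8 encoding, and the sample data are assumptions for illustration. In a real script, compressed_bytes would come from urllib.request.urlopen(baseURL + filename).read().

```python
import csv
import gzip
import io
import os
import tempfile


def gzip_tsv_to_csv(compressed_bytes, csv_path, encoding="utf-8"):
    """Decompress gzip-compressed TSV bytes in memory and write them as CSV.

    Returns the number of rows written. The encoding is an assumption.
    """
    gz = gzip.GzipFile(fileobj=io.BytesIO(compressed_bytes), mode="rb")
    # Wrap the binary gzip stream so the csv module receives text
    with io.TextIOWrapper(gz, encoding=encoding, newline="") as tsv_in:
        reader = csv.reader(tsv_in, delimiter="\t")
        with open(csv_path, "w", newline="", encoding=encoding) as csv_out:
            writer = csv.writer(csv_out)
            row_count = 0
            for row in reader:
                writer.writerow(row)
                row_count += 1
    return row_count


# Demo with made-up data instead of a network download
sample = gzip.compress(b"geo\ttime\nDE\t2015\n")
demo_path = os.path.join(tempfile.mkdtemp(), "ECB.csv")
print(gzip_tsv_to_csv(sample, demo_path))  # 2
```

Keeping the download separate from the conversion makes it possible to test the conversion without hitting the server.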

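The differences from the question's code are the filename line and a small addition to baseURL (the trailing file= parameter). filename = "data/irt_euryld_d.tsv.gz" is the correct file name according to the link you specified.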
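
To make the difference concrete, this small Python 3 sketch parses the query string of both URLs. The helper query_params is hypothetical; the two URLs are the ones built by the question's code and by the code above. The first URL carries no file parameter at all, which is consistent with the server answering with an error page.

```python
from urllib.parse import parse_qs, urlsplit

LISTING = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?"


def query_params(url):
    """Return the query-string parameters of a URL as a dict of lists."""
    return parse_qs(urlsplit(url).query)


# URL from the question: a local Windows path glued onto the base URL
broken_url = LISTING + r"D:\Sidney\irt_euryld_d.tsv.gz"
# URL from the answer: the server-side file name passed as file=
fixed_url = LISTING + "file=" + "data/irt_euryld_d.tsv.gz"

print(query_params(broken_url))  # {}
print(query_params(fixed_url))   # {'file': ['data/irt_euryld_d.tsv.gz']}
```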

Another change is this line: outFilePath = filename.split('/')[1][:-3]

It would be better written as

# os.path.join, not os.join; 'D:\\' (with the separator) makes the path absolute
outFilePath = os.path.join('D:\\', 'Sidney', filename.split('/')[1][:-3])
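
One caveat when joining onto a bare drive letter: on Windows, os.path.join('D:', 'Sidney', ...) gives a path relative to the current directory on drive D:, not D:\Sidney. The sketch below uses ntpath, the Windows implementation of os.path, so the behaviour can be reproduced on any platform. The variable names are illustrative only.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

filename = "data/irt_euryld_d.tsv.gz"
base_name = filename.split("/")[1][:-3]  # strip "data/" and ".gz"

# A bare drive letter yields a drive-relative path
drive_relative = ntpath.join("D:", "Sidney", base_name)
# Including the separator after the drive yields an absolute path
absolute = ntpath.join("D:\\", "Sidney", base_name)

print(base_name)       # irt_euryld_d.tsv
print(drive_relative)  # D:Sidney\irt_euryld_d.tsv
print(absolute)        # D:\Sidney\irt_euryld_d.tsv
```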

【Comments】:

Thanks. Just one question: did you mean outFilePath rather than outFileName in outFileName = os.join('D:','Sidney',filename.split('/')[1][:-3])?