Is there a better way to do csv/namedtuple with urlopen?

【Question title】: Is there a better way to do csv/namedtuple with urlopen?
【Posted】: 2013-05-04 13:38:46
【Question】:

Using the namedtuple documentation example as my template in Python 3.3, I have the following code to download a csv and turn it into a series of namedtuple subclass instances:

from collections import namedtuple
from csv import reader
from urllib.request import urlopen    

SecurityType = namedtuple('SecurityType', 'sector, name')

url = 'http://bsym.bloomberg.com/sym/pages/security_type.csv'
for sec in map(SecurityType._make, reader(urlopen(url))):
    print(sec)

This raises the following exception:

Traceback (most recent call last):
  File "scrap.py", line 9, in <module>
    for sec in map(SecurityType._make, reader(urlopen(url))):
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
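
The same failure can be reproduced without a network connection. The following is a minimal sketch, assuming an in-memory io.BytesIO object (with made-up sample data) as a stand-in for the response that urlopen returns; both yield bytes when iterated, which is what csv.reader rejects:

```python
import csv
import io

# Hypothetical sample payload; stands in for the bytes served at the URL.
raw = io.BytesIO(b'Comdty,Calendar Spread Option\r\n')

try:
    # csv.reader pulls lines from the iterator; here each line is bytes.
    next(csv.reader(raw))
    error_message = None
except csv.Error as exc:
    error_message = str(exc)

print(error_message)
```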

I know the problem is that urlopen returns bytes rather than strings, and that I need to decode the output at some point. Here is how I'm doing it now, using StringIO:

from collections import namedtuple
from csv import reader
from urllib.request import urlopen
import io

SecurityType = namedtuple('SecurityType', 'sector, name')

url = 'http://bsym.bloomberg.com/sym/pages/security_type.csv'
reader_input = io.StringIO(urlopen(url).read().decode('utf-8'))

for sec in map(SecurityType._make, reader(reader_input)):
    print(sec)

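This feels odd, because I'm basically iterating over the bytes buffer, decoding, re-buffering, and then iterating over the new string buffer. Is there a more Pythonic way to do this without two iterations?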

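【Comments】:

Hmm. What I was going to recommend (using TextIOWrapper) doesn't work, and IMO it should. Some digging turned up this bug (issue 16723), which seems to be the problem.
@DSM: That patch was applied in February, so the latest 3.x release includes it. Which version of 3.x are you using?
3.3.0. I like to stay current, but 3.3.1 was released less than a month ago. :^)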

Tags: python csv python-3.x namedtuple

【Solution 1】:

Use io.TextIOWrapper() to decode the urllib response:

reader_input = io.TextIOWrapper(urlopen(url), encoding='utf8', newline='')

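Now csv.reader is passed exactly the same interface it would get from a regular file on the filesystem opened in text mode, and the decoding happens as the reader consumes the stream rather than in a separate pass.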
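
To try the idea offline, here is a minimal sketch that assumes an io.BytesIO object in place of the urlopen response; the sample bytes are made up for illustration and are not the real file contents:

```python
import io
from collections import namedtuple
from csv import reader

SecurityType = namedtuple('SecurityType', 'sector, name')

# Hypothetical payload standing in for what urlopen(url) would deliver.
raw = io.BytesIO(b'Market Sector,Security Type\r\n'
                 b'Comdty,Calendar Spread Option\r\n')

# Wrap the binary stream so it decodes to str as csv.reader reads from it.
# newline='' leaves line-ending handling to the csv module.
text = io.TextIOWrapper(raw, encoding='utf-8', newline='')

rows = [SecurityType._make(row) for row in reader(text)]
for sec in rows:
    print(sec)
```

Swapping the BytesIO object for urlopen(url) should give the one-line change shown above, on Python versions that include the fix mentioned in the comments.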

With the TextIOWrapper change, your example URL works for me on Python 3.3.1:

>>> for sec in map(SecurityType._make, reader(reader_input)):
...     print(sec)
... 
SecurityType(sector='Market Sector', name='Security Type')
SecurityType(sector='Comdty', name='Calendar Spread Option')
SecurityType(sector='Comdty', name='Financial commodity future.')
SecurityType(sector='Comdty', name='Financial commodity generic.')
SecurityType(sector='Comdty', name='Financial commodity option.')
...
SecurityType(sector='Muni', name='ZERO COUPON, OID')
SecurityType(sector='Pfd', name='PRIVATE')
SecurityType(sector='Pfd', name='PUBLIC')
SecurityType(sector='', name='')
SecurityType(sector='', name='')
SecurityType(sector='', name='')
SecurityType(sector='', name='')
SecurityType(sector='', name='')
SecurityType(sector='', name='')
SecurityType(sector='', name='')
SecurityType(sector='', name='')
SecurityType(sector='', name='')

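The last few lines produce tuples with empty fields; the original file really does have a few lines containing nothing but a comma.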
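
If those empty records are unwanted, one option is to drop rows whose fields are all empty before building the tuples. This is a sketch rather than part of the original answer: parse_security_types is a hypothetical helper name, and the sample bytes are invented to mimic the comma-only lines at the end of the file.

```python
import io
from collections import namedtuple
from csv import reader

SecurityType = namedtuple('SecurityType', 'sector, name')

def parse_security_types(binary_stream, encoding='utf-8'):
    """Decode a binary CSV stream and skip rows with no non-empty field."""
    text = io.TextIOWrapper(binary_stream, encoding=encoding, newline='')
    # any(row) is False for ['', ''] (a comma-only line) and for [] (a blank line).
    return [SecurityType._make(row) for row in reader(text) if any(row)]

# Hypothetical payload: two real rows followed by two comma-only lines.
sample = io.BytesIO(b'Pfd,PRIVATE\r\nPfd,PUBLIC\r\n,\r\n,\r\n')
records = parse_security_types(sample)
for sec in records:
    print(sec)
```

The same function should accept the object returned by urlopen(url) in place of the BytesIO sample.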

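【Comments】:

See my comment above: while this should work, I don't think it actually does, because of a bug.
@DSM: That bug has been resolved; this works for me on 3.3.1. Running the OP's example with this change gives no problems.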