【Question Title】: Redirecting python's stdout to the file fails with UnicodeEncodeError
【Posted】: 2013-10-09 07:47:21
【Question】:

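I have a Python script that connects to the Twitter Firehose and sends the data downstream for processing. It was working fine before, but now I am trying to get only the text body. (This is not a question about how I should extract data from Twitter or how to encode/decode ASCII characters.) When I launch my script directly like this: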

python -u fetch_script.py

it works fine and I can see the messages appear on the screen. For example:

root@domU-xx-xx-xx-xx:/usr/local/streaming# python -u fetch_script.py 
Cuz I'm checking you out >on Facebook<
RT @SearchlightNV: #BarryLies???????? has crapped on all honest patriotic hard-working citizens in the USA but his abuse of WWII Vets is sick #2A…
"Why do men chase after women? Because they fear death."~Moonstruck
RT @SearchlightNV: #BarryLies???????? has crapped on all honest patriotic hard-working citizens in the USA but his abuse of WWII Vets is sick #2A…
Never let anyone tell you not to chase your dreams. My sister came home crying today, because someone told her she's not good enough.
"I can't even ask anyone out on a date because if it doesn't end up in a high speed chase, I get bored."
RT @ColIegeStudent: Double-checking the attendance policy while still in bed
Well I just handed my life savings to ya.. #trustingyou #abouttomakebankkkkk
Zillow $Z and Redfin useless to Wells Fargo Home Mortgage, $WFC, and FannieMae $FNM. Sale history LTV now 48%, $360 appraisal fee 4 no PMI.
The latest Dump and Chase Podcast http://somedomain.com/viaRSA9W3i check it out and subscribe on iTunes, or your favorite android app #Isles

But if I try to redirect the output to a file like this:

python -u fetch_script.py >fetch_output.txt

it immediately throws an error:

root@domU-xx-xx-xx-xx:/usr/local/streaming# python -u fetch_script.py >fetch_output.txt
ERROR:tornado.application:Uncaught exception, closing connection.
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tornado/iostream.py", line 341, in wrapper
    callback(*args)
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 331, in wrapped
    raise_exc_info(exc)
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/usr/local/streaming/twitter-stream.py", line 203, in parse_json
    self.parse_response(response)
  File "/usr/local/streaming/twitter-stream.py", line 226, in parse_response
    self._callback(response)
  File "fetch_script.py", line 57, in callback
    print msg['text']
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 139: ordinal not in range(128)
ERROR:tornado.application:Exception in callback <functools.partial object at 0x187c2b8>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tornado/ioloop.py", line 458, in _run_callback
    callback()
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 331, in wrapped
    raise_exc_info(exc)
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tornado/iostream.py", line 341, in wrapper
    callback(*args)
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 331, in wrapped
    raise_exc_info(exc)
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/usr/local/streaming/twitter-stream.py", line 203, in parse_json
    self.parse_response(response)
  File "/usr/local/streaming/twitter-stream.py", line 226, in parse_response
    self._callback(response)
  File "fetch_script.py", line 57, in callback
    print msg['text']
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 139: ordinal not in range(128)
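
Editor's sketch: the failure above can be reproduced without Twitter or tornado. The block below is a minimal sketch, assuming Python 3 (the question itself uses Python 2.7) and using the `PYTHONIOENCODING` environment variable to force the child's stdout encoding. The helper name `run_child` and the one-line child program are hypothetical and not part of the original script. Forcing `ascii` stands in for what Python 2 falls back to when stdout is not a terminal.

```python
import os
import subprocess
import sys

# Child program: print a single U+2026 (horizontal ellipsis). The backslash is
# doubled so the command line itself stays pure ASCII.
CHILD_CODE = "print(u'\\u2026')"


def run_child(stdout_encoding):
    """Run the child with stdout piped and the given stdio encoding forced."""
    env = dict(os.environ, PYTHONIOENCODING=stdout_encoding)
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env,
    )


ascii_run = run_child("ascii")   # stands in for Python 2 writing to a non-terminal
utf8_run = run_child("utf-8")    # an encoding that can represent U+2026

print(ascii_run.returncode != 0)                    # the child crashed
print(b"UnicodeEncodeError" in ascii_run.stderr)    # with the same exception type
print(utf8_run.stdout.strip() == b"\xe2\x80\xa6")   # UTF-8 bytes of U+2026
```

The only thing that differs between the two runs is the encoding chosen for stdout, which suggests the redirect changes the codec rather than the data.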

P.S.

More context:

The error occurs in the callback function:

def callback(self, message):
        if message:
            msg = message
            msg_props = pika.BasicProperties()
            msg_props.content_type = 'application/text'
            msg_props.delivery_mode = 2
            #print self.count
            print msg['text']
            #self.count += 1
            ...

But if I remove ['text'] and leave just print msg, both cases work like a charm.

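【Comments】:

  • The same problem occurs with a simple script: print u'\u2026', so don't worry about adding context! The issue is that Python sets the output encoding when you write to a terminal, but not when you write to a file. I'm not sure what the current best practice for fixing it is, and I'm interested in the answers.
  • That's a good point, I'll have to google it, but why is there no problem when I dump the whole payload into the file? As I explained in the P.S. section.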
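  • That's because you are printing the string representation of the dictionary. print {'text': u'\u2026'} outputs {'text': u'\u2026'}, i.e. it prints an ASCII view of the unicode character, with the character escaped.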
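
Editor's sketch of the last comment's point. This assumes Python 3, where the built-in `ascii()` escapes non-ASCII characters roughly the way `repr()` did for unicode strings in Python 2. The `payload` dictionary is a made-up stand-in for the tweet message.

```python
# Hypothetical stand-in for the tweet dictionary from the question.
payload = {'text': u'RT \u2026'}

# ascii() escapes every non-ASCII character, so the representation of the
# whole dictionary contains only ASCII and can be written with any codec.
escaped = ascii(payload)
escaped.encode('ascii')  # succeeds, nothing left to escape

# The bare text still contains the real U+2026 character, which the
# ascii codec cannot represent.
try:
    payload['text'].encode('ascii')
    text_is_ascii = True
except UnicodeEncodeError:
    text_is_ascii = False

print(escaped)
print(text_is_ascii)
```

This is why printing the whole message survives the redirect while printing msg['text'] does not: the former only ever sends escape sequences to stdout.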

Tags: python linux file io-redirection output-redirect


【Answer 1】:

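Since nobody has jumped in yet, I'll take a shot. Python 2 sets the encoding of stdout when writing to a console, but leaves sys.stdout.encoding as None when stdout is redirected to a file, so print falls back to the ascii codec. This script reproduces the problem: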

import sys

msg = {'text': u'\u2026'}  # U+2026, horizontal ellipsis
sys.stderr.write('default encoding: %s\n' % sys.stdout.encoding)
print msg['text']

Running it with stdout redirected raises the error:

$ python bad.py >/tmp/xxx
default encoding: None
Traceback (most recent call last):
  File "bad.py", line 5, in <module>
    print msg['text']
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 0: ordinal not in range(128)

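Adding an explicit encoding to the script above:

import sys

msg = {'text': u'\u2026'}  # U+2026, horizontal ellipsis
sys.stderr.write('default encoding: %s\n' % sys.stdout.encoding)
# sys.stdout.encoding is None when stdout is redirected, so fall back to UTF-8
encoding = sys.stdout.encoding or 'utf-8'
print msg['text'].encode(encoding)

solves the problem:

$ python good.py >/tmp/xxx
default encoding: None
$ cat /tmp/xxx
…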
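
Editor's sketch of a variation on this fix: instead of calling .encode() at every print, the output stream can be wrapped once. This is a common idiom rather than something from the answer above. The helper name `utf8_writer` is hypothetical, and `io.BytesIO` stands in for the redirected stdout so that the example is self-contained.

```python
import codecs
import io


def utf8_writer(byte_stream):
    """Wrap a byte stream so that unicode text written to it is UTF-8 encoded."""
    return codecs.getwriter('utf-8')(byte_stream)


# io.BytesIO plays the role of the file that stdout was redirected to.
fake_stdout = io.BytesIO()
out = utf8_writer(fake_stdout)
out.write(u'sick #2A\u2026\n')

written = fake_stdout.getvalue()
print(written == b'sick #2A\xe2\x80\xa6\n')
```

In a real script the wrapped stream would presumably be sys.stdout on Python 2 or sys.stdout.buffer on Python 3. Hard-coding UTF-8 is an assumption here. The answer's sys.stdout.encoding or 'utf-8' keeps the terminal's own encoding when one is set.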

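【Comments】:

  • Dude, you rock! Thank you so much! :)) I was racking my brain over how to do this.
  • This is not trivial :) Thanks. Your answer saved me a lot of time.
  • This is really useful.
  • Thanks, this also worked for another case, where I was using ASCII color codes.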