google app engine + python: uploading to blobstore causes wrong encoding

[Question title]: google app engine + python: uploading to blobstore causes wrong encoding
[Posted]: 2014-05-30 13:58:28
[Question description]:

I am trying to upload a blob to Google App Engine's blobstore using the following HTML form:

<!DOCTYPE html>
<html>
<head>
<meta charset=utf-8>
</head>
<body>
<form id=upload action={{upload_url}} method=post enctype=multipart/form-data>
  Name: <input type=text name=name>
  Your photo: <input type=file name=image required=required><br><br>
  <input type=submit value=submit>
</form>
</body>
</html>

The value of the template variable {{upload_url}} is obtained on the server side with upload_url = blobstore.create_upload_url('/upload'). The script that handles the POST is as follows:

    class Test(ndb.Model):
        name = ndb.StringProperty()
        image = ndb.StringProperty()

    # Inside the upload handler (self is the request handler):
    test = Test()
    test.name = self.request.get('name')
    image = self.get_uploads('image')[0]  # first uploaded blob
    test.image = str(image.key())
    test.put()

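Usually the name field is filled with non-English characters (e.g. Chinese). The program above works fine on my local SDK. However, when it runs on Google App Engine, name is not encoded correctly. What is the problem?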

[Comments]:

Try: test.name = self.request.get('name').decode('utf-8')
Well, that gives an error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u6211' in position 0: ordinal not in range(128)
You could try uploading without upload_url and the redirect to isolate the encoding problem. Have a look at gcs_upload.py in this gist: gist.github.com/voscausa/9541133
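
The UnicodeEncodeError quoted in the comments is consistent with the value already being a unicode object: in Python 2, calling .decode('utf-8') on unicode first encodes it implicitly with the default ascii codec, and that hidden step is what fails. A minimal sketch that makes the implicit step explicit. The helper name implicit_ascii_step is hypothetical, and it assumes request.get() returned unicode, as the u'\u6211' in the message suggests:

```python
# -*- coding: utf-8 -*-

def implicit_ascii_step(value):
    """Reproduce the ascii encode that Python 2 performs implicitly
    before unicode.decode(); return the error text, or None if the
    value is pure ASCII and the step succeeds."""
    try:
        value.encode('ascii')
    except UnicodeEncodeError as exc:
        return str(exc)
    return None

# Hypothetical form value, standing in for self.request.get('name').
message = implicit_ascii_step(u'\u6211')
```

So decoding the value again in the handler probably cannot help; the bytes have to be decoded correctly before they reach request.get().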

Tags: python google-app-engine blobstore

[Solution 1]:

Don't you need quotes around the meta tag's attribute value: <meta charset="UTF-8">? Also, try: <meta http-equiv="content-type" content="text/html; charset=utf-8" />. And make sure the template's text file is saved in UTF-8 encoding.

[Comments]:

Thanks. But HTML5 allows the quotes to be omitted, so for efficiency I think it is better to keep the HTML file smaller.

[Solution 2]:

I just found out that this is a bug that has been open for years, see here. There are two solutions:

(1) Add the following to app.yaml:

libraries:
- name: webob
  version: "1.2.3"

(2) Add a file appengine_config.py (a Python module, since its content is Python code) with the following content:

# -*- coding: utf-8 -*-
from webob import multidict

def from_fieldstorage(cls, fs):
    """Create a dict from a cgi.FieldStorage instance.
    See this for more details:
    http://code.google.com/p/googleappengine/issues/detail?id=2749
    """
    import base64
    import quopri

    obj = cls()
    if fs.list:
        # fs.list can be None when there's nothing to parse
        for field in fs.list:
            if field.filename:
                obj.add(field.name, field)
            else:
                # first, set a common charset to utf-8.
                common_charset = 'utf-8'
                # second, check Content-Transfer-Encoding and decode
                # the value appropriately
                field_value = field.value
                transfer_encoding = field.headers.get('Content-Transfer-Encoding', None)
                if transfer_encoding == 'base64':
                    field_value = base64.b64decode(field_value)
                if transfer_encoding == 'quoted-printable':
                    field_value = quopri.decodestring(field_value)
                if field.type_options.has_key('charset') and field.type_options['charset'] != common_charset:
                    # decode with a charset specified in each
                    # multipart, and then encode it again with a
                    # charset specified in top level FieldStorage
                    field_value = field_value.decode(field.type_options['charset']).encode(common_charset)
                # TODO: Should we take care of field.name here?
                # Add every non-file field, whether or not it was re-encoded.
                obj.add(field.name, field_value)
    return obj

multidict.MultiDict.from_fieldstorage = classmethod(from_fieldstorage)
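
The core of the patch above is undoing the Content-Transfer-Encoding of each non-file form field before it is used. A minimal standalone sketch of that decoding step, using only the standard library. The function name decode_field and the sample values are hypothetical, and unlike the patch (which re-encodes to UTF-8 bytes) it returns decoded text:

```python
# -*- coding: utf-8 -*-
import base64
import quopri

def decode_field(raw, transfer_encoding=None, charset='utf-8'):
    """Undo the MIME transfer encoding of one form field, then
    decode the resulting bytes with the given charset."""
    if transfer_encoding == 'base64':
        raw = base64.b64decode(raw)
    elif transfer_encoding == 'quoted-printable':
        raw = quopri.decodestring(raw)
    return raw.decode(charset)

# The same character (U+6211) as it might arrive in each encoding.
from_base64 = decode_field(b'5oiR', 'base64')
from_qp = decode_field(b'=E6=88=91', 'quoted-printable')
from_plain = decode_field(b'\xe6\x88\x91')
```

All three calls should produce the same one-character string, which is the behaviour the monkeypatch aims to restore for form fields sent alongside a blobstore upload.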

[Comments]: