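【Question title】: TypeError: must be str, not bytes in IBM Watson
【Posted】: 2018-01-17 00:37:27
【Question description】:

I just finished the Codecademy IBM Watson course. They program in Python 2, and when I brought the file over to Python 3 I kept getting this error. The script and all the credentials worked fine on Codecademy. Is this because I am using Python 3, or because of a problem in the code?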

Traceback (most recent call last):
  File "c:\Users\Guppy\Programs\PythonCode\Celebrity Match\CelebrityMatch.py", line 58, in <module>
    user_result = analyze(user_handle)
  File "c:\Users\Guppy\Programs\PythonCode\Celebrity Match\CelebrityMatch.py", line 22, in analyze
    text += status.text.encode('utf-8')
TypeError: must be str, not bytes 

Does anyone know what is going wrong? The code is below:

import sys
import operator
import requests
import json
import twitter
from watson_developer_cloud import PersonalityInsightsV2 as PersonalityInsights

def analyze(handle):
    twitter_consumer_key = '<key>'
    twitter_consumer_secret = '<secret>'
    twitter_access_token = '<token>'
    twitter_access_secret = '<secret>'

    twitter_api = twitter.Api(consumer_key=twitter_consumer_key, consumer_secret=twitter_consumer_secret, access_token_key=twitter_access_token, access_token_secret=twitter_access_secret)

    statuses = twitter_api.GetUserTimeline(screen_name = handle, count = 200, include_rts = False)

    text = ""

    for status in statuses:
        if (status.lang =='en'): #English tweets
            text += status.text.encode('utf-8')

    #The IBM Bluemix credentials for Personality Insights!
    pi_username = '<username>'
    pi_password = '<password>'

    personality_insights = PersonalityInsights(username=pi_username, password=pi_password)
    pi_result = personality_insights.profile(text)
    return pi_result

def flatten(orig):
    data = {}
    for c in orig['tree']['children']:
        if 'children' in c:
            for c2 in c['children']:
                if 'children' in c2:
                    for c3 in c2['children']:
                        if 'children' in c3:
                            for c4 in c3['children']:
                                if (c4['category'] == 'personality'):
                                    data[c4['id']] = c4['percentage']
                                    if 'children' not in c3:
                                        if (c3['category'] == 'personality'):
                                                data[c3['id']] = c3['percentage']
    return data

def compare(dict1, dict2):
    compared_data = {}
    for keys in dict1:
        if dict1[keys] != dict2[keys]:
                compared_data[keys]=abs(dict1[keys] - dict2[keys])
    return compared_data

user_handle = "@itsguppythegod"
celebrity_handle = "@giselleee_____"

user_result = analyze(user_handle)
celebrity_result = analyze(celebrity_handle)

user = flatten(user_result)
celebrity = flatten(celebrity_result)

compared_results = compare(user, celebrity)

sorted_result = sorted(compared_results.items(), key=operator.itemgetter(1))

for keys, value in sorted_result[:5]:
    print(keys, end = " ")
    print(user[keys], end = " ")
    print ('->', end = " ")
    print (celebrity[keys], end = " ")
    print ('->', end = " ")
    print (compared_results[keys])

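【Comments】:

  • In Python 3, .encode() returns a bytes object.
  • You start text as a str object with text = "". Either make it a bytes object with text = b"", or don't call .encode() when concatenating other str objects.
  • Which one you choose depends on the implementation of PersonalityInsights; if it supports str (Unicode text) in Python 3, stick with that instead of encoding everything.
  • Also, Python 2 and 3 differ slightly in syntax, so you can almost always expect problems when porting code directly.
  • Looking at the relevant Python code, I see that requests is being used, and unencoded Unicode text would be encoded as Latin-1 in that case. So encoding it yourself is a good idea, but it really should be the last step, i.e. when calling .profile().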
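
The two options from the comments can be sketched as follows. This is only an illustration: the `tweets` list is a hypothetical stand-in for the `status.text` values, and no Twitter or Watson call is made.

```python
# Hypothetical sample data standing in for status.text values.
tweets = ["héllo", "wörld"]

# Option A: accumulate bytes from the start.
as_bytes = b""
for t in tweets:
    as_bytes += t.encode("utf-8")

# Option B: accumulate str and encode only once at the end.
as_str = ""
for t in tweets:
    as_str += t
encoded_once = as_str.encode("utf-8")

# Both routes produce the same UTF-8 payload.
print(as_bytes == encoded_once)
```

Either option avoids mixing str and bytes in one concatenation; option B keeps the text as Unicode for as long as possible.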

Tags: python ibm-cloud typeerror watson


【Solution 1】:

Here you create a str (Unicode text) object:

text = ""

and then go on to append UTF-8 encoded bytes to it:

text += status.text.encode('utf-8')

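In Python 2, "" created a byte string, so that all worked (although you were then posting UTF-8 bytes to a service that interprets everything as Latin-1; see the API documentation). In Python 3, "" is a str, and appending bytes to it raises the TypeError.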
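
A minimal sketch of both effects, independent of Twitter and Watson. The sample word is an arbitrary assumption chosen because it contains a non-ASCII character; the exact wording of the TypeError message differs between Python 3 versions, so it is captured rather than hard-coded.

```python
# Arbitrary sample text with one non-ASCII character.
encoded = "café".encode("utf-8")   # bytes in Python 3

# 1. Appending bytes to a str fails in Python 3.
text = ""
try:
    text += encoded
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)

# 2. UTF-8 bytes read as Latin-1 turn into mojibake,
#    which is why the charset should be declared explicitly.
misread = encoded.decode("latin-1")
print(misread)
```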

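To fix this, don't encode the status text until you have collected all the tweets. In addition, tell Watson to expect UTF-8 data. Last but not least, build a list of tweet texts first and join them in a single step with str.join(), because concatenating strings in a loop takes quadratic time: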

text = []

for status in statuses:
    if (status.lang =='en'): #English tweets
        text.append(status.text)

# ...

personality_insights = PersonalityInsights(username=pi_username, password=pi_password)
pi_result = personality_insights.profile(
    ' '.join(text).encode('utf8'),
    content_type='text/plain; charset=utf-8'
)
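
The collect-then-encode step can be tried without any credentials. In the sketch below, `collect_english_text` and `fake_statuses` are hypothetical names introduced for illustration; the fake objects only mimic the `lang` and `text` attributes used in the question's code.

```python
from types import SimpleNamespace


def collect_english_text(statuses):
    """Join the text of English tweets, then encode once as UTF-8."""
    parts = [s.text for s in statuses if s.lang == "en"]
    return " ".join(parts).encode("utf-8")


# Hypothetical stand-ins for the objects returned by GetUserTimeline.
fake_statuses = [
    SimpleNamespace(lang="en", text="Hello world"),
    SimpleNamespace(lang="fr", text="Bonjour"),
    SimpleNamespace(lang="en", text="Café time"),
]

payload = collect_english_text(fake_statuses)
print(payload)
```

The resulting `payload` is the bytes value that would be passed to `.profile()` together with the `content_type` shown above.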

【Comments】:

  • Thank you very much, you really helped me.