【Question title】: Twitter data to csv, getting error when trying to add to CSV file
【Posted】: 2016-12-17 07:18:29
【Question】:

I am trying to put the last 24 hours of tweets into a CSV file using tweepy for Python, and I get this error:

Traceback (most recent call last):
File "**", line 74, in <module>
get_all_tweets("BQ")
File "**", line 66, in get_all_tweets
writer.writerows(outtweets)
File "C:\Users\Barry\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 6-7: character maps to <undefined>

Can anyone see what is wrong? This was working, in some capacity, before today.

Code:

import csv
import datetime

import tweepy

# consumer_token, consumer_secret, access_token and access_secret
# are defined elsewhere (not shown in the question)

def get_all_tweets(screen_name):

    # authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_token, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)

    # initialize a list to hold all the tweepy Tweets
    alltweets = []

    # make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.home_timeline(screen_name=screen_name, count=200)

    # save most recent tweets
    alltweets.extend(new_tweets)

    # save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    outtweets = []

    page = 1
    deadend = False

    print("getting tweets before %s" % (oldest))

    # all subsequent requests use the max_id param to prevent duplicates
    new_tweets = api.home_timeline(screen_name=screen_name, count=200, max_id=oldest, page=page)

    # save most recent tweets
    alltweets.extend(new_tweets)

    # update the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    print("...%s tweets downloaded so far" % (len(alltweets)))

    for tweet in alltweets:

        if (datetime.datetime.now() - tweet.created_at).days < 1:
            # transform the tweepy tweets into a 2D array that will populate the csv
            outtweets.append([tweet.user.name, tweet.created_at, tweet.text.encode("utf-8")])

        else:
            deadend = True
            return
        if not deadend:
            page += 1

    # write the csv
    with open('%s_tweets.csv' % screen_name, 'w') as f:
        writer = csv.writer(f)
        writer.writerow(["name", "created_at", "text"])
        writer.writerows(outtweets)
    pass

    print("CSV written")

if __name__ == '__main__':
    # pass in the username of the account you want to download
    get_all_tweets("BQ")

** Edit 1 **

 with open('%s_tweets.csv' % screen_name, 'w', encode('utf-8')) as f:
 TypeError: an integer is required (got type bytes)

** Edit 2 **

 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
 UnicodeEncodeError: 'charmap' codec can't encode characters in position 6-7: character maps to <undefined>

【Comments】:

    Tags: python csv twitter tweepy


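    【Solution 1】:

    Your problem is with the characters in some of the tweets. They cannot be written to the file the way you opened it. If you replace this line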

    with open('%s_tweets.csv' % screen_name, 'w') as f:
    

    with this:

    with open('%s_tweets.csv' % screen_name, mode='w', encoding='utf-8') as f:
    

    it should work. Note that this only applies to Python 3.x.
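As a minimal sketch of why the explicit encoding matters, the snippet below writes one row to a file opened with cp1252 (the codec named in the traceback) and then with UTF-8. The sample row, the `write_rows` helper, and the `demo_tweets.csv` file name are hypothetical stand-ins for illustration and are not part of the original code; `newline=""` is added because the `csv` module documentation recommends it for files handed to `csv.writer`.

```python
import csv
import os
import tempfile

# Hypothetical sample row standing in for [name, created_at, text].
rows = [["Zo\u00eb \u2603", "2016-12-17 07:18:29", "snowman \u2603 and emoji \U0001F600"]]


def write_rows(path, rows, encoding):
    """Write a header plus rows to a CSV file using the given text encoding."""
    with open(path, mode="w", encoding=encoding, newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "created_at", "text"])
        writer.writerows(rows)


path = os.path.join(tempfile.mkdtemp(), "demo_tweets.csv")

# cp1252 has no mapping for the snowman or the emoji, so writing fails.
try:
    write_rows(path, rows, "cp1252")
    cp1252_failed = False
except UnicodeEncodeError:
    cp1252_failed = True

# UTF-8 can represent every character in the sample row, so this succeeds.
write_rows(path, rows, "utf-8")

# Read the file back with the same encoding to confirm nothing was lost.
with open(path, encoding="utf-8", newline="") as f:
    read_back = list(csv.reader(f))

print(cp1252_failed)
print(read_back[1] == rows[0])
```

With the file opened as UTF-8, the strings can be passed to the writer as they are, without calling `.encode()` on them first.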

    【Discussion】:

    • You could try: with open('%s_tweets.csv' % screen_name, mode='w', encoding='utf-8') as f:
    • You're welcome :) I will edit the original post to make it useful for others
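    【Solution 2】:

    It seems some characters cannot be encoded (note that the traceback comes from the cp1252 'charmap' codec, not from UTF-8). While it would be useful to look at the tweet that triggers the error, you can guard against encoding errors of this kind by changing tweet.text.encode("utf-8") to one of tweet.text.encode("utf-8", "ignore"), tweet.text.encode("utf-8", "replace"), or tweet.text.encode("utf-8", "backslashreplace"). ignore drops anything that cannot be encoded; replace substitutes a replacement marker for the offending character (? when encoding, U+FFFD when decoding); and backslashreplace replaces the offending character with a backslash escape sequence such as \u2603.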

    More information: https://docs.python.org/3/howto/unicode.html#converting-to-bytes
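The error handlers are easiest to compare on a codec that really cannot represent a character. The sketch below is an illustration only: it uses cp1252, the codec from the traceback, and a made-up sample string rather than an actual tweet.

```python
# Made-up sample text; U+2603 (snowman) has no cp1252 mapping.
text = "snow \u2603"

# The default handler ("strict") raises, which is what the traceback shows.
try:
    text.encode("cp1252")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

ignored = text.encode("cp1252", "ignore")            # drops the character
replaced = text.encode("cp1252", "replace")          # substitutes "?"
escaped = text.encode("cp1252", "backslashreplace")  # writes an escape sequence
utf8 = text.encode("utf-8")                          # UTF-8 needs no handler here

print(strict_failed)  # True
print(ignored)        # b'snow '
print(replaced)       # b'snow ?'
print(escaped)        # b'snow \\u2603'
print(utf8)           # b'snow \xe2\x98\x83'
```

Note that in Python 3 `csv.writer` converts non-string values with `str()`, so a `bytes` value ends up in the cell as literal text like `b'snow ...'`. The handlers above change the bytes produced by `.encode()`, but they do not change the encoding of the file the writer is attached to.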

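    【Discussion】:

    • I get the same error after trying this; I have posted the traceback as Edit 2
    • What did you change your code to? Also, can you show us the tweet that causes the problem? That way we can run tests on our side and see what works.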