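【Posted】: 2015-05-15 10:46:31
【Question】:
I'm learning Python with this tutorial.
The problem is that when I try to get Cyrillic characters, I get unicode escape sequences in the PyCharm console instead of readable text.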
import requests
from bs4 import BeautifulSoup
import operator
import codecs


def start(url):
    word_list = []
    source_code = requests.get(url).text
    soup = BeautifulSoup(source_code)
    for post_text in soup.findAll('a', {'class': 'b-tasks__item__title js-set-visited'}):
        content = post_text.string
        words = content.lower().split()
        for each_word in words:
            word_list.append(each_word)
    clean_up_list(word_list)


def clean_up_list(word_list):
    clean_word_list = []
    for word in word_list:
        symbols = "!@#$%^&*()_+{}|:<>?,./;'[]\=-\""
        for i in range(0, len(symbols)):
            word = word.replace(symbols[i], "")
        if len(word) > 0:
            clean_word_list.append(word)
    create_dictionary(clean_word_list)


def create_dictionary(clean_word_list):
    word_count = {}
    for word in clean_word_list:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1
    for key, value in sorted(word_count.items(), key=operator.itemgetter(1)):
        print(key, value)
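When I change print(key, value) to print(key.decode('utf8'), value), I get "UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)".
start('https://youdo.com/tasks-all-opened-all-moscow-1')
There are some suggestions on the internet about changing the encoding in some files, but I don't really understand them. Can't I just read the text in the console? OS
UPD: key.encode("utf-8")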
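The error message suggests the script runs under Python 2, where the words returned by BeautifulSoup are already unicode text. Under that assumption, calling .decode('utf8') on already-decoded text makes Python 2 first encode it implicitly with the ASCII codec, and that hidden step is what raises UnicodeEncodeError. The sketch below performs only that hidden step explicitly, so it behaves the same on Python 3. The helper name implicit_ascii_encode_error and the sample word are hypothetical and not taken from the page.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def implicit_ascii_encode_error(text):
    """Return the UnicodeEncodeError raised by ASCII-encoding text, or None."""
    try:
        # This mirrors what Python 2 does behind the scenes when
        # .decode() is called on text that is already unicode.
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


word = u"заказать"  # hypothetical 8-letter Cyrillic word
err = implicit_ascii_encode_error(word)

# First and last offending character positions, as in the error message.
print(err.start, err.end - 1)
```

An 8-letter Cyrillic word fails on every character, which matches the "position 0-7" range quoted in the question.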
【Comments】:
-
Encode, not decode.
-
encode didn't help either, I forgot to mention that: leto12h.storage.yandex.net/rdisk/…
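One possible reason neither decode nor encode helps, assuming the script runs under Python 2: there print(key, value) prints a tuple, and tuple elements are displayed through repr(), which escapes non-ASCII characters whether the element is text or UTF-8 bytes. A minimal sketch of one way around this is to build a single text string per line and print that. The function name format_counts and the sample counts are hypothetical stand-ins for the scraped data, and readable output still depends on the console accepting Cyrillic text.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import operator


def format_counts(word_count):
    """Return one text line per word, sorted by ascending count."""
    lines = []
    for key, value in sorted(word_count.items(), key=operator.itemgetter(1)):
        # Format into one text string instead of printing (key, value);
        # under Python 2 a printed tuple shows escaped Cyrillic.
        lines.append("{0} {1}".format(key, value))
    return lines


# Hypothetical sample counts, standing in for the scraped page.
sample = {"курьер": 3, "ремонт": 1, "уборка": 2}

# Encoding turns text into bytes and decoding turns bytes back into text;
# neither step is needed just to print text that is already decoded.
encoded = "курьер".encode("utf-8")
restored = encoded.decode("utf-8")

if __name__ == "__main__":
    for line in format_counts(sample):
        print(line)
```

The __future__ imports are no-ops on Python 3 and make print a function and string literals unicode on Python 2, so the same file can be tried under either interpreter.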