【Posted】: 2020-10-02 15:18:17
【Question】:
When I run my code and print the deals, some of the deal names come out wrong. For example:
import re
import urllib.request

respData = urllib.request.urlopen(
    'https://www.mcdelivery.com.pk/pk/browse/menu.html')
resp = respData.read().decode('utf-8')
link = re.findall(r'<ul class="secondary-menu">(.*?)</ul>', str(resp))
# URLs (note: str(link) is the repr of a list, not the matched HTML itself)
Urls = re.findall("href=[\"\'](.*?)[\"\']", str(link))
# remove amp from the urls
Url1 = [re.sub(r'amp;', '', item) for item in Urls]
# menu
deals = re.findall(r'<span>(.*?)</span>', str(link))
print(deals)
Output of the code:
['Deals', "\\\\xe2\\\\x98\\\\x85What\\\\\\'s New\\\\xe2\\\\x98\\\\x85", '\\\\xc3\\\\x80la carte & Value Meals', 'Crispy Chicken', 'Share Box', 'Happy Meals', 'Desserts', 'McCaf\\\\xc3\\\\xa9', 'Beverages', 'Side Lines', 'Snack Time']
\\xe2\\x98\\x85What\\\'s New\\xe2\\x98\\x85 should be What's New, and \\xc3\\x80la carte & Value Meals should be la carte & value meals.
【Discussion】:
-
The output shown doesn't make sense: "re.findall" returns a list, so it should print as a list representation (with brackets, quotes, and so on).
-
['Deals', "\\\\xe2\\\\x98\\\\x85What\\\\\\'s New\\\\xe2\\\\x98\\\\x85", '\\\\xc3\\\\x80la carte & Value Meals', 'Crispy Chicken', 'Share Box', 'Happy Meals', 'Desserts', 'McCaf\\\\xc3\\\\xa9', 'Beverages', 'Side Lines', 'Snack Time'] This is the output. I want to remove the backslashes and all of the encoding.
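A possible way to see where the backslashes come from, as a minimal sketch rather than a confirmed fix: calling str() on bytes (or on a list of matches) returns a repr, which turns non-ASCII bytes and quotes into literal escape text. The sample markup in `raw` is invented for illustration and stands in for the downloaded page, and `unescape` is a hypothetical helper, not something from the original code.

```python
import re

# Hypothetical stand-in for the bytes returned by respData.read();
# the markup is invented for illustration, not copied from the real page.
raw = (
    '<ul class="secondary-menu">\n'
    '<li><a href="/a"><span>★What\'s New★</span></a></li>\n'
    '<li><a href="/b"><span>McCafé</span></a></li>\n'
    '</ul>'
).encode('utf-8')

# str() on bytes returns their repr, so each non-ASCII byte turns into
# literal "\xNN" text. This reproduces the kind of escapes in the question.
escaped = re.findall(r'<span>(.*?)</span>', str(raw))

# Decode once, then keep working on real strings. re.DOTALL lets the
# menu pattern span line breaks, and each match is searched directly
# instead of calling str() on the list of matches.
html = raw.decode('utf-8')
menus = re.findall(r'<ul class="secondary-menu">(.*?)</ul>', html, re.DOTALL)
deals = [name for menu in menus
         for name in re.findall(r'<span>(.*?)</span>', menu)]
print(deals)


def unescape(text):
    """Hypothetical helper: undo one level of bytes-repr escaping."""
    raw_bytes = text.encode('ascii').decode('unicode_escape').encode('latin-1')
    return raw_bytes.decode('utf-8')


print([unescape(name) for name in escaped])
```

Decoding first and never passing a list through str() avoids the problem at the source, which is usually preferable to repairing strings afterwards. The output in the question appears to be escaped twice, so a repair along the lines of `unescape` would likely need two passes there. Dropping the star and accent characters themselves would be a separate filtering step.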