【Question title】: (Python) Beautiful Soup and encoding (utf-8, cp1252, ascii...)
【Posted】: 2020-05-29 20:36:58
【Question description】:

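Please help, I am getting very frustrated. I have had this problem ever since I started learning Python. I keep running into the same issue, and nobody online has given an answer that works.

My code: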

from bs4 import BeautifulSoup
import requests

page = requests.get(
    'https://forecast.weather.gov/MapClick.php?lat=34.05349000000007&lon=-118.24531999999999#.XswiwMCxWUk')
soup = BeautifulSoup(page.content, 'html.parser')
week = soup.find(id='seven-day-forecast-body')
items = week.find_all(class_='forecast-tombstone')

print(items[0].find(class_='period-name').get_text())
print(items[0].find(class_='short-desc').get_text())
print(items[0].find(class_='temp temp-high').get_text())

period_names = [item.find(class_='period-name').get_text() for item in items]
short_descp = [item.find(class_='short-desc').get_text() for item in items]
temp = [item.find(class_='temp temp-high').get_text() for item in items]
print(period_names)
print(short_descp)
print(temp)

Output:

[Running] python -u "c:\Users\dukasu\Documents\Python\test.py"
ThisAfternoon
Partly Sunny
High: 76 �F
Traceback (most recent call last):
  File "c:\Users\dukasu\Documents\Python\test.py", line 20, in <module>
    temp = [item.find(class_='temp temp-high').get_text() for item in items]
  File "c:\Users\dukasu\Documents\Python\test.py", line 20, in <listcomp>
    temp = [item.find(class_='temp temp-high').get_text() for item in items]
AttributeError: 'NoneType' object has no attribute 'get_text'

[Done] exited with code=1 in 0.69 seconds

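The problem is due to utf-8 encoding (my computer uses cp1252), but how do I solve it once and for all? I think it fails because it cannot handle the degree symbol. There was simple code for this in Python 2, but how do I solve it in Python 3.x? How can I set the encoding at the start of my code and forget about this problem? Please excuse my English, it is not my native language.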

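【Comments】:

  • Try this post
  • Why do you think this error is related to utf-8? I don't see that anywhere in the error message you posted.
  • @MarkRansom The error comes from the class name class_='temp temp-high'
  • @0m3r The problem is that item.find is returning None, which again has nothing to do with utf-8.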

Tags: python python-3.x beautifulsoup utf-8 cp1252


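【Solution 1】:

The error comes from the class name. item.find(class_='temp temp-high') returns None for any item whose class attribute is not exactly that string (here, most likely the items that report a low rather than a high), and calling get_text() on None raises the AttributeError. Use class_='temp' instead of class_='temp temp-high'.

Example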

temp = [item.find(class_='temp').get_text() for item in items]

Full code

from bs4 import BeautifulSoup
import requests

page = requests.get(
    'https://forecast.weather.gov/MapClick.php?lat=34.05349000000007&lon=-118.24531999999999#.XswiwMCxWUk')
soup = BeautifulSoup(page.content, 'html.parser')
week = soup.find(id='seven-day-forecast-body')
items = week.find_all(class_='forecast-tombstone')

print(items[0].find(class_='period-name').get_text())
print(items[0].find(class_='short-desc').get_text())
print(items[0].find(class_='temp temp-high').get_text())

period_names = [item.find(class_='period-name').get_text() for item in items]
short_descp = [item.find(class_='short-desc').get_text() for item in items]
temp = [item.find(class_='temp').get_text() for item in items]
print(period_names)
print(short_descp)
print(temp)

Prints

ThisAfternoon
Partly Sunny
High: 76 °F
['ThisAfternoon', 'Tonight', 'Saturday', 'SaturdayNight', 'Sunday', 'SundayNight', 'Monday', 'MondayNight', 'Tuesday']
['Partly Sunny', 'Patchy Fog', 'Patchy Fogthen MostlySunny', 'Patchy Fog', 'Patchy Fogthen PartlySunny', 'Patchy Fog', 'Patchy Fogthen MostlyCloudy', 'Mostly Cloudy', 'Partly Sunny']
['High: 76 °F', 'Low: 58 °F', 'High: 75 °F', 'Low: 59 °F', 'High: 80 °F', 'Low: 61 °F', 'High: 78 °F', 'Low: 61 °F', 'High: 77 °F']
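
The difference between the two class filters can be reproduced without hitting the live page. The sketch below uses a made-up HTML fragment (SAMPLE_HTML is an assumption that only imitates the forecast markup) and a hypothetical helper, text_or_none, that returns None instead of raising when find() comes back empty.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating two forecast tiles; this is NOT the real page.
SAMPLE_HTML = """
<div id="seven-day-forecast-body">
  <div class="forecast-tombstone">
    <p class="period-name">Today</p>
    <p class="temp temp-high">High: 76 °F</p>
  </div>
  <div class="forecast-tombstone">
    <p class="period-name">Tonight</p>
    <p class="temp temp-low">Low: 58 °F</p>
  </div>
</div>
"""


def text_or_none(item, css_class):
    """Return the text of the first matching tag, or None if nothing matches."""
    tag = item.find(class_=css_class)
    return tag.get_text() if tag is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
items = soup.find_all(class_="forecast-tombstone")

# A string containing a space is compared against the whole class attribute,
# so the "temp temp-low" tile does not match and yields None.
exact = [text_or_none(item, "temp temp-high") for item in items]

# A single class name matches any tag that carries that class.
any_temp = [text_or_none(item, "temp") for item in items]

print(exact)     # ['High: 76 °F', None]
print(any_temp)  # ['High: 76 °F', 'Low: 58 °F']
```

Guarding against None like this is optional once class_='temp' is used, but it keeps one unexpected tile from aborting the whole list comprehension.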

【Comments】:

    【Solution 2】:

    The AttributeError turned out to be a simple problem.

    OK, but with that fixed, here is the printed output:

    [Running] python -u "c:\Users\dukasu\Documents\Python\test.py"
    ThisAfternoon
    Partly Sunny
    High: 76 �F
    ['ThisAfternoon', 'Tonight', 'Saturday', 'SaturdayNight', 'Sunday', 'SundayNight', 'Monday', 'MondayNight', 'Tuesday']
    ['Partly Sunny', 'Patchy Fog', 'Patchy Fogthen MostlySunny', 'Patchy Fog', 'Patchy Fogthen PartlySunny', 'Patchy Fog', 'Patchy Fogthen MostlyCloudy', 'Mostly Cloudy', 'Partly Sunny']
    ['High: 76 �F', 'Low: 58 �F', 'High: 75 �F', 'Low: 59 �F', 'High: 80 �F', 'Low: 61 �F', 'High: 78 �F', 'Low: 61 �F', 'High: 77 �F']
    
    [Done] exited with code=0 in 0.619 seconds
    

    How do I print the degree symbol °?

    Later I added

    import sys
    sys.stdout.reconfigure(encoding='utf-8')
    

    and it printed:

    High: 76 °F
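
A likely explanation for the � (this is an assumption about the output panel, which the thread does not confirm): Python wrote the degree sign as the single cp1252 byte 0xB0, while the panel decoded the stream as UTF-8, where a lone 0xB0 is invalid. The sketch below shows the byte-level difference and simulates the reconfigure call on an in-memory stream instead of the real sys.stdout.

```python
import io

DEGREE = "°"  # U+00B0 DEGREE SIGN

utf8_bytes = DEGREE.encode("utf-8")      # two bytes
cp1252_bytes = DEGREE.encode("cp1252")   # one byte

# What a UTF-8 viewer shows when it receives the cp1252 byte:
# the lone 0xB0 is invalid UTF-8 and becomes U+FFFD (the "�" character).
shown_wrong = cp1252_bytes.decode("utf-8", errors="replace")

print(utf8_bytes)          # b'\xc2\xb0'
print(cp1252_bytes)        # b'\xb0'
print(ascii(shown_wrong))  # '\ufffd'

# Simulate a cp1252 stdout and switch it to UTF-8 before anything is written.
# TextIOWrapper.reconfigure requires Python 3.7 or newer.
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding="cp1252")
stream.reconfigure(encoding="utf-8")
stream.write("High: 76 °F")
stream.flush()
written = buffer.getvalue()

print(written)             # b'High: 76 \xc2\xb0F'
```

To avoid editing every script, the same effect can usually be had from outside the code by setting the PYTHONIOENCODING=utf-8 environment variable or running with python -X utf8 (Python 3.7+). Whether that is preferable depends on the tool displaying the output.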
    

    【Comments】:
