Web Scraping the AccuWeather Site: Answers

【Question title】: Web Scraping AccuWeather site
【Posted】: 2020-09-09 16:11:23
【Question description】:

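I recently started learning web scraping with Scrapy in Python, and I am having trouble scraping data from the AccuWeather site (https://www.accuweather.com/en/gb/london/ec4a-2/may-weather/328328?year=2020). Basically, I want to capture each date and its temperature for reporting purposes. When I inspected the site I found so many div tags that I got confused about how to write the code, so I thought I would ask the experts for help.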

Here is my code for reference.

import scrapy

class QuoteSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['https://www.accuweather.com/en/gb/london/ec4a-2/may-weather/328328?year=2020']

    def parse(self, response):
        All_div_tags = response.css('div.content-module')[0]
        #Grid_tag = All_div_tags.css('div.monthly-grid')
        Date_tag = All_div_tags.css('div.date::text').extract()
        yield {
            'Date' : Date_tag}

I wrote this code in PyCharm, but I get an error saying "code is not handled or not allowed". Can anyone help me, please?

【Question discussion】:

The Scrapy log is relevant here. Is some redirect happening?

Tags: python web-scraping scrapy pycharm

【Solution 1】:

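I tried reading a few sites and they gave me the same error. This happens because some websites do not allow scraping. To get data from such sites, you may need to use their API, if they have one. Fortunately, AccuWeather makes its API easy to use (unlike other APIs):

First, create an account on their developer site: https://developer.accuweather.com/
Next, go to My Apps > Add a new App to create a new app.
You will probably see some information about your app (if not, click its name and it should appear). The only piece you need is your API key, which is essential for using the APIs.
AccuWeather has very good documentation for its APIs, but I will show you how to use the most useful ones. You need the location key of the city whose weather you want. It appears in the URL of that city's weather page: for example, London's URL is www.accuweather.com/en/gb/london/ec4a-2/weather-forecast/328328, so its location key is 328328.
Once you have the location key of the city, open a file and enter: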

import requests
import json
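
The location key described above sits at the end of the city page URL. As a minimal sketch, assuming that pattern holds for the pages you work with, a small helper can pull it out. The name `location_key_from_url` is hypothetical and not part of any AccuWeather library:

```python
from urllib.parse import urlparse


def location_key_from_url(page_url):
    """Return the last path segment of an AccuWeather page URL.

    Assumption: the location key is the final path segment, as in the
    London example from the answer. Illustrative helper only.
    """
    # urlparse only separates host and path when a scheme is present
    if "://" not in page_url:
        page_url = "https://" + page_url
    path = urlparse(page_url).path  # the query string (?year=...) is excluded
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError("URL has no path segments")
    return segments[-1]


print(location_key_from_url(
    "www.accuweather.com/en/gb/london/ec4a-2/weather-forecast/328328"
))  # prints 328328
```

The same helper also works on the monthly page URL from the question, because the `?year=2020` part is not part of the path.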

If you want the daily weather, enter:

response = requests.get(url="http://dataservice.accuweather.com/forecasts/v1/daily/1day/LOCATIONKEY?apikey=APIKEY")
print(response.status_code)

Replace APIKEY with your API key and LOCATIONKEY with the city's location key. When you run it, it should print 200 (meaning the request succeeded). Now load the response as JSON:

response_json = json.loads(response.content)
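
Rather than editing the URL string by hand, the substitution of the location key and API key can be sketched as a small function. This is only an illustration: `build_daily_forecast_url` and `BASE_URL` are hypothetical names, and the endpoint is the one quoted in this answer.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the answer above
BASE_URL = "http://dataservice.accuweather.com/forecasts/v1/daily/1day"


def build_daily_forecast_url(location_key, api_key):
    """Build the 1-day forecast URL for a location key and API key."""
    # quote() escapes unexpected characters in the path segment;
    # urlencode() builds the ?apikey=... query string.
    path_part = quote(str(location_key), safe="")
    query = urlencode({"apikey": api_key})
    return f"{BASE_URL}/{path_part}?{query}"


url = build_daily_forecast_url("328328", "YOUR_API_KEY")
print(url)
```

The resulting string can be passed to `requests.get(url)`. Checking that `response.status_code` is 200 before parsing avoids treating an error body as forecast data.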

You can now get some information from it, such as the "definition" of the day (the headline text):

print(response_json["Headline"]["Text"])

Minimum temperature:

min_temperature = response_json["DailyForecasts"][0]["Temperature"]["Minimum"]["Value"]
print(f"Minimum Temperature: {min_temperature}")

Maximum temperature:

max_temperature = response_json["DailyForecasts"][0]["Temperature"]["Maximum"]["Value"]
print(f"Maximum Temperature: {max_temperature}")

Minimum and maximum temperature with their units:

min_temperature = str(response_json["DailyForecasts"][0]["Temperature"]["Minimum"]["Value"]) + response_json["DailyForecasts"][0]["Temperature"]["Minimum"]["Unit"]
print(f"Minimum Temperature: {min_temperature}")

max_temperature = str(response_json["DailyForecasts"][0]["Temperature"]["Maximum"]["Value"]) + response_json["DailyForecasts"][0]["Temperature"]["Maximum"]["Unit"]
print(f"Maximum Temperature: {max_temperature}")
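
The lookups above repeat the same nested keys, so they can be gathered into one helper. The following is a minimal sketch: `summarize_forecast` is a hypothetical name, and `sample` is made-up data shaped like the fields used in this answer, not a real API response.

```python
def summarize_forecast(response_json):
    """Collect the headline and min/max temperatures (with units)."""
    day = response_json["DailyForecasts"][0]
    temps = day["Temperature"]

    def with_unit(entry):
        # Join the numeric value and its unit, e.g. 50.0 and "F" -> "50.0F"
        return f"{entry['Value']}{entry['Unit']}"

    return {
        "headline": response_json["Headline"]["Text"],
        "min": with_unit(temps["Minimum"]),
        "max": with_unit(temps["Maximum"]),
    }


# Made-up sample containing only the keys used in this answer
sample = {
    "Headline": {"Text": "Example headline"},
    "DailyForecasts": [
        {
            "Temperature": {
                "Minimum": {"Value": 50.0, "Unit": "F"},
                "Maximum": {"Value": 68.0, "Unit": "F"},
            }
        }
    ],
}

summary = summarize_forecast(sample)
print(summary)
```

With a real response, `summarize_forecast(response_json)` would return the same three fields, provided the response contains the keys used in the answer.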

And there is more.

Let me know if you have any questions. Hope this helps!

【Discussion】: