How to use the schedule package to web scrape using Python

【Question title】: How to use schedule package to webscrape using python
【Posted】: 2021-02-09 22:30:23
【Question】:

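I wrote this script to scrape the "Clima tempo" website, extract the information with Beautiful Soup, and store it in pandas for export to Excel.

However, with schedule at the end of the code, the script does not run automatically.

Is there something about web scraping that prevents this from working?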

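# Imports assumed from the names used below
import time

import requests
import schedule
from bs4 import BeautifulSoup as BS

def extractInfo():
    # Make the requests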
    html = requests.get("https://www.climatempo.com.br/previsao-do-tempo/agora/cidade/321/riodejaneiro-rj").content
    now = BS(html, "lxml")
    
    html = requests.get("https://www.climatempo.com.br/previsao-do-tempo/cidade/321/riodejaneiro-rj/").content
    today = BS(html, "lxml")

.......



schedule.every().day.at("08:00").do(extractInfo)

while True:
    schedule.run_pending()
    time.sleep(1)

The full script is available on GitHub: https://github.com/Tayzerdo/Webscraping-from-climatempo/blob/main/WebScraping.py

【Question comments】:

Tags: python python-3.x web-scraping

【Solution 1】:

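A script cannot start itself automatically. The schedule package is meant for long-running programs that need to perform actions periodically: its jobs only fire while the process is running and calling schedule.run_pending().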
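To make this concrete, here is a minimal stdlib-only sketch of the waiting that an in-process scheduler implies. It is not the schedule package's implementation; the helper name `seconds_until` is hypothetical and only illustrates how long the process has to stay alive before a daily 08:00 job becomes due.

```python
from datetime import datetime, timedelta


def seconds_until(hour: int, minute: int, now: datetime) -> float:
    """Seconds from `now` until the next occurrence of hour:minute.

    Hypothetical helper for illustration; not part of the schedule package.
    """
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        target += timedelta(days=1)
    return (target - now).total_seconds()


# A process started at 22:30 has to keep running for 9.5 hours
# before a job scheduled for 08:00 is due.
started = datetime(2021, 2, 9, 22, 30)
print(seconds_until(8, 0, started) / 3600)
```

If the script is closed, or the machine is shut down, before that time elapses, the job is never run, which is why nothing appears to happen "automatically".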

Depending on your operating system, you may want to look at:

systemd timers / cron jobs (Linux)
cron jobs (macOS)
Task Scheduler (Windows)
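With any of these, the script itself no longer needs a schedule loop: it runs the job once and exits, and the operating system decides when to start it. A minimal sketch, assuming a stand-in `extract_info()` in place of the question's `extractInfo()` (the real function would do the scraping and the Excel export); the interpreter and script paths in the cron comment are placeholders, not taken from the question.

```python
def extract_info() -> str:
    """Stand-in for the question's extractInfo(); assumed for illustration."""
    return "scrape finished"


def main() -> int:
    # Run the job exactly once; the OS scheduler decides when to launch us.
    print(extract_info())
    return 0


# Example crontab entry for a daily run at 08:00 (placeholder paths):
#   0 8 * * * /usr/bin/python3 /path/to/WebScraping.py
if __name__ == "__main__":
    main()
```

On Windows, the same run-once script can be registered as a daily task in Task Scheduler instead of cron.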

【Comments】:

Perfect @AKX, thank you very much!!!! I'll take a look at the Windows Task Scheduler.