【Question title】: How to extract an element of a JSON script with BeautifulSoup
【Posted】: 2021-01-27 21:55:06
【Question】:

I want to extract the value of the startDate key from the JSON inside the script tags.

Here is my code:

# Import libraries
import json
import requests
from bs4 import BeautifulSoup

# Request the website and download the HTML contents
url = 'https://www.coteur.com/cotes-foot.php'

#page = requests.get(url)
response = requests.get(url)

#soup = BeautifulSoup(page.text, 'html.parser')
soup = BeautifulSoup(response.text, 'html.parser')

s = soup.find("table", id="mediaTable").find_all('script', type='application/ld+json')
print(s)

【Comments】:

    Tags: json python-3.x web-scraping beautifulsoup


    【Solution 1】:

    Try this:

    import json
    import re
    
    import requests
    from bs4 import BeautifulSoup
    
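    # Fetch the page and parse the HTML
    soup = BeautifulSoup(requests.get('https://www.coteur.com/cotes-foot.php').text, 'html.parser')
    # Collect every JSON-LD script tag inside the table with id "mediaTable"
    s = soup.find("table", id="mediaTable").find_all('script', type='application/ld+json')
    # For each tag: take the text between the tags, parse it as JSON, read "startDate"
    print([json.loads(re.search(r'>(.+)<', str(j), re.S).group(1))["startDate"] for j in s])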
    

    Output:

    ['2021-01-28T12:00', '2021-01-28T13:00', '2021-01-28T15:30', '2021-01-28T16:00', '2021-01-28T16:15', '2021-01-28T18:00', '2021-01-28T18:00', '2021-01-28T18:45', '2021-01-28T18:45', '2021-01-28T19:00', '2021-01-28T19:00', '2021-01-28T19:15', '2021-01-28T20:30', '2021-01-28T20:30', '2021-01-28T21:00', '2021-01-28T21:00', '2021-01-28T21:00', '2021-01-28T21:00', '2021-01-28T21:00', '2021-01-28T22:15', '2021-01-28T23:00', '2021-01-29T00:00', '2021-01-29T00:00', '2021-01-29T04:00', '2021-01-29T09:05']
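
The regex step can probably be skipped, since a BeautifulSoup tag exposes its inner text through the `.string` attribute. Below is a minimal sketch of that variation. It runs against a small hand-written HTML sample rather than the live page, so `SAMPLE_HTML`, its event names, and the helper `extract_start_dates` are all assumptions made for illustration, not part of the original answer.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page (assumption).
SAMPLE_HTML = """
<table id="mediaTable">
  <tr><td>
    <script type="application/ld+json">
    {"@type": "SportsEvent", "name": "Team A - Team B", "startDate": "2021-01-28T12:00"}
    </script>
  </td></tr>
  <tr><td>
    <script type="application/ld+json">
    {"@type": "SportsEvent", "name": "Team C - Team D", "startDate": "2021-01-28T13:00"}
    </script>
  </td></tr>
</table>
"""


def extract_start_dates(html):
    """Return the startDate of every JSON-LD script inside table#mediaTable."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="mediaTable")
    if table is None:  # the table may be missing or renamed
        return []
    dates = []
    for script in table.find_all("script", type="application/ld+json"):
        if not script.string:  # skip empty script tags
            continue
        data = json.loads(script.string)  # the tag's text is the JSON document
        if isinstance(data, dict) and "startDate" in data:
            dates.append(data["startDate"])
    return dates


start_dates = extract_start_dates(SAMPLE_HTML)
print(start_dates)  # ['2021-01-28T12:00', '2021-01-28T13:00']
```

To use it on the real site, pass `requests.get(url).text` instead of `SAMPLE_HTML`. The page layout may have changed since the answer was written, which is why the sketch returns an empty list when the table is not found.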
    

    【Discussion】:

    • Can you explain the last line of the code? Thanks
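
One way to read the last line of the answer is to unroll the list comprehension into separate steps for a single tag. The sketch below uses only the standard library. `tag_text` is a hypothetical stand-in for `str(j)`, that is, one script tag rendered as text, and its content is made up for illustration.

```python
import json
import re

# Hypothetical stand-in for str(j): one script tag rendered as text (assumption).
tag_text = (
    '<script type="application/ld+json">\n'
    '{"@type": "SportsEvent", "startDate": "2021-01-28T12:00"}\n'
    '</script>'
)

# Step 1: '>(.+)<' captures everything between the first '>' (end of the
# opening tag) and the last '<' (start of the closing tag). The re.S flag
# lets '.' match newlines, so JSON spread over several lines is captured too.
match = re.search(r'>(.+)<', tag_text, re.S)
raw_json = match.group(1)

# Step 2: json.loads turns the captured text into a Python dict.
event = json.loads(raw_json)

# Step 3: a plain dictionary lookup reads the wanted key.
start_date = event["startDate"]
print(start_date)  # 2021-01-28T12:00
```

The list comprehension in the answer repeats these three steps for every tag in `s` and collects the results in a list.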