【Question title】: Get JSON stored in a JavaScript var inside a script tag with Python BS4
【Posted】: 2021-08-23 12:42:57
【Question】:

I need to get JSON that is stored in a JavaScript var, inside a script tag like this:

<script type="text/javascript">

But it is not the only type="text/javascript" tag in the HTML. The tag I need looks like this:

    <script type="text/javascript">
        var products = '{"id":"000000000000193758","name":"FZ1292","estilo":"FZ1292","date":"44348","month":"JUNIO","day":"1","regster":"","price":"3499","name2":"McDonalds x Harden Vol. 5 ","description":"Detén tu hambre por jugar básquetbol ","image":"www.somesite.com/ADIDAS","realdate":"06-01-2021"}'
</script>

I tried the following:

script = soup.find_all('script', {'type': 'text/javascript'})

But it returns every matching tag, and I don't know how to identify the specific one because it has no id.

【Comments】:

    Tags: python web-scraping beautifulsoup


    【Solution 1】:

    BeautifulSoup cannot parse JavaScript, but you can extract the string with the re module and parse it with the json module. For example:

    import re
    import json
    
    html_doc = """
        <script type="text/javascript">
            var products = '{"id":"000000000000193758","name":"FZ1292","estilo":"FZ1292","date":"44348","month":"JUNIO","day":"1","regster":"","price":"3499","name2":"McDonalds x Harden Vol. 5 ","description":"Detén tu hambre por jugar básquetbol ","image":"www.somesite.com/ADIDAS","realdate":"06-01-2021"}'
    </script>
    """
    
    products = re.search(r"products = '(.*)'", html_doc).group(1)
    products = json.loads(products)
    
    # pretty print the data:
    print(json.dumps(products, indent=4))
    

    Prints:

    {
        "id": "000000000000193758",
        "name": "FZ1292",
        "estilo": "FZ1292",
        "date": "44348",
        "month": "JUNIO",
        "day": "1",
        "regster": "",
        "price": "3499",
        "name2": "McDonalds x Harden Vol. 5 ",
        "description": "Det\u00e9n tu hambre por jugar b\u00e1squetbol ",
        "image": "www.somesite.com/ADIDAS",
        "realdate": "06-01-2021"
    }
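To pick out the one script tag among several with the same type, one option is to filter the `find_all` results from the question by their text content. This is a minimal sketch, assuming a page with a second, unrelated script tag (the `var other` line is made up for illustration) and a shortened copy of the JSON:

```python
import json
import re

from bs4 import BeautifulSoup

# Hypothetical page: two script tags share the same type, only one defines `products`.
html_doc = """
<script type="text/javascript">var other = 1;</script>
<script type="text/javascript">
    var products = '{"id":"000000000000193758","name":"FZ1292","price":"3499"}'
</script>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Same lookup as in the question: this returns every matching tag.
scripts = soup.find_all("script", {"type": "text/javascript"})

# Keep the first tag whose text contains the variable assignment.
target = next(s for s in scripts if s.string and "var products" in s.string)

# Pull the quoted JSON out of that tag's text and parse it.
match = re.search(r"var\s+products\s*=\s*'(.*)'", target.string)
products = json.loads(match.group(1))

print(products["name"])
```

Since the tag has no id, the variable name is the only stable marker here; if the site renames the variable, the filter string has to change with it.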
    
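The greedy `(.*)` in the regex works when the assignment is the only single-quoted string on its line. If another quoted string follows on the same line, the match would run on to the last quote and `json.loads` would fail. One possible way around this is to let the JSON decoder decide where the value ends. A sketch, assuming a hypothetical page where `var label = 'x'` follows the assignment on the same line:

```python
import json
import re

# Hypothetical input: a second quoted string follows on the same line.
html_doc = """
<script type="text/javascript">
    var products = '{"id":"000000000000193758","price":"3499"}'; var label = 'x';
</script>
"""

# Locate the position just after the opening quote of the assignment.
start = re.search(r"var\s+products\s*=\s*'", html_doc).end()

# raw_decode parses one JSON value from the start of the string and
# returns it together with the index where that value ended.
products, end = json.JSONDecoder().raw_decode(html_doc[start:])

print(products["price"])
```

This only helps when the text between the quotes is plain JSON; a string with JavaScript escapes such as `\'` would need unescaping first.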

    【Comments】:
