【Question Title】: How to convert <class 'bs4.element.ResultSet'> to JSON in Python using the built-in json.dumps function
【Posted】: 2019-02-22 19:47:42
【Question】:

How can I convert the result to JSON format?

I get the error "is not JSON serializable".

Here is my program:

from urllib2 import urlopen as uReq
import re
from bs4 import BeautifulSoup, Comment
import requests
import json
my_url='https://uae.dubizzle.com/en/property-for-rent/residential/apartmentflat/?filters=(neighborhoods.ids=123)&page=1'

uClient=uReq(my_url)
page_html= uClient.read()
page_soup=BeautifulSoup(page_html, 'html.parser')
comments = page_soup.findAll(text=lambda text:isinstance(text, Comment))
[comment.extract() for comment in comments]
json_output= page_soup.find_all("script",type="application/ld+json",string=re.compile("SingleFamilyResidence")) #find_all("script", "application/ld+json")
#comments = json_output.findAll(text=lambda text:isinstance(text, Comment))
#[comment.extract() for comment in comments]
#json_output.find_all(text="<script type=""application/ld+json"">").replaceWith("")
#print json_output
jsonD = json.dumps(json_output)
uClient.close()

[{"@context":"http://schema.org","@type":"SingleFamilyResidence","name":"Spacious 2BHK for rent in Al Qusais Damascus Street","url":"https://dubai.dubizzle.com/property-for-rent/residential/apartmentflat/2018/4/29/spacious-two-bed-room-available-for-rent-i-2/","address":{"@type":"PostalAddress","addressLocality":"Dubai","addressRegion":"Dubai"},"":{"@type":"Product","name":"Spacious 2BHK for rent in Al Qusais Damascus Street","url":"https://dubai.dubizzle.com/property-for-rent/residential/apartmentflat/2018/4/29/spacious-two-bed-room-available-for-rent-i-2/","offers":{"@type":"Offer","price":49000,"priceCurrency":"AED"}},"floorSize":1400,"numberOfRooms":2,"image":"https://dbzlpvfeeds-a.akamaihd.net/images/user_images/2018/04/29/80881784_CP_photo.jpeg","geo":{"@type":"GeoCoordinates","latitude":55.3923,"longitude":25.2893}}, {"@context":"http://schema.org","@type":"SingleFamilyResidence","name":"Fully Furnished 2 Bed Room Flat -Al Qusais","url":"https://dubai.dubizzle.com/property-for-rent/residential/apartmentflat/2017/10/9/fully-furnished-brand-new-2-bed-room-flat--2/","address":{"@type":"PostalAddress","addressLocality":"Dubai","addressRegion":"Dubai"},"":{"@type":"Product","name":"Fully Furnished 2 Bed Room Flat -Al Qusais","url":"https://dubai.dubizzle.com/property-for-rent/residential/apartmentflat/2017/10/9/fully-furnished-brand-new-2-bed-room-flat--2/","offers":{"@type":"Offer","price":70000,"priceCurrency":"AED"}},"floorSize":1400,"numberOfRooms":2,"image":"https://dbzlpvfeeds-a.akamaihd.net/images/user_images/2018/09/05/84371522_CP_photo.jpeg","geo":{"@type":"GeoCoordinates","latitude":55.3959,"longitude":25.2959}}]

【Comments】:

  • You have to strip the script tags before calling json.dumps()
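
One way to read that comment, as a minimal sketch: a ResultSet is a list of Tag objects, and json.dumps has no rule for serializing a Tag, so the JSON text has to be taken out of each script tag first. The SAMPLE_HTML string below is a made-up stand-in for the downloaded page (an assumption, so the sketch runs without network access), and it uses 'html.parser' so no extra parser is needed.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; not the real dubizzle markup.
SAMPLE_HTML = """
<html><body>
<script type="application/ld+json">{"@type": "SingleFamilyResidence", "numberOfRooms": 2}</script>
<script type="application/ld+json">{"@type": "SingleFamilyResidence", "numberOfRooms": 3}</script>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
tags = soup.find_all("script", type="application/ld+json")

# Passing the ResultSet straight to json.dumps fails on the Tag objects inside it.
try:
    json.dumps(tags)
    dumps_failed = False
except TypeError:
    dumps_failed = True

# tag.string is the text between <script> and </script>, i.e. the JSON itself.
records = [json.loads(tag.string) for tag in tags]

# records is now a plain list of dicts, which json.dumps accepts.
json_text = json.dumps(records)
```

After this, records holds ordinary Python dicts, so json.dumps only matters if the data needs to be written back out as text.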

Tags: json python-2.7 web-scraping beautifulsoup


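【Solution 1】:

First convert the bs4.element.ResultSet to a string, then serialize it to JSON. Note that this makes the error go away by encoding the whole result, script tags included, as one JSON string; it does not yield structured data.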

json_data = json.dumps(str(json_output))
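
To see what that call actually produces, here is a small sketch under the assumption of a one-tag sample page (SAMPLE_HTML is invented for illustration). It decodes the dumped value again and compares it with parsing the tag's own text.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical one-tag page used only for this illustration.
SAMPLE_HTML = '<script type="application/ld+json">{"price": 49000}</script>'
tags = BeautifulSoup(SAMPLE_HTML, "html.parser").find_all("script")

# Solution 1: stringify the ResultSet, then dump that string.
as_string = json.dumps(str(tags))

# Decoding it again gives back the HTML text, not a dict or list.
round_trip = json.loads(as_string)
round_trip_type = type(round_trip).__name__

# Parsing the script tag's own text gives structured data instead.
parsed = json.loads(tags[0].string)
```

Here round_trip is a plain string that still begins with the opening script tag, while parsed is a dict whose "price" can be read directly.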

【Comments】:

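    【Solution 2】:

    Hi, I wrapped the result in a second BeautifulSoup object and got the expected JSON.

    The idea is to extract the text with the .get_text() method first and then pass it to json.loads.

    Thanks to everyone who shared their knowledge.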

    from urllib2 import urlopen as uReq  # Python 2 module
    import re
    import json
    from bs4 import BeautifulSoup

    my_url='https://uae.dubizzle.com/en/property-for-rent/residential/apartmentflat/?filters=(neighborhoods.ids=123)&page=1'

    # Download the page, then release the connection.
    uClient=uReq(my_url)
    page_html= uClient.read()
    uClient.close()

    page_soup=BeautifulSoup(page_html, 'lxml')# 'html.parser')
    # Stringify the matching <script> tags and re-parse them, so that
    # get_text() can drop the tags and keep only their contents.
    json_output= BeautifulSoup(str(page_soup.find_all("script",type="application/ld+json",string=re.compile("SingleFamilyResidence"))), 'lxml')#find_all("script", "application/ld+json")
    json_text=json_output.get_text()
    # json_text should now be a bracketed, comma-separated list of JSON objects.
    json_data = json.loads(json_text)
    print json_data
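
As an alternative sketch of the same idea, each script tag can be parsed on its own with json.loads(tag.string), which avoids the second BeautifulSoup pass. This is not the answerer's code: the helper name extract_json_ld, the SAMPLE_HTML page, and the choice to filter on "@type" after parsing (instead of the regex) are all assumptions made for illustration. It may also be the safer route on newer installations, since the Beautiful Soup documentation says that from version 4.9.0 the contents of script tags are generally not treated as text by get_text() when it is called on a parent element.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; not the real dubizzle markup.
SAMPLE_HTML = """
<html><body>
<script type="application/ld+json">{"@type": "SingleFamilyResidence", "numberOfRooms": 2}</script>
<script type="application/ld+json">{"@type": "Organization", "name": "Example"}</script>
<script type="application/ld+json">[{"@type": "SingleFamilyResidence", "numberOfRooms": 3}]</script>
</body></html>
"""


def extract_json_ld(html, wanted_type):
    """Return every JSON-LD object in html whose "@type" equals wanted_type."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for tag in soup.find_all("script", type="application/ld+json"):
        if not tag.string:
            continue  # empty script tag, nothing to parse
        data = json.loads(tag.string)
        # A script tag may hold one object or a list of objects.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == wanted_type:
                records.append(item)
    return records


residences = extract_json_ld(SAMPLE_HTML, "SingleFamilyResidence")
room_counts = [item["numberOfRooms"] for item in residences]
```

With the real page, page_html from the code above would be passed in place of SAMPLE_HTML.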
    

    【Comments】:
