【Posted】: 2019-11-11 18:21:09
【Question】:
I want to scrape the following data from http://maps.latimes.com/neighborhoods/population/density/neighborhood/list/ :
var hoodFeatures = {
type: "FeatureCollection",
features: [{
type: "Feature",
properties: {
name: "Koreatown",
slug: "koreatown",
url: "/neighborhoods/neighborhood/koreatown/",
has_statistics: true,
label: 'Rank: 1<br>Population per Sqmi: 42,611',
population: "115,070",
stratum: "high"
},
geometry: { "type": "MultiPolygon", "coordinates": [ [ [ [ -118.286908, 34.076510 ], [ -118.289208, 34.052511 ], [ -118.315909, 34.052611 ], [ -118.323009, 34.054810 ], [ -118.319309, 34.061910 ], [ -118.314093, 34.062362 ], [ -118.313709, 34.076310 ], [ -118.286908, 34.076510 ] ] ] ] }
},
From the HTML above, I want to extract each of the following:
name
population per sqmi
population
geometry
and convert them into a data frame keyed by name.
So far I have tried:
import requests
import json
from bs4 import BeautifulSoup
response_obj = requests.get('http://maps.latimes.com/neighborhoods/population/density/neighborhood/list/').text
soup = BeautifulSoup(response_obj,'lxml')
The resulting object contains the script content, but I don't understand how to use the json module as suggested in this thread: Parsing variable data out of a javascript tag using python
json_text = '{%s}' % (soup.partition('{')[2].rpartition('}')[0],)
value = json.loads(json_text)
value
I get this error:
TypeError Traceback (most recent call last)
<ipython-input-12-37c4c0188ed0> in <module>
1 #Splits the text on the first bracket and last bracket of the javascript into JSON format
----> 2 json_text = '{%s}' % (soup.partition('{')[2].rpartition('}')[0],)
3 value = json.loads(json_text)
4 value
5 #import pprint
TypeError: 'NoneType' object is not callable
Any suggestions? Thanks.
【Comments】:
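- soup is not a string. BeautifulSoup treats soup.partition as a lookup for a <partition> tag, which does not exist, so you get None, and calling None raises the TypeError. Use soup.text instead, which is a string. You can also find the <script> tag to get only the text that may contain the JavaScript code: code = soup.find('script').text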
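A rough sketch of what could come next, once the script content is a plain str. Note that the hoodFeatures literal in the question is JavaScript rather than strict JSON (unquoted keys, a single-quoted label), so json.loads on the whole object would likely fail. This sketch instead pulls the scalar fields out with a regular expression and decodes only the geometry part, which is valid JSON in the sample. SAMPLE, FEATURE_RE, and parse_features are hypothetical names, and the code assumes every feature lists its fields in the same order as the sample shown in the question.

```python
import json
import re

# Shortened sample in the shape shown in the question. On the real page this
# text would come from a str such as requests.get(...).text or soup.text.
SAMPLE = '''
var hoodFeatures = {
type: "FeatureCollection",
features: [{
type: "Feature",
properties: {
name: "Koreatown",
slug: "koreatown",
url: "/neighborhoods/neighborhood/koreatown/",
has_statistics: true,
label: 'Rank: 1<br>Population per Sqmi: 42,611',
population: "115,070",
stratum: "high"
},
geometry: { "type": "MultiPolygon", "coordinates": [ [ [ [ -118.286908, 34.076510 ], [ -118.289208, 34.052511 ], [ -118.286908, 34.076510 ] ] ] ] }
},
]};
'''

# Assumption: fields appear in this order inside every feature.
FEATURE_RE = re.compile(
    r'name:\s*"(?P<name>[^"]*)"'
    r'.*?Population per Sqmi:\s*(?P<density>[\d,]+)'
    r'.*?population:\s*"(?P<population>[\d,]+)"'
    r'.*?geometry:\s*(?P<geometry>\{)',
    re.DOTALL,
)


def parse_features(js_text):
    """Return one dict per feature found in the JavaScript text."""
    decoder = json.JSONDecoder()
    rows = []
    for match in FEATURE_RE.finditer(js_text):
        # raw_decode parses one JSON value starting at the opening brace
        # and ignores whatever JavaScript follows it.
        geometry, _ = decoder.raw_decode(js_text, match.start("geometry"))
        rows.append({
            "name": match.group("name"),
            "population_per_sqmi": int(match.group("density").replace(",", "")),
            "population": int(match.group("population").replace(",", "")),
            "geometry": geometry,
        })
    return rows


rows = parse_features(SAMPLE)
```

If pandas is installed, pandas.DataFrame(rows).set_index("name") should turn the resulting list of dicts into a data frame keyed by name. Features that lack one of the fields would not match this pattern cleanly, so the regex may need adjusting against the real page.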
Tags: javascript python beautifulsoup