Unable to understand and parse the JSON URL response

【Question title】: Unable to understand and parse the JSON URL response
【Posted】: 2017-07-20 06:14:05
【Question description】:

I have a JSON URL and I am trying to extract data from its response. Below is my code:

url = urllib2.urlopen("https://i1.adis.ws/s/foo/M0011126_001_SET.js?func=app.mjiProduct.handleJSON&protocol=https")
content = url.read()
soup = BeautifulSoup(content, "html.parser")
print(soup.prettify())
print(soup.items)
newDictionary=json.loads(str(soup))

Below is response.content:

app.mjiProduct.handleJSON({"name":"M0011126_001_SET","items":[{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_MAIN","width":3200,"height":4800,"format":"TIFF","opaque":"true"},{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_ALT1","width":3200,"height":4800,"format":"TIFF","opaque":"true"},{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_ALT2","width":3200,"height":4800,"format":"TIFF","opaque":"true"}]});

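I am new to JSON and cannot make sense of the response. I also need to parse the response as JSON, or into some other form, so that I can extract the image sources. However, the code above gives me the error below.

No JSON object could be decoded

Can someone guide me? Thanks.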

【Question discussion】:

Tags: json python-3.x web-scraping beautifulsoup

【Solution 1】:

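First, your URL does not work; it returns app.mjiProduct.handleJSON({"status":"error","errorMsg":"Failed to get set"});

Second, you do not have to pass the content to BeautifulSoup. You can pass it directly to json, as I do in the code below, without a BeautifulSoup object.

I used httpbin for testing, but this should also work with your URL, as long as the response body is plain JSON. I am using Python 3, though.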

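from urllib.request import urlopen
import json

# Fetch the response body (bytes on Python 3)
url = urlopen("http://httpbin.org/get")
content = url.read()
# json.loads accepts bytes on Python 3.6+; decode first on older versions
newDictionary = json.loads(content)
print(newDictionary)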

Output: {'args': {}, 'headers': {'Accept-Encoding': 'identity', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'Python-urllib/3.6'}, 'origin': '', 'url': 'http://httpbin.org/get'}
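
For comparison, here is a minimal sketch of why the same json.loads call fails on the response shown in the question: it accepts a bare JSON document but rejects one wrapped in a function call. The wrapped string below is a shortened, hypothetical stand-in for the real response, and the snippet assumes Python 3.5 or later, where json.JSONDecodeError exists.

```python
import json

# Bare JSON parses directly.
bare = '{"name": "M0011126_001_SET", "items": []}'
parsed = json.loads(bare)
print(parsed["name"])

# Shortened stand-in for the question's response (an assumption, not the
# real payload): the same JSON wrapped in a JavaScript function call.
wrapped = 'app.mjiProduct.handleJSON(' + bare + ');'

try:
    json.loads(wrapped)
    failed = False
except json.JSONDecodeError as exc:
    # The leading "app.mjiProduct..." text is not valid JSON.
    failed = True
    print("json.loads rejected the wrapped response:", exc)
```

So the direct approach applies only after the surrounding function call has been removed.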

【Discussion】:

【Solution 2】:

Below is the code that worked for me.

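# Read the body and decode it to text (read() returns bytes on Python 3)
json_data = url.read().decode("utf-8")
# Keep only the text between "handleJSON(" and the trailing ");"
purify_data = json_data.split('handleJSON(')[1].split(');')[0]
# Parse the extracted text with json.loads (json.dumps would serialize instead)
loaded_json = json.loads(purify_data)
print(loaded_json['items'][0]['src'])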

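In fact, I found that json_data is a string, and I could not decode it because of the string's format, i.e.

app.mjiProduct.handleJSON(required JSON)

So I first filtered my string and then loaded it with json, which solved the problem.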
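
The filtering step described above can be sketched as a small helper that also collects every image source. This is only an illustration: strip_jsonp is a hypothetical name, and the sample string is a shortened version of the response from the question, not a live fetch.

```python
import json


def strip_jsonp(text):
    """Return the text between the first '(' and the last ')'."""
    start = text.find("(")
    end = text.rfind(")")
    if start == -1 or end == -1 or end < start:
        raise ValueError("response does not look like a wrapped function call")
    return text[start + 1:end]


# Shortened sample modelled on the response in the question (assumption).
sample = (
    'app.mjiProduct.handleJSON({"name":"M0011126_001_SET","items":['
    '{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_MAIN"},'
    '{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_ALT1"}'
    ']});'
)

data = json.loads(strip_jsonp(sample))
# Collect the "src" value of every entry in "items".
sources = [item["src"] for item in data["items"]]
print(sources)
```

Searching for the first opening and last closing parenthesis avoids depending on the exact function name, at the cost of assuming the wrapper is a single call.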

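【Discussion】:

【Solution 3】:

The response does not contain valid JSON. It looks like executable code (probably JavaScript). However, the {"name":"M0011126_001_SET","items":[...]} part is valid JSON. So, if you are sure the response always has this format, you can strip the function call like this: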

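# Decode first: on Python 3, str() of bytes yields "b'...'", which is not JSON
content = url.read().decode("utf-8")[26:-2]  # Cut the first 26 characters and the last two
newDictionary = json.loads(content)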
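
A slightly more defensive variant of the same slicing idea, as a sketch: instead of hard-coding 26, it derives the offset from the expected prefix and fails loudly if the format ever changes. The names PREFIX, SUFFIX and unwrap are hypothetical, the sample is the error response quoted in Solution 1, and the trailing newline is an assumption added to show why the text is stripped first.

```python
import json

PREFIX = "app.mjiProduct.handleJSON("  # 26 characters, hence the [26:...] slice
SUFFIX = ");"


def unwrap(text):
    """Strip the known wrapper, raising ValueError if the format differs."""
    text = text.strip()  # tolerate surrounding whitespace such as a trailing newline
    if not (text.startswith(PREFIX) and text.endswith(SUFFIX)):
        raise ValueError("unexpected response format")
    return text[len(PREFIX):-len(SUFFIX)]


# Error response quoted in Solution 1, with an assumed trailing newline.
sample = 'app.mjiProduct.handleJSON({"status":"error","errorMsg":"Failed to get set"});\n'
result = json.loads(unwrap(sample))
print(result["status"])
```

Checking the prefix and suffix before slicing means a changed callback name produces a clear error instead of a confusing JSON decoding failure.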

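I do not know Beautiful Soup well, but from what I can see it is a library for working with HTML files, and your response is not HTML, so I do not think you should use it.

【Discussion】: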