【Question title】: Read XML file from URL in Python
【Posted】: 2018-01-23 20:57:45
【Question】:

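I'm working with an open-source project called OpenTripPlanner, which I plan to use to simulate a large number of trips from one point to another at given times. So far I have managed to find the URL of the XML file that holds all the information about a trip. The XML is built on request, so the URL is not static. It looks like this: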

http://localhost:8080/otp/routers/default/plan?fromPlace=48.40915,%20-71.04996&toPlace=48.41428,%20-71.06996&date=2017/12/04&time=8:00:00&mode=TRANSIT,WALK

(You need a running OpenTripPlanner server to open it.)

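Now I want to read these XML files and do some data analysis with Python 3, but I can't find a way to read them. I tried downloading the file locally with urllib.request, but the file I get is in a strange format. It looks like this: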

{"requestParameters":{"date":"2017/12/04","mode":"TRANSIT,WALK","fromPlace":"48.40915, -71.04996","toPlace":"48.41428, -71.06996","time":"8:00:00"},"plan":{"date":1512392400000,"from":{"name":"Origin","lon":-71.04996,"lat":48.40915,"orig":"","vertexType":"NORMAL"},"to":{"name":"Destination","lon":-71.06996,"lat":48.41428,"orig":"","vertexType":"NORMAL"},"itineraries":[{"duration":1538,"startTime":1512392809000,"endTime":1512394347000,"walkTime":934,"transitTime":602,"waitingTime":2,"walkDistance":1189.6595112715966,"walkLimitExceeded":false,"elevationLost":0.0,"elevationGained":0.0,"transfers":0,"legs":[{"startTime":1512392809000,"endTime":1512393537000,"departureDelay":0,"arrivalDelay":0,"realTime":false,"distance":926.553,"pathway":false,"mode":"WALK","route":"","agencyTimeZoneOffset":-18000000,"interlineWithPreviousLeg":false,"from":{"name":"Origin","lon":-71.04996,"lat":48.40915,"departure":1512392809000,"orig":"","vertexType":"NORMAL"},"to":{"name":"Roitelets / Martinets","stopId":"1:370","stopCode":"370","lon":-71.047688,"lat":48.401531,"arrival":1512393537000,"departure":1512393538000,"stopIndex":15,"stopSequence":16,"vertexType":"TRANSIT"},"legGeometry":{"points":"s{mfHb{spL|ExBp@sDl@V@@lB|@j@FL?j@GbCk@|A]vEsA^KBA|C{@pCeACS~CuA`@Q","length":19},"rentedBike":false,"transitLeg":false,"duration":728.0,"steps":[{"distance":131.991,"relativeDirection":"DEPART","streetName":"Rue D.-V.-Morrier","absoluteDirection":"SOUTH","stayOn":false,"area":false,"bogusName":false,"lon":-71.04961760502248,"lat":48.4090671692228,"elevation":[]},{"distance":72.319,"relativeDirection":"LEFT","streetName":"Rue Lorenzo-Genest","absoluteDirection":"EAST","stayOn":false,"area":false,"bogusName":false,"lon":-71.0502299,"lat":48.4079519,"elevation":[]}

When I try to open the file in a browser, I get this error:

XML Parsing Error: not well-formed
Location: http://localhost:63342/XML_reader/file.xml?_ijt=e1d6h53s4mh1ak94sqortejf9v
Line Number 1, Column 1: ...

The script I'm using is very simple and looks like this:

import urllib.request

testfile = urllib.request.URLopener()
file_name = 'http://localhost:8080/otp/routers/default/plan?fromPlace=48.40915,%20-71.04996&toPlace=48.41428,%20-71.06996&date=2017/12/04&time=8:00:00&mode=TRANSIT,WALK'
testfile.retrieve(file_name, "file.xml")

How can I get a properly formatted XML file as output? Is there something other than urllib.request that I should try?

Thanks a lot

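【Comments】:

  • That's not XML, it's JSON...
  • The response is not an XML document, it's JSON data.
  • By default OTP replies in JSON. Maybe you could change the request's Content-Type to XML (Application/xml? I don't know how to do that, as I'm not a web expert).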
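
On the last comment: the header a client uses to ask for a particular response format is Accept, not Content-Type (which describes the body being sent). Whether your OpenTripPlanner version actually returns XML for `Accept: application/xml` is an assumption here, not something confirmed in this thread, so treat the following as a rough sketch. The helper names `make_xml_request` and `fetch_xml_root` are made up for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET


def make_xml_request(url):
    """Build a request that asks the server for XML instead of JSON."""
    # Assumption: the server chooses its output format from the Accept header.
    return urllib.request.Request(url, headers={"Accept": "application/xml"})


def fetch_xml_root(url):
    """Send the request and parse the body as XML.

    Raises xml.etree.ElementTree.ParseError if the server still answers
    with JSON (or anything else that is not well-formed XML).
    """
    with urllib.request.urlopen(make_xml_request(url)) as resp:
        return ET.fromstring(resp.read())
```

If the server ignores the header, `fetch_xml_root` fails with a parse error, which is the same symptom the browser reported for the saved file.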

Tags: python xml urllib opentripplanner


【Answer 1】:

To import this file as JSON data (not XML), you need the json library:

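import json
import urllib.request
from pprint import pprint

# The endpoint returns JSON, so save the response with a .json extension.
# urlretrieve is used here because URLopener has been deprecated since Python 3.3.
url = 'http://localhost:8080/otp/routers/default/plan?fromPlace=48.40915,%20-71.04996&toPlace=48.41428,%20-71.06996&date=2017/12/04&time=8:00:00&mode=TRANSIT,WALK'
urllib.request.urlretrieve(url, "file.json")

# Parse the saved file into Python dicts and lists, closing it afterwards.
with open('file.json') as f:
    data = json.load(f)
pprint(data)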
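
Since the question mentions simulating many trips with a URL that changes per request, here is a minimal sketch of one way to build the URL from parameters and read the JSON straight into memory, without saving a file. The function names (`build_plan_url`, `fetch_plan`, `summarize_itineraries`) are hypothetical; the query parameters and JSON keys are taken from the URL and sample output in the question, and other OpenTripPlanner versions may differ.

```python
import json
import urllib.parse
import urllib.request

# Endpoint copied from the question; adjust host and router to your setup.
BASE_URL = "http://localhost:8080/otp/routers/default/plan"


def build_plan_url(from_place, to_place, date, time, mode="TRANSIT,WALK"):
    """Build the plan URL for one trip, percent-encoding the parameters."""
    query = urllib.parse.urlencode({
        "fromPlace": from_place,
        "toPlace": to_place,
        "date": date,
        "time": time,
        "mode": mode,
    })
    return f"{BASE_URL}?{query}"


def fetch_plan(url):
    """Download the response and decode it as JSON without touching the disk."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def summarize_itineraries(response):
    """Return (duration, walkDistance, transfers) for each itinerary.

    Key names follow the sample output in the question. A response
    without a "plan" entry yields an empty list.
    """
    itineraries = response.get("plan", {}).get("itineraries", [])
    return [(it["duration"], it["walkDistance"], it["transfers"])
            for it in itineraries]
```

With the server running, `summarize_itineraries(fetch_plan(build_plan_url("48.40915, -71.04996", "48.41428, -71.06996", "2017/12/04", "8:00:00")))` would give one tuple per itinerary, which is easier to feed into an analysis than a saved file.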

【Comments】:

  • Thanks! That was exactly the problem... I feel a bit silly for not realizing it was JSON, haha. Anyway, thanks for taking the time to explain it to me!