[Posted]: 2018-12-14 13:06:32
[Question]:
(Apache Spark 2.3.1 on Databricks)
Hi, I have a JSON dump that looks like this:
[{"standings": {"visitorteam_position": 1, "localteam_position": 1}, "season_id": 892, "pitch": null, "commentaries": null, "id": 10342083, "venue_id": 273277, "formations": {"localteam_formation": null, "visitorteam_formation": null}, "aggregate_id": null, "round_id": null, "visitorteam_id": 18647, "winning_odds_calculated": false, "deleted": false, "coaches": {"localteam_coach_id": 472158, "visitorteam_coach_id": 474616}, "attendance": null, "scores": {"ft_score": null, "visitorteam_score": 0, "et_score": null, "localteam_pen_score": null, "visitorteam_pen_score": null, "localteam_score": 0, "ht_score": null}, "referee_id": 18783, "stage_id": 1728, "weather_report": null, "league_id": 732, "localteam_id": 15251, "time": {"status": "NS", "starting_at": {"date": "2018-07-06", "date_time": "2018-07-06 14:00:00", "timezone": "UTC", "timestamp": 1530885600, "time": "14:00:00"}, "extra_minute": null, "injury_time": null, "second": null, "added_time": null, "minute": null}, "group_id": null}, {"standings": {"visitorteam_position": 1, "localteam_position": 1}, "season_id": 892, "pitch": null, "commentaries": null, "id": 10344350, "venue_id": 8869, "formations": {"localteam_formation": null, "visitorteam_formation": null}, "aggregate_id": null, "round_id": null, "visitorteam_id": 18743, "winning_odds_calculated": false, "deleted": false, "coaches": {"localteam_coach_id": 474720, "visitorteam_coach_id": 474796}, "attendance": null, "scores": {"ft_score": null, "visitorteam_score": 0, "et_score": null, "localteam_pen_score": null, "visitorteam_pen_score": null, "localteam_score": 0, "ht_score": null}, "referee_id": 16781, "stage_id": 1728, "weather_report": null, "league_id": 732, "localteam_id": 18704, "time": {"status": "NS", "starting_at": {"date": "2018-07-06", "date_time": "2018-07-06 18:00:00", "timezone": "UTC", "timestamp": 1530900000, "time": "18:00:00"}, "extra_minute": null, "injury_time": null, "second": null, "added_time": null, "minute": null}, "group_id": null}]
I'm trying to convert it to a DataFrame directly from a variable instead of uploading a JSON file, mainly because I get the JSON data from a GET request to an API.
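Because the data comes from an HTTP response rather than a file, it may be either the raw response body (a string) or an already-decoded list of dicts like ts. Below is a minimal pure-Python sketch, assuming Python 3, that normalizes either form into one compact JSON string per record, which is the shape Spark's JSON reader consumes when it is given an RDD of strings instead of a path. The helper name to_json_lines is hypothetical.

```python
import json


def to_json_lines(payload):
    """Return a list with one compact JSON string per record.

    Hypothetical helper: `payload` may be the raw response body (str or bytes)
    or the already-decoded object (a list of dicts, or a single dict).
    """
    if isinstance(payload, (str, bytes)):
        # Raw body: decode it first so it is not serialized a second time.
        payload = json.loads(payload)
    if isinstance(payload, dict):
        # A single object becomes a one-element list.
        payload = [payload]
    # json.dumps without `indent` writes each record on a single line.
    return [json.dumps(record) for record in payload]
```

Decoding a string payload before re-serializing avoids passing an already-encoded JSON string through json.dumps again, which would wrap it in quotes and escape every inner quote.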
Here is my conversion code:
countries = spark.read.option("multiline", "true").json(json.dumps(ts)).show(False)
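This gives me the error below; please point me in the right direction. I searched, but I only found solutions in Scala, and I'm looking for the equivalent fix in Python.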
IllegalArgumentException: u'java.net.URISyntaxException: Relative path in absolute URI: "[{\"standings\":%20%7B%5C%22visitorteam_position%5C%22:%201,%20%5C%22localteam_position%5C%22:%201%7D,%20%5C%22season_id%5C%22:%20892,%20%5C%22pitch%5C%22:%20null,%20%5C%22commentaries%5C%22:%20null,%20%5C%22id%5C%22:%2010342083,%20%5C%22venue_id%5C%22:%20273277,%20%5C%22formations%5C%22:%20%7B%5C%22localteam_formation%5C%22:%20null,%20%5C%22visitorteam_formation%5C%22:%20null%7D,%20%5C%22aggregate_id%5C%22:%20null,%20%5C%22round_id%5C%22:%20null,%20%5C%22visitorteam_id%5C%22:%2018647,%20%5C%22winning_odds_calculated%5C%22:%20false,%20%5C%22deleted%5C%22:%20false,%20%5C%22coaches%5C%22:%20%7B%5C%22localteam_coach_id%5C%22:%20472158,%20%5C%22visitorteam_coach_id%5C%22:%20474616%7D,%20%5C%22attendance%5C%22:%20null,%20%5C%22scores%5C%22:%20%7B%5C%22ft_score%5C%22:%20null,%20%5C%22visitorteam_score%5C%22:%200,%20%5C%22et_score%5C%22:%20null,%20%5C%22localteam_pen_score%5C%22:%20null,%20%5C%22visitorteam_pen_score%5C%22:%20null,%20%5C%22localteam_score%5C%22:%200,%20%5C%22ht_score%5C%22:%20null%7D,%20%5C%22referee_id%5C%22:%2018783,%20%5C%22stage_id%5C%22:%201728,%20%5C%22weather_report%5C%22:%20null,%20%5C%22league_id%5C%22:%20732,%20%5C%22localteam_id%5C%22:%2015251,%20%5C%22time%5C%22:%20%7B%5C%22status%5C%22:%20%5C%22NS%5C%22,%20%5C%22starting_at%5C%22:%20%7B%5C%22date%5C%22:%20%5C%222018-07-06%5C%22,%20%5C%22date_time%5C%22:%20%5C%222018-07-06%2014:00:00%5C%22,%20%5C%22timezone%5C%22:%20%5C%22UTC%5C%22,%20%5C%22timestamp%5C%22:%201530885600,%20%5C%22time%5C%22:%20%5C%2214:00:00%5C%22%7D,%20%5C%22extra_minute%5C%22:%20null,%20%5C%22injury_time%5C%22:%20null,%20%5C%22second%5C%22:%20null,%20%5C%22added_time%5C%22:%20null,%20%5C%22minute%5C%22:%20null%7D,%20%5C%22group_id%5C%22:%20null%7D,%20%7B%5C%22standings%5C%22:%20%7B%5C%22visitorteam_position%5C%22:%201,%20%5C%22localteam_position%5C%22:%201%7D,%20%5C%22season_id%5C%22:%20892,%20%5C%22pitch%5C%22:%20null,%20%5C%22commentaries%5C%22:%20null,%20%5C%22id%5C%22:%2010344350,%20%5C%22venue_id%5C%22:%208869,%20%5C%22formations%5C%22:%20%7B%5C%22localteam_formation%5C%22:%20null,%20%5C%22visitorteam_formation%5C%22:%20null%7D,%20%5C%22aggregate_id%5C%22:%20null,%20%5C%22round_id%5C%22:%20null,%20%5C%22visitorteam_id%5C%22:%2018743,%20%5C%22winning_odds_calculated%5C%22:%20false,%20%5C%22deleted%5C%22:%20false,%20%5C%22coaches%5C%22:%20%7B%5C%22localteam_coach_id%5C%22:%20474720,%20%5C%22visitorteam_coach_id%5C%22:%20474796%7D,%20%5C%22attendance%5C%22:%20null,%20%5C%22scores%5C%22:%20%7B%5C%22ft_score%5C%22:%20null,%20%5C%22visitorteam_score%5C%22:%200,%20%5C%22et_score%5C%22:%20null,%20%5C%22localteam_pen_score%5C%22:%20null,%20%5C%22visitorteam_pen_score%5C%22:%20null,%20%5C%22localteam_score%5C%22:%200,%20%5C%22ht_score%5C%22:%20null%7D,%20%5C%22referee_id%5C%22:%2016781,%20%5C%22stage_id%5C%22:%201728,%20%5C%22weather_report%5C%22:%20null,%20%5C%22league_id%5C%22:%20732,%20%5C%22localteam_id%5C%22:%2018704,%20%5C%22time%5C%22:%20%7B%5C%22status%5C%22:%20%5C%22NS%5C%22,%20%5C%22starting_at%5C%22:%20%7B%5C%22date%5C%22:%20%5C%222018-07-06%5C%22,%20%5C%22date_time%5C%22:%20%5C%222018-07-06%2018:00:00%5C%22,%20%5C%22timezone%5C%22:%20%5C%22UTC%5C%22,%20%5C%22timestamp%5C%22:%201530900000,%20%5C%22time%5C%22:%20%5C%2218:00:00%5C%22%7D,%20%5C%22extra_minute%5C%22:%20null,%20%5C%22injury_time%5C%22:%20null,%20%5C%22second%5C%22:%20null,%20%5C%22added_time%5C%22:%20null,%20%5C%22minute%5C%22:%20null%7D,%20%5C%22group_id%5C%22:%20null%7D%5D%22'
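The message reads as if the JSON text were being parsed as a file path. Here is a simplified sketch of why that produces "Relative path in absolute URI", assuming that Spark hands a string argument of spark.read.json to Hadoop's Path, that Path treats everything before the first ':' as a URI scheme when that ':' comes before any '/', and that java.net.URI rejects a URI that has a scheme but whose path does not begin with '/'. Both function names are hypothetical, and the model ignores Windows drive letters and authorities.

```python
def split_scheme_like_hadoop_path(path_string):
    """Hypothetical, simplified model of Hadoop Path scheme detection.

    Text before the first ':' is taken as a scheme, but only when that ':'
    appears before any '/'. Returns (scheme or None, remainder).
    """
    colon = path_string.find(":")
    slash = path_string.find("/")
    if colon != -1 and (slash == -1 or colon < slash):
        return path_string[:colon], path_string[colon + 1:]
    return None, path_string


def is_relative_path_in_absolute_uri(path_string):
    """True when the string would get a scheme but a path not starting with '/'.

    Models the java.net.URI check behind "Relative path in absolute URI".
    """
    scheme, rest = split_scheme_like_hadoop_path(path_string)
    return scheme is not None and rest != "" and not rest.startswith("/")
```

Under this model, a dumped list of dicts such as json.dumps(ts) always falls into the failing case: its first ':' follows the first key, so the text up to "standings" becomes the "scheme" and the "path" starts with a space and '{', which matches the %20%7B at the start of the encoded remainder in the message.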
Output:
print(ts)
Out[45]:
[{u'aggregate_id': None,
  u'attendance': None,
  u'coaches': {u'localteam_coach_id': 472158, u'visitorteam_coach_id': 474616},
  u'commentaries': None,
  u'deleted': False,
  u'formations': {u'localteam_formation': None,
                  u'visitorteam_formation': None},
  u'group_id': None,
  u'id': 10342083,
  u'league_id': 732,
  u'localteam_id': 15251,
  u'pitch': None,
  u'referee_id': 18783,
  u'round_id': None,
  u'scores': {u'et_score': None,
              u'ft_score': None,
              u'ht_score': None,
              u'localteam_pen_score': None,
              u'localteam_score': 0,
              u'visitorteam_pen_score': None,
              u'visitorteam_score': 0},
  u'season_id': 892,
  u'stage_id': 1728,
  u'standings': {u'localteam_position': 1, u'visitorteam_position': 1},
  u'time': {u'added_time': None,
            u'extra_minute': None,
            u'injury_time': None,
            u'minute': None,
            u'second': None,
            u'starting_at': {u'date': u'2018-07-06',
                             u'date_time': u'2018-07-06 14:00:00',
                             u'time': u'14:00:00',
                             u'timestamp': 1530885600,
                             u'timezone': u'UTC'},
            u'status': u'NS'},
  u'venue_id': 273277,
  u'visitorteam_id': 18647,
  u'weather_report': None,
  u'winning_odds_calculated': False},
 {u'aggregate_id': None,
  u'attendance': None,
  u'coaches': {u'localteam_coach_id': 474720, u'visitorteam_coach_id': 474796},
  u'commentaries': None,
  u'deleted': False,
  u'formations': {u'localteam_formation': None,
                  u'visitorteam_formation': None},
  u'group_id': None,
  u'id': 10344350,
  u'league_id': 732,
  u'localteam_id': 18704,
  u'pitch': None,
  u'referee_id': 16781,
  u'round_id': None,
  u'scores': {u'et_score': None,
              u'ft_score': None,
              u'ht_score': None,
              u'localteam_pen_score': None,
              u'localteam_score': 0,
              u'visitorteam_pen_score': None,
              u'visitorteam_score': 0},
  u'season_id': 892,
  u'stage_id': 1728,
  u'standings': {u'localteam_position': 1, u'visitorteam_position': 1},
  u'time': {u'added_time': None,
            u'extra_minute': None,
            u'injury_time': None,
            u'minute': None,
            u'second': None,
            u'starting_at': {u'date': u'2018-07-06',
                             u'date_time': u'2018-07-06 18:00:00',
                             u'time': u'18:00:00',
                             u'timestamp': 1530900000,
                             u'timezone': u'UTC'},
            u'status': u'NS'},
  u'venue_id': 8869,
  u'visitorteam_id': 18743,
  u'weather_report': None,
  u'winning_odds_calculated': False}]
print(json.dumps(ts))
Out[44]: '[{"standings": {"visitorteam_position": 1, "localteam_position": 1}, "season_id": 892, "pitch": null, "commentaries": null, "id": 10342083, "venue_id": 273277, "formations": {"localteam_formation": null, "visitorteam_formation": null}, "aggregate_id": null, "round_id": null, "visitorteam_id": 18647, "winning_odds_calculated": false, "deleted": false, "coaches": {"localteam_coach_id": 472158, "visitorteam_coach_id": 474616}, "attendance": null, "scores": {"ft_score": null, "visitorteam_score": 0, "et_score": null, "localteam_pen_score": null, "visitorteam_pen_score": null, "localteam_score": 0, "ht_score": null}, "referee_id": 18783, "stage_id": 1728, "weather_report": null, "league_id": 732, "localteam_id": 15251, "time": {"status": "NS", "starting_at": {"date": "2018-07-06", "date_time": "2018-07-06 14:00:00", "timezone": "UTC", "timestamp": 1530885600, "time": "14:00:00"}, "extra_minute": null, "injury_time": null, "second": null, "added_time": null, "minute": null}, "group_id": null}, {"standings": {"visitorteam_position": 1, "localteam_position": 1}, "season_id": 892, "pitch": null, "commentaries": null, "id": 10344350, "venue_id": 8869, "formations": {"localteam_formation": null, "visitorteam_formation": null}, "aggregate_id": null, "round_id": null, "visitorteam_id": 18743, "winning_odds_calculated": false, "deleted": false, "coaches": {"localteam_coach_id": 474720, "visitorteam_coach_id": 474796}, "attendance": null, "scores": {"ft_score": null, "visitorteam_score": 0, "et_score": null, "localteam_pen_score": null, "visitorteam_pen_score": null, "localteam_score": 0, "ht_score": null}, "referee_id": 16781, "stage_id": 1728, "weather_report": null, "league_id": 732, "localteam_id": 18704, "time": {"status": "NS", "starting_at": {"date": "2018-07-06", "date_time": "2018-07-06 18:00:00", "timezone": "UTC", "timestamp": 1530900000, "time": "18:00:00"}, "extra_minute": null, "injury_time": null, "second": null, "added_time": null, "minute": null}, "group_id": null}]'
Thanks in advance!
P.S. Here is a link showing how to do this in Scala: http://spark.apache.org/docs/2.2.0/sql-programming-guide.html#tab_scala_5
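The linked guide's Python example takes the equivalent route by building the DataFrame from an RDD of JSON strings rather than from a path. A sketch along those lines, assuming Spark 2.x (where DataFrameReader.json accepts either a path or an RDD of strings) and an existing SparkSession named spark, as predefined in Databricks notebooks; dataframe_from_records is a hypothetical helper name.

```python
import json


def dataframe_from_records(spark, records):
    """Build a DataFrame from an in-memory list of dicts without writing a file.

    `spark` is an existing SparkSession. Each record is serialized to its own
    JSON string, so the reader sees one JSON object per element and the
    `multiline` option is not needed.
    """
    json_strings = [json.dumps(record) for record in records]
    rdd = spark.sparkContext.parallelize(json_strings)
    # DataFrameReader.json accepts an RDD of JSON strings as well as a path;
    # a plain str argument is treated as a path, not as JSON content.
    return spark.read.json(rdd)
```

Usage with the data above would be countries = dataframe_from_records(spark, ts) followed by countries.show(truncate=False) on its own line. Keeping show() separate leaves countries bound to the DataFrame, since show() returns None.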
[Comments]:
-
If ts is in the format you posted, then json.dumps(ts) would be a JSON string containing \n, like
[{'aggregate_id': None,\n 'attendance': None,\n 'coaches':... Isn't that the case?
Tags: json apache-spark dataframe pyspark databricks