[Posted]: 2021-08-21 02:44:30
[Question]:
I am trying to read the JSON returned by an API and load it into a PySpark DataFrame, but the whole document ends up in a single field named _corrupt_record:
My JSON looks like this:
{
"title": "texto",
"nickname": "(app)",
"language": "en",
"folder_id": "0",
"category": "",
"question_count": 3,
"page_count": 3,
"response_count": 2636,
"date_created": "2021-01-27T19:22:00",
"date_modified": "2021-06-01T11:43:00",
"id": "00000",
"buttons_text": {
"next_button": "next",
"prev_button": "prev",
"done_button": "done",
"exit_button": ""
},
"is_owner": true,
"footer": false,
"custom_variables": {
},
"href": "https://api",
"analyze_url": "https://api",
"edit_url": "https://api",
"collect_url": "https://api",
"summary_url": "https://api",
"preview": "https://api",
"pages": [
{
"title": "",
"description": "",
"position": 1,
"question_count": 1,
"id": "00000",
"href": "https://api",
"questions": [
{
"id": "602406071",
"position": 1,
"visible": true,
"family": "matrix",
"subtype": "rating",
"layout": null,
"sorting": null,
"required": {
"text": "texto",
"type": "all",
"amount": "0"
},
"validation": null,
"forced_ranking": false,
"headings": [
{
"heading": "texto"
}
],
"href": "https://api",
"answers": {
"rows": [
{
"position": 1,
"visible": true,
"text": "",
"id": "00000"
}
],
"choices": [
{
"position": 1,
"visible": true,
"text": "texto",
"id": "00000",
"is_na": false,
"weight": 0,
"description": ""
},
{
"position": 2,
"visible": true,
"text": "texto",
"id": "00000",
"is_na": false,
"weight": 0,
"description": ""
},
{
"position": 3,
"visible": true,
"text": "texto",
"id": "00000",
"is_na": false,
"weight": 0,
"description": ""
},
{
"position": 4,
"visible": true,
"text": "texto",
"id": "00000",
"is_na": false,
"weight": 0,
"description": ""
},
{
"position": 5,
"visible": true,
"text": "texto",
"id": "00000",
"is_na": false,
"weight": 0,
"description": ""
}
]
},
"display_options": {
"show_display_number": true,
"display_type": "emoji",
"display_subtype": "star",
"left_label_id": null,
"left_label": "",
"right_label_id": null,
"right_label": "",
"middle_label_id": null,
"middle_label": "",
"custom_options": {
"color": "#f5a623",
"option_set": [
]
}
}
}
]
}
]
}
I need the data split into columns, including the nested fields:
This is what I am trying:
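import json
import requests

endpoint = "my_API"
headers = {'Authorization': subscription_key}
request = requests.get(endpoint, headers=headers)
# Parse the response body into a Python dict
aall_df = json.loads(request.content)
# Wrap the dict in a single-element RDD and let Spark infer the schema
rdd = sc.parallelize([aall_df])
df1 = spark.read.json(rdd)
df1.show(truncate=False)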
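A likely cause, offered as a sketch rather than a confirmed diagnosis: `json.loads` returns a Python dict, and when `spark.read.json` is given an RDD whose elements are not strings it appears to convert each element with `str()`. That produces a Python repr (single quotes, `True`, `None`), which is not valid JSON, so the record lands in `_corrupt_record`. The helper names `to_json_text` and `read_json_text` below are hypothetical, and `spark` is assumed to be an existing SparkSession.

```python
import json


def to_json_text(payload):
    """Return compact, single-line JSON text for bytes, str, or parsed data."""
    if isinstance(payload, (bytes, str)):
        # Parsing first also validates that the payload really is JSON.
        payload = json.loads(payload)
    return json.dumps(payload)


def read_json_text(spark, json_text):
    """Sketch: turn one JSON document held in a string into a DataFrame.

    Assumes `spark` is an existing SparkSession (not created here).
    """
    rdd = spark.sparkContext.parallelize([json_text])
    return spark.read.json(rdd)


# Why the parsed dict fails: its str() form is a Python repr, not JSON.
parsed = json.loads('{"is_owner": true, "layout": null, "title": "texto"}')
python_repr = str(parsed)          # single quotes, True, None
valid_json = to_json_text(parsed)  # double quotes, true, null
```

With the question's variables, this would be called as `df1 = read_json_text(spark, to_json_text(request.content))`, so Spark receives JSON text instead of a dict.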
Expected output:
| title | nickname | language | folder_id | category | question_count | page_count | date_created | date_modified | id | next_button | prev_button | ... |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| texto | (app) | en | 0 | nan | 3 | 3 | 2021-01-27... | 2021-01-27... | 0 | next | prev | ... |
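To get columns such as `next_button` and `prev_button`, one option is to expand each struct column with `select("struct_col.*")` once the JSON parses correctly. The following is only a sketch under stated assumptions: `select_exprs` and `flatten_structs` are hypothetical helpers, only one level of nesting is expanded, and the field names inside each struct are assumed not to collide with existing column names.

```python
def select_exprs(fields):
    """Build select() arguments from (column_name, type_name) pairs.

    Struct columns become "name.*" so that their fields turn into
    top-level columns; every other column is kept unchanged.
    """
    return [f"{name}.*" if type_name == "struct" else name
            for name, type_name in fields]


def flatten_structs(df):
    """Sketch: expand one level of struct columns of a PySpark DataFrame."""
    fields = [(f.name, f.dataType.typeName()) for f in df.schema.fields]
    return df.select(*select_exprs(fields))


# Illustration with a few columns from the question's JSON.
example = select_exprs([
    ("title", "string"),
    ("question_count", "long"),
    ("buttons_text", "struct"),
    ("pages", "array"),
])
```

Array columns such as `pages` are left as they are here; turning their elements into rows would need `pyspark.sql.functions.explode` before the struct expansion is applied again.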
Can someone help me?
[Comments]:
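- It would help if you provided the expected output.
- Expected output: title | nickname | language | folder_id | category | question_count | page_count | date_created | date_modified | id | next_button | prev_button | ... texto | (app) | en | 0 | | 3 | 3 | 2021-01-27... | 2021-01-27... | 0000 | next | prev | ...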
Tags: json apache-spark pyspark