[Posted]: 2021-01-11 21:41:44
[Question]:
My sample data looks like this:
sample_json = """{
"P1":[
{"Question":"Fruit",
"Choices":["Yes","No"]}
],
"P2":[
{"Question":"Fruit Name",
"Choices":["Mango","Apple","Banana"]}
],
"P3":[
{"Question":"Fruit color",
"Choices":["Yellow","Red"]}
],
"P4":[
{"Question":"Vegetable",
"Choices":["Yes","No"]}
],
"P5":[
{"Question":"Veggie Name",
"Choices":["Tomato","Potato","Carrots"]}
],
"P6":[
{"Question":"Veggie Color",
"Choices":["Red","Yellow","Brown"]}
],
"P7":[
{"Question":"Enjoy Eating?",
"Choices":["Yes","No"]}
]
}"""
I am trying to generate a dataframe with pandas, as follows:
import json, random
import pandas as pd

sample_data = json.loads(sample_json)
colHeaders = []
for k,v in sample_data.items():
    colHeaders.append(v[0]['Question'])
df = pd.DataFrame(columns= colHeaders)
for i in range (10):
    Answers = []
    for k,v in sample_data.items():
        Answers.append(random.choice(v[0]['Choices']))
    df.loc[len(df)] = Answers
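This creates a df of random answers.
Although the values are random, I want to populate it based on P1 and P4 under the following conditions
(note: P1, P2, P3 ... P7 refer to the keys in sample_json):
- If P1.AnswerChoice = No, fill P2 and P3 with Null
- If P4.AnswerChoice = No, fill P5 and P6 with Null
- If P1.AnswerChoice = No and P4.AnswerChoice = No, fill P7 with Null
- P1.AnswerChoice and P4.AnswerChoice cannot both be Yes
so that it produces a dataframe like the following: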
| Fruit | Fruit Name | Fruit Color | Vegetable | Veggie Name | Veggie Color | Enjoy eating? |
|---|---|---|---|---|---|---|
| No | Null | Null | Yes | Carrots | Yellow | No |
| No | Null | Null | No | Null | Null | Null |
| Yes | Apple | Yellow | No | Null | Null | Yes |
| Yes | Banana | Yellow | No | Null | Null | No |
| No | Null | Null | Yes | Potato | Yellow | No |
| No | Null | Null | Yes | Tomato | Yellow | Yes |
| Yes | Mango | Red | No | Null | Null | Null |
| No | Null | Null | Yes | Carrots | Yellow | No |
| Yes | Apple | Yellow | No | Null | Null | No |
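Edit:
I want to handle this inside the for loop that iterates over the json and prepares the rows for the dataframe, rather than editing the dataframe afterwards.
If possible, for example in this part of the code: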
for i in range (10):
    Answers = []
    for k,v in sample_data.items():
        Answers.append(random.choice(v[0]['Choices']))
    df.loc[len(df)] = Answers
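One way the conditions above might be folded into the row-building loop is sketched here. It is only a sketch under a few assumptions: Null is represented as Python's None, the last condition is read as "P1 and P4 cannot both be Yes", and the names DEPENDENTS and make_row are hypothetical helpers that do not appear in the question. sample_data is inlined to mirror json.loads(sample_json) so the block runs on its own.

```python
import random

# Same structure as json.loads(sample_json) in the question, inlined here.
sample_data = {
    "P1": [{"Question": "Fruit", "Choices": ["Yes", "No"]}],
    "P2": [{"Question": "Fruit Name", "Choices": ["Mango", "Apple", "Banana"]}],
    "P3": [{"Question": "Fruit color", "Choices": ["Yellow", "Red"]}],
    "P4": [{"Question": "Vegetable", "Choices": ["Yes", "No"]}],
    "P5": [{"Question": "Veggie Name", "Choices": ["Tomato", "Potato", "Carrots"]}],
    "P6": [{"Question": "Veggie Color", "Choices": ["Red", "Yellow", "Brown"]}],
    "P7": [{"Question": "Enjoy Eating?", "Choices": ["Yes", "No"]}],
}

# Hypothetical lookup: each gate question maps to the questions
# that become Null (None) when the gate is answered "No".
DEPENDENTS = {"P1": ("P2", "P3"), "P4": ("P5", "P6")}


def make_row(data):
    """Build one row of answers that respects the conditions in the question."""
    # Draw the gate answers first; redraw while both are "Yes"
    # (assumed reading of the last condition).
    while True:
        gates = {key: random.choice(data[key][0]["Choices"]) for key in DEPENDENTS}
        if not all(answer == "Yes" for answer in gates.values()):
            break

    # Collect every question that must be left empty.
    skipped = set()
    for gate, deps in DEPENDENTS.items():
        if gates[gate] == "No":
            skipped.update(deps)
    if all(answer == "No" for answer in gates.values()):
        skipped.add("P7")

    # Walk the json in order, as in the original loop.
    row = []
    for key, value in data.items():
        if key in gates:
            row.append(gates[key])
        elif key in skipped:
            row.append(None)
        else:
            row.append(random.choice(value[0]["Choices"]))
    return row


col_headers = [value[0]["Question"] for value in sample_data.values()]
rows = [make_row(sample_data) for _ in range(10)]
```

The resulting rows can then be passed to pd.DataFrame(rows, columns=col_headers) in a single call.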
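[Comments]:
- Do these Pi refer to columns?
- I am referring to P1, P2, ... P7 from sample_json.
- Do you want to modify the dataframe or the json?
- @It_is_Chris I have now clarified this with an edit. Neither the dataframe nor the json; ideally the logic would iterate over the json to create the rows that are added to the df.
- The way you are creating the dataframe is inefficient... take a look at: stackoverflow.com/a/62734983/8893827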
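As a rough illustration of the efficiency point in the last comment (not necessarily what the linked answer does), rows can be collected in a plain list and the dataframe built once, instead of growing it with df.loc on every iteration. The two-question dictionary below is an illustrative subset chosen for brevity, not the full sample_json.

```python
import random

import pandas as pd

# Illustrative subset of the questions (assumption: not the full sample_json).
questions = {"Fruit": ["Yes", "No"], "Vegetable": ["Yes", "No"]}

# Collect all rows in an ordinary Python list first.
rows = [
    [random.choice(choices) for choices in questions.values()]
    for _ in range(10)
]

# Build the dataframe once from the collected rows.
df = pd.DataFrame(rows, columns=list(questions))
```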
Tags: python json pandas dataframe