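【Posted】: 2019-04-18 15:21:34
【Question】:
I need to iterate over a dataframe that holds a mix of strings, integer values and JSON objects.
With the code below, I want to walk through that dataframe, collect the values I need from the JSON objects, and write them as column values into a new dataframe.
However, the code returns only the first row of the desired dataframe, and the next row contains only the test_id from the first row plus NaNs. What am I doing wrong?
Sorry for the poorly formatted post.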
def create_clean_data(df):
    columns = ['test_id','winner_id', 'original_id', 'block_id', 'w_views','w_clicks', 'w_recirculation', 'w_time', 'o_views', 'o_clicks', 'o_recirculation', 'o_time']
    data = pd.DataFrame(columns = columns)
    for row in df.iterrows():
        parsedData = row[1]
        try:
            winner = json.loads(parsedData.winner)
        except ValueError:
            winner = []
        try:
            params_on_finish = json.loads(parsedData.params_on_finish)
        except ValueError:
            params_on_finish = []
        test_id = parsedData.id
        if 'block_id' not in winner:
            continue
        block_id = winner['block_id']
        winner_id = winner['headline_id']
        test_id = parsedData.id
        original_id = parsedData.variants[2:15]
        w_views = 0
        for param in params_on_finish:
            if param['headline_id'] == winner['headline_id']:
                w_views = param['views']
                w_clicks = param['clicks']
                w_recirculation = param['recirculation']
                w_time = param['time']
            if param['headline_id'] == parsedData.variants[2:15]:
                o_views = param['views']
                o_clicks = param['clicks']
                o_recirculation = param['recirculation']
                o_time = param['time']
        data2 = pd.DataFrame([[test_id, winner_id, original_id, block_id, w_views, w_clicks, w_recirculation, w_time, o_views, o_clicks, o_recirculation, o_time]], columns = columns)
        d22 = data2.append({'test_id': test_id}, ignore_index=True)
    return d22
【Comments】:
- You need to declare and initialize d22 outside the for loop.
- Did you mean for data = pd.DataFrame(columns = columns) to really be d22 = pd.DataFrame(columns = columns)?
- @CilantroDitrek beat me by 12 seconds! :-)
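Following up on the comments: one way to act on that advice is to keep the accumulator outside the loop. The sketch below is not the asker's final code, just a minimal illustration under a few assumptions: the input columns are named id, winner, params_on_finish and variants as in the question, the [2:15] slice is kept as-is, and the helper parse_json and the constants COLUMNS and METRICS are hypothetical names introduced here. It collects one dict per row in a list and builds the result once, which also avoids DataFrame.append (removed in pandas 2.0).

```python
import json

import pandas as pd

COLUMNS = ['test_id', 'winner_id', 'original_id', 'block_id',
           'w_views', 'w_clicks', 'w_recirculation', 'w_time',
           'o_views', 'o_clicks', 'o_recirculation', 'o_time']
METRICS = ('views', 'clicks', 'recirculation', 'time')


def parse_json(text, default):
    """Hypothetical helper: decode JSON, or return `default` on bad input."""
    try:
        return json.loads(text)
    except (TypeError, ValueError):
        return default


def create_clean_data(df):
    rows = []  # accumulator lives outside the loop, so no row is overwritten
    for _, rec in df.iterrows():
        winner = parse_json(rec['winner'], {})
        params = parse_json(rec['params_on_finish'], [])
        # Skip rows whose winner is missing, malformed, or has no block_id.
        if not isinstance(winner, dict) or 'block_id' not in winner:
            continue
        original_id = rec['variants'][2:15]  # slice kept from the question
        row = {'test_id': rec['id'],
               'winner_id': winner['headline_id'],
               'original_id': original_id,
               'block_id': winner['block_id']}
        for param in params:
            if param['headline_id'] == winner['headline_id']:
                row.update({'w_' + m: param[m] for m in METRICS})
            if param['headline_id'] == original_id:
                row.update({'o_' + m: param[m] for m in METRICS})
        rows.append(row)
    # Build the frame once; metrics that never matched show up as NaN.
    return pd.DataFrame(rows, columns=COLUMNS)
```

Note one behavioural difference from the question's code: unmatched metrics become NaN here instead of defaulting w_views to 0 or reusing values left over from a previous row.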
Tags: python json python-3.x pandas