【Question title】: Flattening JSON with multiple nested lists
【Posted】: 2021-10-13 02:23:39
【Question】:

I have JSON with two nested lists, "ManyActionDateTimes" and "Comments", as shown below:

jframe = [{"LoadRef": 0,
            "BookedDate": "2021-10-13T01:15:54.287Z",
            "EndDateTime": "2021-10-13T01:15:54.287Z",
            "Duration": "string",
            "Mp": True,
            "ManyActionDateTimes": [
                {
                    "EventTime": "2021-10-13T01:15:54.287Z",
                    "Type": "string",
                    "Vehicle": "string"
                }
            ],
            "Finished": True,
            "Comments": [
                {
                    "Comment": "string",
                    "DateInserted": "2021-10-13T01:15:54.287Z"
                }
            ],
            "Company": "string",
            "SiteCode": 0,
            "Source": "string"
            }]

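I am trying to flatten it into a DataFrame. I know that for a single nested list I can use the "record_path" argument, but it does not work for multiple lists. When I use this code: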

cols = []
[d] = jframe
drop = ['Comments','ManyActionDateTimes']
for k, v in d.items():
    if k in drop:
        pass
    else:
        cols.append(k)

jdf = pd.json_normalize(jframe,
record_path=['Comments','ManyActionDateTimes'],meta=cols, errors='ignore')

I get the error "KeyError: 'ManyActionDateTimes'":

KeyError: 'ManyActionDateTimes'
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-33-55851049e36a> in <module>
      4     cols.append(k)
      5 
----> 6 jdf = pd.json_normalize(jframe,
      7 record_path=['Comments','ManyActionDateTimes'],meta=cols, errors='ignore')
      8 #meta=[:], errors='ignore')

C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\json\_normalize.py in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
    502                 records.extend(recs)
    503 
--> 504     _recursive_extract(data, record_path, {}, level=0)
    505 
    506     result = DataFrame(records)

C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\json\_normalize.py in _recursive_extract(data, path, seen_meta, level)
    472                         seen_meta[key] = _pull_field(obj, val[-1])
    473 
--> 474                 _recursive_extract(obj[path[0]], path[1:], seen_meta, level=level + 1)
    475         else:
    476             for obj in data:

C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\json\_normalize.py in _recursive_extract(data, path, seen_meta, level)
    475         else:
    476             for obj in data:
--> 477                 recs = _pull_records(obj, path[0])
    478                 recs = [
    479                     nested_to_record(r, sep=sep, max_level=max_level)

C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\json\_normalize.py in _pull_records(js, spec)
    397         if has non iterable value.
    398         """
--> 399         result = _pull_field(js, spec)
    400 
    401         # GH 31507 GH 30145, GH 26284 if result is not list, raise TypeError if not

C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\json\_normalize.py in _pull_field(js, spec)
    388                 result = result[field]
    389         else:
--> 390             result = result[spec]
    391         return result
    392 

KeyError: 'ManyActionDateTimes'

None of my other attempts have worked, and I am out of ideas.

Many thanks.


Tags: python json pandas


【Solution 1】:

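You can use this instead:

import pandas as pd

df = pd.json_normalize(jframe)
# Explode each list column, expand its dicts into columns, and join back on the index
df = df.join(df['ManyActionDateTimes'].explode().apply(pd.Series)).join(df['Comments'].explode().apply(pd.Series))
df.drop(columns=['ManyActionDateTimes', 'Comments'], inplace=True)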

Output:

   LoadRef                BookedDate               EndDateTime Duration    Mp  Finished Company  SiteCode  Source                 EventTime      Type  Vehicle Comment              DateInserted
0        0  2021-10-13T01:15:54.287Z  2021-10-13T01:15:54.287Z   string  True      True  string         0  string  2021-10-13T01:15:54.287Z  string  string  string  2021-10-13T01:15:54.287Z
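
The KeyError in the question is consistent with how "record_path" treats a list: as one path of nested keys, not as several independent paths. With ['Comments', 'ManyActionDateTimes'], pandas looks for a 'ManyActionDateTimes' key inside each element of 'Comments', which does not exist. A minimal sketch of an alternative that stays within json_normalize is to normalize each nested list separately and merge the results. It assumes "LoadRef" uniquely identifies a record (the question does not say so), and it uses a trimmed copy of the sample data:

```python
import pandas as pd

# Trimmed copy of the question's sample record
jframe = [{
    "LoadRef": 0,
    "BookedDate": "2021-10-13T01:15:54.287Z",
    "Mp": True,
    "ManyActionDateTimes": [
        {"EventTime": "2021-10-13T01:15:54.287Z", "Type": "string", "Vehicle": "string"}
    ],
    "Comments": [
        {"Comment": "string", "DateInserted": "2021-10-13T01:15:54.287Z"}
    ],
    "Company": "string",
}]

nested = ["ManyActionDateTimes", "Comments"]
# Every top-level key that is not one of the nested lists becomes a meta column
meta = [k for k in jframe[0] if k not in nested]

# One json_normalize call per nested list
actions = pd.json_normalize(jframe, record_path="ManyActionDateTimes", meta=meta)
comments = pd.json_normalize(jframe, record_path="Comments", meta=["LoadRef"])

# Assumption: LoadRef is a unique record id, so it can serve as the merge key
flat = actions.merge(comments, on="LoadRef", how="outer")
print(flat)
```

If a record holds several actions and several comments, the merge yields every action/comment combination for that record, so keeping the two frames separate may be preferable in that case.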


【Solution 2】:

flatten_json works for nested JSON structures:

from flatten_json import flatten
import pandas as pd

# Flatten each record; nested keys are joined with '.'
dic_flattened = (flatten(d, '.') for d in jframe)
df = pd.DataFrame(dic_flattened)

Output:

   LoadRef                BookedDate               EndDateTime Duration    Mp ManyActionDateTimes.0.EventTime  ... Finished Comments.0.Comment   Comments.0.DateInserted Company SiteCode  Source
0        0  2021-10-13T01:15:54.287Z  2021-10-13T01:15:54.287Z   string  True        2021-10-13T01:15:54.287Z  ...     True             string  2021-10-13T01:15:54.287Z  string        0  string

[1 rows x 14 columns]
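
To make the column names in that output easier to read, here is a rough pure-Python sketch of the same idea: nested keys and list positions are joined with a separator. The helper name flatten_record is hypothetical and this is a hand-written stand-in, not the flatten_json library's implementation; it also drops empty lists and dicts, which the library may handle differently.

```python
def flatten_record(obj, sep=".", prefix=""):
    """Flatten nested dicts and lists into one dict with sep-joined keys."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        # List positions become part of the key, e.g. "Comments.0.Comment"
        items = enumerate(obj)
    else:
        # Scalar leaf: store it under the key path built so far
        return {prefix: obj}

    flat = {}
    for key, value in items:
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten_record(value, sep, name))
    return flat


record = {
    "LoadRef": 0,
    "Comments": [{"Comment": "a"}, {"Comment": "b"}],
    "Mp": True,
}
flat = flatten_record(record)
print(flat)
```

Each list element gets its own indexed columns, so a record with many comments produces a wide row rather than several rows.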
    
