【Question Title】: Nested json file to different DFs in Python
【Posted】: 2021-03-20 21:58:28
【Question Description】:

I have the following json file:

{'transactionDetail': {'transactionID': 'rrt-0a75e3331e9d4a100-b-se-17175-7612138-13_1571', 
'transactionTimestamp': '2020-11-22T07:22:14.346Z', 'inLanguage': 'en-US', 'productID': 'aasmcu', 
'productVersion': '1'}, 'inquiryDetail': {'productVersion': 'v1', 'productID': 'aasmcu', 'duns': 
'6979900'}, 'organization': {'duns': '006979900', 'dunsControlStatus': {'operatingStatus': 
{'description': 'Active', 'dnbCode': 9074}}, 'primaryName': 'American Express Company', 
'isStandalone': False, 'primaryAddress': {'language': {}, 'addressCountry': {'name': 'United States', 
'isoAlpha2Code': 'US'}, 'continentalRegion': {'name': 'North America'}, 'addressLocality': {'name': 
'New York'}, 'minorTownName': None, 'addressRegion': {'name': 'New York', 'abbreviatedName': 'NY'}, 
'addressCounty': {'name': 'New York'}, 'postalCode': '10285-0002', 'postalCodePosition': {}, 
'streetNumber': None, 'streetName': None, 'streetAddress': {'line1': '200 Vesey St FL 50', 'line2': 
None}, 'postOfficeBox': {}}, 'corporateLinkage': {'familytreeRolesPlayed': [{'description':
'Global Ultimate', 'dnbCode': 12775}, {'description': 'Domestic Ultimate', 'dnbCode': 12774}, {'description': 
'Parent/Headquarters', 'dnbCode': 9141}], 'hierarchyLevel': 1, 
'globalUltimateFamilyTreeMembersCount': 1686}, 'dnbAssessment': {'materialChange': {'riskSegment': 
{'description': 'No Change of High Probability Risk Profile', 'dnbCode': 30686}, 
'organizationSizeSegment': {'description': 'Business Profile Decay', 'dnbCode': 30671}, 
'borrowingSegment': {'description': 'Business Profile Stable', 'dnbCode': 30670}, 'spendSegment': 
{'description': 'Business Profile Stable', 'dnbCode': 30670}, 'opportunityFinalSegment': 
{'description': 'Stable Business', 'dnbCode': 30681}}, 'triplePlay': {'compositeRiskScore': 5, 
'riskSegment': {'description': 'Promote Acqusition Targets', 'dnbCode': 30668}}}}}
 

{'transactionDetail': {'transactionID': 'rrt-04b146343b2275455-a-se-17594-7595335-2_1570', 
'transactionTimestamp': '2020-11-22T07:22:15.115Z', 'inLanguage': 'en-US', 'productID': 'aasmcu', 
'productVersion': '1'}, 'inquiryDetail': {'productVersion': 'v1', 'productID': 'aasmcu', 'duns': 
'5070479'}, 'organization': {'duns': '005070479', 'dunsControlStatus': {'operatingStatus': 
{'description': 'Active', 'dnbCode': 9074}}, 'primaryName': 'Caterpillar Inc.', 'isStandalone': 
False, 'primaryAddress': {'language': {}, 'addressCountry': {'name': 'United States', 
'isoAlpha2Code': 'US'}, 'continentalRegion': {'name': 'North America'}, 'addressLocality': {'name': 
'Deerfield'}, 'minorTownName': None, 'addressRegion': {'name': 'Illinois', 'abbreviatedName': 'IL'}, 
'addressCounty': {'name': 'Lake'}, 'postalCode': '60015-5031', 'postalCodePosition': {}, 
'streetNumber': None, 'streetName': None, 'streetAddress': {'line1': '510 Lake Cook Rd Ste 100', 
'line2': None}, 'postOfficeBox': {}}, 'corporateLinkage': {'familytreeRolesPlayed': [{'description': 
'Global Ultimate', 'dnbCode': 12775}, {'description': 'Domestic Ultimate', 'dnbCode': 12774}, 
{'description': 'Parent/Headquarters', 'dnbCode': 9141}], 'hierarchyLevel': 1, 
'globalUltimateFamilyTreeMembersCount': 1095}, 'dnbAssessment': {'materialChange': {'riskSegment': 
{'description': 'High Probability of Improvement in Risk Profile', 'dnbCode': 30682}, 
'organizationSizeSegment': {'description': 'Business Profile Decay', 'dnbCode': 30671}, 
'borrowingSegment': {'description': 'Business Profile Decay', 'dnbCode': 30671}, 'spendSegment': 
{'description': 'Business Profile Decay', 'dnbCode': 30671}, 'opportunityFinalSegment': 
{'description': 'Decrease In Scale', 'dnbCode': 30680}}, 'triplePlay': {'compositeRiskScore': 6, 
'riskSegment': {'description': 'Promote Acqusition Targets', 'dnbCode': 30668}}}}}
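Each record nests several levels deep, and `organization.corporateLinkage.familytreeRolesPlayed` is a list. `json_normalize` flattens nested dicts but keeps a list like this as a single list-valued cell. Below is a minimal sketch, assuming the records are already loaded into a list `d` (trimmed here to the fields used), that moves that list into its own DataFrame with one row per role per company, keyed by the `transactionID` renamed to `record_id`. A plain comprehension is used instead of `json_normalize`'s `record_path`/`meta` options to keep the id lookup explicit.

```python
import pandas as pd

# Two records trimmed from the sample above (only the fields used here)
d = [
    {'transactionDetail': {'transactionID': 'rrt-0a75e3331e9d4a100-b-se-17175-7612138-13_1571'},
     'organization': {'primaryName': 'American Express Company',
                      'corporateLinkage': {'familytreeRolesPlayed': [
                          {'description': 'Global Ultimate', 'dnbCode': 12775},
                          {'description': 'Domestic Ultimate', 'dnbCode': 12774},
                          {'description': 'Parent/Headquarters', 'dnbCode': 9141}]}}},
    {'transactionDetail': {'transactionID': 'rrt-04b146343b2275455-a-se-17594-7595335-2_1570'},
     'organization': {'primaryName': 'Caterpillar Inc.',
                      'corporateLinkage': {'familytreeRolesPlayed': [
                          {'description': 'Global Ultimate', 'dnbCode': 12775},
                          {'description': 'Domestic Ultimate', 'dnbCode': 12774},
                          {'description': 'Parent/Headquarters', 'dnbCode': 9141}]}}},
]

# One row per (company, role); the company's id is copied onto each of its roles
rows = [
    {'record_id': rec['transactionDetail']['transactionID'], **role}
    for rec in d
    for role in rec['organization']['corporateLinkage']['familytreeRolesPlayed']
]
roles = pd.DataFrame(rows)
print(roles)
```

Because every row carries `record_id`, this table can later be merged with any other per-company DataFrame built from the same file.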

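What I need to do is normalize the json file. The example above shows 2 companies, but the real file has 1,000. If I only had a single company like this, I could flatten the json file with: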

import json
import pandas as pd
from pandas import json_normalize

with open('Material_Change_20201122.json') as f:
    d = json.load(f)

first = d[0]
transaction_detail = json_normalize(first['transactionDetail'])
transaction_detail.rename(columns={'transactionID': 'record_id'}, inplace=True)
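Assuming the file is a JSON array of records (the `d[0]` indexing suggests `json.load` returns a list), `pd.json_normalize` can also take the whole list at once and return one row per company, with nested keys joined by dots. The sketch below uses two records trimmed from the sample; selecting one section by its column prefix is just one way to split the flattened result.

```python
import pandas as pd

# Two records trimmed from the sample above; assumes json.load returned a list like this
d = [
    {'transactionDetail': {'transactionID': 'rrt-0a75e3331e9d4a100-b-se-17175-7612138-13_1571',
                           'transactionTimestamp': '2020-11-22T07:22:14.346Z'},
     'organization': {'duns': '006979900',
                      'primaryName': 'American Express Company',
                      'primaryAddress': {'addressLocality': {'name': 'New York'}}}},
    {'transactionDetail': {'transactionID': 'rrt-04b146343b2275455-a-se-17594-7595335-2_1570',
                           'transactionTimestamp': '2020-11-22T07:22:15.115Z'},
     'organization': {'duns': '005070479',
                      'primaryName': 'Caterpillar Inc.',
                      'primaryAddress': {'addressLocality': {'name': 'Deerfield'}}}},
]

# One call flattens every record: one row per company, nested keys joined with '.'
flat = pd.json_normalize(d)

# Keep only the columns of one section, then strip the section prefix
prefix = 'transactionDetail.'
transaction_detail = flat.loc[:, flat.columns.str.startswith(prefix)]
transaction_detail = transaction_detail.rename(columns=lambda c: c[len(prefix):])
transaction_detail = transaction_detail.rename(columns={'transactionID': 'record_id'})
print(transaction_detail)
```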

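The problem appears when there is more than one company: I need a for loop that iterates over the json and appends each company as a new row of a DataFrame. My logic is as follows: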

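small_d = d[0:5]

transaction_detail_1 = pd.DataFrame()

for i in small_d:
    temp_df = json_normalize(i['transactionDetail'])
    temp_df.rename(columns={'transactionID': 'record_id'}, inplace=True)

    # transaction_detail_1 starts out empty and has no 'record_id' column,
    # so this lookup is what raises KeyError: 'record_id'
    transaction_detail_1['record_id'].append(temp_df['record_id'])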

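But when I run it, I get the error `KeyError: 'record_id'`. I need to automate this because I have to apply the same kind of logic to several json files, some of which have 100 columns once flattened.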
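Since the title asks for a separate DataFrame per nested section and the same logic has to run over several files, one option is a small generic helper. `split_sections` below is a hypothetical name, not a pandas function; it is a minimal sketch assuming every record is a dict whose top-level values are themselves dicts, that all records share the top-level keys of the first one, and that the id lives at `transactionDetail.transactionID`. Lists inside a section (such as `familytreeRolesPlayed`) stay as list-valued cells.

```python
import pandas as pd


def split_sections(records, id_path=('transactionDetail', 'transactionID')):
    """Hypothetical helper: build one DataFrame per top-level key of the records.

    Each frame gets a leading 'record_id' column read from id_path, so the
    frames can be merged back together later. Assumes every record shares
    the top-level keys of the first record.
    """
    # Look up each record's id by walking id_path
    ids = []
    for rec in records:
        value = rec
        for key in id_path:
            value = value[key]
        ids.append(value)

    frames = {}
    for section in records[0]:
        # Flatten this section for all records in one call (one row per record)
        df = pd.json_normalize([rec[section] for rec in records])
        df.insert(0, 'record_id', ids)
        frames[section] = df
    return frames


# Two records trimmed from the sample in the question
d = [
    {'transactionDetail': {'transactionID': 'rrt-0a75e3331e9d4a100-b-se-17175-7612138-13_1571',
                           'productID': 'aasmcu'},
     'inquiryDetail': {'productVersion': 'v1', 'duns': '6979900'},
     'organization': {'primaryName': 'American Express Company',
                      'primaryAddress': {'addressRegion': {'abbreviatedName': 'NY'}}}},
    {'transactionDetail': {'transactionID': 'rrt-04b146343b2275455-a-se-17594-7595335-2_1570',
                           'productID': 'aasmcu'},
     'inquiryDetail': {'productVersion': 'v1', 'duns': '5070479'},
     'organization': {'primaryName': 'Caterpillar Inc.',
                      'primaryAddress': {'addressRegion': {'abbreviatedName': 'IL'}}}},
]

frames = split_sections(d)
print(frames['organization'])
```

Returning a dict keyed by section name means a file with different top-level keys needs no code changes; only `id_path` would have to be adjusted if the id sits elsewhere.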

Thanks!

【Question Discussion】:

    Tags: python json python-3.x loops


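    【Answer 1】:

    You have already created the DataFrames with json_normalize, so collect them in a list, concatenate them, and then rename the column.

    See if this works. Here d is simply the two responses above put into a list.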

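    import pandas as pd

    # Normalize each record's 'transactionDetail' into a one-row DataFrame
    df_hold_list = []
    for i in d:
        df_hold_list.append(pd.json_normalize(i['transactionDetail']))
    # Stack the rows and rebuild a clean 0..n-1 index
    transaction_detail_1 = pd.concat(df_hold_list, axis=0).reset_index(drop=True)
    transaction_detail_1.rename(columns={'transactionID': 'record_id'}, inplace=True)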
    

    Output:

                                              record_id      transactionTimestamp inLanguage productID productVersion
    0  rrt-0a75e3331e9d4a100-b-se-17175-7612138-13_1571  2020-11-22T07:22:14.346Z      en-US    aasmcu              1
    1   rrt-04b146343b2275455-a-se-17594-7595335-2_1570  2020-11-22T07:22:15.115Z      en-US    aasmcu              1
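A variant of the same idea, sketched here with `d` trimmed to the `transactionDetail` part of the two responses: gather the sub-dicts into one list and call `pd.json_normalize` once, instead of building a DataFrame per record and concatenating them.

```python
import pandas as pd

# d as in the answer: the two responses in a list (only 'transactionDetail' kept)
d = [
    {'transactionDetail': {'transactionID': 'rrt-0a75e3331e9d4a100-b-se-17175-7612138-13_1571',
                           'transactionTimestamp': '2020-11-22T07:22:14.346Z',
                           'inLanguage': 'en-US', 'productID': 'aasmcu', 'productVersion': '1'}},
    {'transactionDetail': {'transactionID': 'rrt-04b146343b2275455-a-se-17594-7595335-2_1570',
                           'transactionTimestamp': '2020-11-22T07:22:15.115Z',
                           'inLanguage': 'en-US', 'productID': 'aasmcu', 'productVersion': '1'}},
]

# Collect the sub-dicts first, then normalize once: no per-record frames, no concat
transaction_detail_1 = (
    pd.json_normalize([rec['transactionDetail'] for rec in d])
    .rename(columns={'transactionID': 'record_id'})
)
print(transaction_detail_1)
```

Both approaches yield the same rows for this data; the single call simply avoids creating many one-row DataFrames when the file holds 1,000 records.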
    
