【问题标题】:Pandas - json normalize inside dataframePandas - 数据框内的 json 规范化
【发布时间】:2020-05-22 21:56:46
【问题描述】:

我想将数据框中的一列分解为多列。

我有一个具有以下配置的数据框:


    GroupId,SubGroups,Type,Name
    -4781505553015217258,"{'GroupId': -732592932641342965, 'SubGroups': [], 'Type': 'DefaultSite', 'Name': 'Default Site'}",OrganisationGroup,CompanyXYZ
    -4781505553015217258,"{'GroupId': 8123255835936628631, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'MERCEDES BENZ'}",OrganisationGroup,CompanyXYZ
    -4781505553015217258,"{'GroupId': -1785570219922840611, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'VOLVO'}",OrganisationGroup,CompanyXYZ
    -4781505553015217258,"{'GroupId': -3670461095557699088, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'SCANIA'}",OrganisationGroup,CompanyXYZ
    -4781505553015217258,"{'GroupId': 8683757391859854416, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'DRIVERS'}",OrganisationGroup,CompanyXYZ
    -4781505553015217258,"{'GroupId': -8066654520755643389, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'X - DECOMMISSION'}",OrganisationGroup,CompanyXYZ
    -4781505553015217258,"{'GroupId': 4177323092254043025, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'X-INSTALLATION'}",OrganisationGroup,CompanyXYZ
    -4781505553015217258,"{'GroupId': -6088426161802844604, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'FORD'}",OrganisationGroup,CompanyXYZ
    -4781505553015217258,"{'GroupId': 8512440039365422841, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'HEAVY VEHICLES'}",OrganisationGroup,CompanyXYZ

我想创建一个新的数据框,其中SubGroups 列被分解为其组件。请注意,SubGroups 列内的名称以 SubGroups_ 为前缀

GroupId, SubGroup_GroupId, SubGroup_SubGroups, SubGroup_Type, SubGroup_Name, Type, Name
-4781505553015217258, -732592932641342965, [], 'DefaultSite', 'Default Site', OrganisationGroup, CompanyXYZ
-4781505553015217258, 8123255835936628631, [], 'SiteGroup', 'MERCEDES BENZ', OrganisationGroup, CompanyXYZ

我已经尝试了以下代码:

for row in AllSubGroupsDF.itertuples():
    newDF= newDF.append((pd.io.json.json_normalize(row.SubGroups)))

但它会返回

GroupId,SubGroups,Type,Name
-732592932641342965,[],DefaultSite,Default Site
8123255835936628631,[],SiteGroup,MERCEDES BENZ
-1785570219922840611,[],SiteGroup,VOLVO
-3670461095557699088,[],SiteGroup,SCANIA
8683757391859854416,[],SiteGroup,DRIVERS
-8066654520755643389,[],SiteGroup,X - DECOMMISSION
4177323092254043025,[],SiteGroup,X-INSTALLATION
-6088426161802844604,[],SiteGroup,FORD
8512440039365422841,[],SiteGroup,HEAVY VEHICLES

我想把它全部放在一个数据框中,但我不知道怎么做。请帮忙?

【问题讨论】:

    标签: json pandas dataframe


    【解决方案1】:

    你可以尝试使用ast包:-

    import pandas as pd
    import ast
    data = [[-4781505553015217258,"{'GroupId': -732592932641342965, 'SubGroups': [], 'Type': 'DefaultSite', 'Name': 'Default Site'}","OrganisationGroup","CompanyXYZ"],
        [-4781505553015217258,"{'GroupId': 8123255835936628631, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'MERCEDES BENZ'}","OrganisationGroup","CompanyXYZ"],
        [-4781505553015217258,"{'GroupId': -1785570219922840611, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'VOLVO'}","OrganisationGroup","CompanyXYZ"],
        [-4781505553015217258,"{'GroupId': -3670461095557699088, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'SCANIA'}","OrganisationGroup","CompanyXYZ"],
        [-4781505553015217258,"{'GroupId': 8683757391859854416, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'DRIVERS'}","OrganisationGroup","CompanyXYZ"],
        [-4781505553015217258,"{'GroupId': -8066654520755643389, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'X - DECOMMISSION'}","OrganisationGroup","CompanyXYZ"],
        [-4781505553015217258,"{'GroupId': 4177323092254043025, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'X-INSTALLATION'}","OrganisationGroup","CompanyXYZ"],
        [-4781505553015217258,"{'GroupId': -6088426161802844604, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'FORD'}","OrganisationGroup","CompanyXYZ"],
        [-4781505553015217258,"{'GroupId': 8512440039365422841, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'HEAVY VEHICLES'}","OrganisationGroup","CompanyXYZ"]]
    df = pd.DataFrame(data,columns=["GroupId","SubGroups","Type","Name"])
    df["SubGroup_GroupId"] = df["SubGroups"].map(lambda x: ast.literal_eval(x)["GroupId"])
    df["SubGroup_SubGroups"] = df["SubGroups"].map(lambda x: ast.literal_eval(x)["SubGroups"])
    df["SubGroup_Type"] = df["SubGroups"].map(lambda x: ast.literal_eval(x)["Type"])
    df["SubGroup_Name"] = df["SubGroups"].map(lambda x: ast.literal_eval(x)["Name"])
    df
    

    希望这会有所帮助!

    【讨论】:

    • 嗨图欣。谢谢你的答案。就我而言,我必须将: ast.literal_eval(x) 转换为 ast.literal_eval(str(x)) 并且它工作正常。再次感谢
    • 当然图欣。我做到了
    猜你喜欢
    • 2020-12-15
    • 2019-12-22
    • 2020-07-22
    • 2012-09-13
    • 2020-06-21
    • 2020-12-11
    • 2020-08-01
    • 1970-01-01
    • 2017-01-30
    相关资源
    最近更新 更多