【Question title】: Convert Dataframe to Json with nested arrays
【Posted】: 2019-10-21 06:21:51
【Question】:

I have a dataframe df that looks like this:

   ID               Aisle            Residence        HomePhone        CellPhone
   ------------------------------------------------------------------------------
0  1245,3214        A1, A2, A3, A4   Home             NaN              888888888
1  5674             B2,B3            Cell             777777777        999999999
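
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: the dtypes are an assumption (comma-separated strings for ID and Aisle, a float column with NaN for HomePhone, integers for CellPhone), inferred from the printed table rather than from the real data source.

```python
import numpy as np
import pandas as pd

# Assumed dtypes: ID/Aisle/Residence are strings, HomePhone is float
# because it contains NaN, CellPhone is integer.
df = pd.DataFrame({
    "ID": ["1245,3214", "5674"],
    "Aisle": ["A1, A2, A3, A4", "B2,B3"],
    "Residence": ["Home", "Cell"],
    "HomePhone": [np.nan, 777777777],
    "CellPhone": [888888888, 999999999],
})

print(df)
print(df.dtypes)
```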

Expected result:

{
 "0": [
    {
      "column": "ID",
      "values": [
        "1245",
        "3214"
      ]
    },
    {
      "column": "Aisle",
      "values": [
        "A1",
        "A2",
        "A3",
        "A4"
      ]
    },
    {
      "column": "Residence",
      "values": [
        "Home"
      ]
    },
    {
      "column": "HomePhone",
      "values": []
    },
    {
      "column": "CellPhone",
      "values": [
        "888888888"
      ]
    }
   ],
 "1": [
    {
      "column": "ID",
      "values": [
        "5674"
      ]
    },
    {
      "column": "Aisle",
      "values": [
        "B2",
        "B3"
      ]
    },
    {
      "column": "Residence",
      "values": [
        "Cell"
      ]
    },
    {
      "column": "HomePhone",
      "values": [
        "777777777"
      ]
    },
    {
      "column": "CellPhone",
      "values": [
        "999999999"
      ]
    }
   ]
}

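I have 2 rows, 0 and 1, and the JSON should hold the information for each row under its index. Basically, I want to add attributes and assign the column name and its values to them, e.g. "column": "Aisle", "values": ["A1", "A2", ...].

I also have a constraint: the column names always change (ID, Aisle, Residence, etc.) and the number of columns varies, so I cannot hard-code the columns when converting from the DataFrame to JSON.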

【Comments】:

Tags: python pandas pyspark


【Solution 1】:

I think this is the closest to what you are looking for. Use DataFrame.to_json:

df2 = df.copy()
# Series.replace only matches whole cell values, so use the .str accessor
# to strip spaces inside each string before splitting on commas.
df2[['ID', 'Aisle']] = df2[['ID', 'Aisle']].apply(
    lambda x: x.str.replace(' ', '', regex=False).str.split(',')
)
print(df2)

             ID             Aisle Residence    HomePhone  CellPhone
0  [1245, 3214]  [A1, A2, A3, A4]      Home          NaN  888888888
1        [5674]          [B2, B3]      Cell  777777777.0  999999999

df2.T.to_json()

Output:

'{"0":{"ID":["1245","3214"],"Aisle":["A1","A2","A3","A4"],"Residence":"Home","HomePhone":null,"CellPhone":888888888},"1":{"ID":["5674"],"Aisle":["B2","B3"],"Residence":"Cell","HomePhone":777777777.0,"CellPhone":999999999}}'

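If the column names are not known in advance, you can also try this:

def split_func(x):
    # Split string columns on commas; numeric columns have no .str
    # accessor, raise AttributeError, and are returned unchanged.
    try:
        return x.str.replace(' ', '', regex=False).str.split(',')
    except AttributeError:
        return x

df2 = df.apply(split_func)

But keep in mind that every cell of type str is converted to a list, including single-value columns such as Residence: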

df2.T.to_json()

'{"0":{"ID":["1245","3214"],"Aisle":["A1","A2","A3","A4"],"Residence":["Home"],"HomePhone":null,"CellPhone":888888888},"1":{"ID":["5674"],"Aisle":["B2","B3"],"Residence":["Cell"],"HomePhone":777777777.0,"CellPhone":999999999}}'
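
Neither output above has the exact "column" / "values" shape from the question. A minimal sketch that gets closer, without hard-coding any column name, is shown below. The helpers cell_to_values and frame_to_nested are hypothetical names introduced here, not pandas API. The sketch assumes each cell is a scalar (a comma-separated string, a number, or NaN), that missing values should become an empty list, and that whole-number floats such as 777777777.0 should be written without the trailing ".0".

```python
import json

import pandas as pd


def cell_to_values(cell):
    """Turn one scalar cell into a list of strings; missing values become []."""
    if pd.isna(cell):
        return []
    if isinstance(cell, float) and cell.is_integer():
        cell = int(cell)  # assumption: 777777777.0 should read "777777777"
    # Split on commas and drop surrounding spaces and empty pieces.
    return [part.strip() for part in str(cell).split(",") if part.strip()]


def frame_to_nested(df):
    """Map each row label to a list of {"column", "values"} records."""
    return {
        str(idx): [
            {"column": col, "values": cell_to_values(row[col])}
            for col in df.columns  # column names are read from the frame
        ]
        for idx, row in df.iterrows()
    }


# Sample frame mirroring the question (dtypes are an assumption).
df = pd.DataFrame({
    "ID": ["1245,3214", "5674"],
    "Aisle": ["A1, A2, A3, A4", "B2,B3"],
    "Residence": ["Home", "Cell"],
    "HomePhone": [float("nan"), 777777777],
    "CellPhone": [888888888, 999999999],
})

result = frame_to_nested(df)
print(json.dumps(result, indent=2))
```

Because the columns are taken from df.columns at call time, the same function should work when the names or the number of columns change. Note that it loops row by row with iterrows, which is fine for small frames but may be slow on very large ones.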

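【Discussion】:

  • Thanks for your reply. What if the column names change? Is there a way to do this without tying myself to the exact column names?
  • Glad to help :). I added some extra code for that case.
  • Please check my answer.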