【Question title】: Pandas Dataframe split multiple key values to different columns
【Posted】: 2020-09-13 01:47:04
【Question】:

I have a DataFrame column in the following format:

col1    col2
 A     [{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]
 B     [{'Id':47,'prices':['30','78']},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}]

How can I convert it into the following?

col1    Id            price
  A     42         ['30','78']
  A     44         ['20','47','89']
  B     47         ['30','78']
  B     94         ['20']
  B     84         ['20','98']

I was thinking of using apply with a lambda, but I'm not sure how.

Edit: to recreate this DataFrame, I use the following code:

import pandas as pd

# col2 holds the list of dicts as a string, not as a Python list
data = [['A', "[{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]"],
        ['B', "[{'Id':47,'prices':['30','78']},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}]"]]

df = pd.DataFrame(data, columns=['col1', 'col2'])

【Comments】:

  • Please post your data as the output of to_dict

Tags: python python-3.x pandas dataframe


【Solution 1】:

If col2 contains lists:

print (type(df['col2'].iat[0]))
<class 'list'>

L = [{**{'col1': a}, **x} for a, b in df[['col1','col2']].to_numpy() for x in b]

df = pd.DataFrame(L)
print (df)
  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]

If col2 contains strings, parse each value with ast.literal_eval first:

print (type(df['col2'].iat[0]))
<class 'str'>

import ast

L = [{**{'col1': a}, **x} for a, b in df[['col1','col2']].to_numpy() for x in ast.literal_eval(b)]
df = pd.DataFrame(L)
print (df)
  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]

For better understanding, the same logic written as an explicit loop:

import ast

L = []
for a, b in df[['col1','col2']].to_numpy():
    for x in ast.literal_eval(b):
        d = {'col1': a}
        out = {**d, **x}
        L.append(out)

df = pd.DataFrame(L)
print (df)
  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]
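
The `{**d, **x}` expression in the loop above is plain dictionary unpacking (available since Python 3.5). A minimal sketch of what it does for a single row, using values taken from the question's data:

```python
# Merge two dicts with ** unpacking; the values come from the question's first row.
d = {'col1': 'A'}
x = {'Id': 42, 'prices': ['30', '78']}

out = {**d, **x}
print(out)
# {'col1': 'A', 'Id': 42, 'prices': ['30', '78']}

# If both dicts share a key, the right-hand one wins.
dup = {**{'Id': 1}, **{'Id': 2}}
print(dup)
# {'Id': 2}
```

Each merged dict becomes one row of the final DataFrame, with the dict keys as column names.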

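【Comments】:

  • What does ** mean?
  • @colla - It merges two dictionaries: z = {**x, **y}
  • Your answer tells me I have a syntax error in the following: [{'Id':47,'prices':['30','78'']},{'Id':94 ,'prices':['20']},{'Id':84,'prices':['20','98']}] Any ideas?
  • @colla - There is a problem with the data; some of it seems to be malformed. Could you share the input data, ideally as JSON?
  • I edited my question so that you can easily recreate my data.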
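
Regarding the syntax error raised in the comments above: one way to find which rows hold malformed strings is to wrap ast.literal_eval in a try/except and collect the failures. This is only a sketch; the helper name `try_parse` and the broken sample row (with a stray quote after '78') are made up for illustration.

```python
import ast
import pandas as pd

data = [
    ['A', "[{'Id':42,'prices':['30','78']}]"],
    ['B', "[{'Id':47,'prices':['30','78'']}]"],  # stray quote: hypothetical bad row
]
df = pd.DataFrame(data, columns=['col1', 'col2'])

def try_parse(s):
    """Return the parsed literal, or None if s is not a valid Python literal."""
    try:
        return ast.literal_eval(s)
    except (ValueError, SyntaxError):
        return None

parsed = df['col2'].map(try_parse)

# Rows where parsing failed show up as missing values.
bad_rows = df[parsed.isna()]
print(bad_rows['col1'].tolist())
```

The rows listed in `bad_rows` can then be inspected or cleaned before applying any of the solutions.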
【Solution 2】:

Treat the second element of each row in data as a list rather than a string:

data= [
  ['A', [{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]], 
  ['B', [{'Id':47,'prices':['30','78']}, {'Id':94,'prices':['20']},{'Id':84,'prices': 
        ['20','98']}]]
  ]

t_list = []

for i in range(len(data)):
    for j in range(len(data[i][1])):
        t_list.append((data[i][0], data[i][1][j]['Id'], data[i][1][j]['prices']))

df = pd.DataFrame(t_list, columns=['col1', 'id', 'price'])
print(df)

  col1  id         price
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]
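
The index-based loops above can also be written by iterating over the rows directly, which avoids the repeated data[i][1][j] lookups. A sketch under the same assumption that the second element of each row is already a list; the names `rows`, `label`, `items`, and `item` are chosen here for illustration.

```python
import pandas as pd

data = [
    ['A', [{'Id': 42, 'prices': ['30', '78']}, {'Id': 44, 'prices': ['20', '47', '89']}]],
    ['B', [{'Id': 47, 'prices': ['30', '78']}, {'Id': 94, 'prices': ['20']},
           {'Id': 84, 'prices': ['20', '98']}]],
]

# One tuple per inner dict: (outer label, Id, prices)
rows = [
    (label, item['Id'], item['prices'])
    for label, items in data
    for item in items
]

df = pd.DataFrame(rows, columns=['col1', 'id', 'price'])
print(df)
```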

【Comments】:

    【Solution 3】:

    You can use df.explode, pd.Series.apply, df.set_index, and df.reset_index here:

    df.set_index('col1').explode('col2')['col2'].apply(pd.Series).reset_index()
    
      col1  Id        prices
    0    A  42      [30, 78]
    1    A  44  [20, 47, 89]
    2    B  47      [30, 78]
    3    B  94          [20]
    4    B  84      [20, 98]
    

    When col2 holds strings, convert it with ast.literal_eval first:

    import ast
    
    data = [['A', "[{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]"], 
            ['B', "[{'Id':47,'prices':['30','78']},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}]"]] 
    
    df = pd.DataFrame(data, columns = ['col1', 'col2'])
    df['col2'] = df['col2'].map(ast.literal_eval)
    
    df.set_index('col1').explode('col2')['col2'].apply(pd.Series).reset_index()
    
      col1  Id        prices
    0    A  42      [30, 78]
    1    A  44  [20, 47, 89]
    2    B  47      [30, 78]
    3    B  94          [20]
    4    B  84      [20, 98]
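
To see what each step of the one-liner contributes, the chain can be split into intermediate variables. This is only a sketch: it assumes col2 already holds lists (not strings) and a pandas version that provides DataFrame.explode (0.25 or later); the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'B'],
    'col2': [
        [{'Id': 42, 'prices': ['30', '78']}, {'Id': 44, 'prices': ['20', '47', '89']}],
        [{'Id': 47, 'prices': ['30', '78']}, {'Id': 94, 'prices': ['20']},
         {'Id': 84, 'prices': ['20', '98']}],
    ],
})

# 1. Move col1 into the index so it gets repeated for every exploded row.
indexed = df.set_index('col1')

# 2. One row per dict; the index now reads A, A, B, B, B.
exploded = indexed.explode('col2')['col2']

# 3. Turn each dict into a row, with the dict keys as columns.
expanded = exploded.apply(pd.Series)

# 4. Bring col1 back as a regular column.
result = expanded.reset_index()
print(result)
```

Note that apply(pd.Series) builds one Series per row, so it may be slow on large frames; the list-comprehension approach in Solution 1 avoids that overhead.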
    

    【Comments】:
