【Question Title】: Convert a list of lists in a pandas column to a string
【Posted】: 2019-01-14 20:52:48
【Question】:

How can I convert a pandas df column that contains lists of lists into strings? Here is a snippet of the "categories" column in df:

[['Electronics', 'Computers & Accessories', 'Cables & Accessories', 'Cables & Interconnects', 'USB Cables'], ['Video Games', 'Sony PSP']]
[['Video Games', 'PlayStation 3', 'Accessories', 'Controllers', 'Gamepads']]
[['Cell Phones & Accessories', 'Accessories', 'Chargers', 'Travel Chargers'], ['Video Games', 'Nintendo DS']]
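To make the question reproducible, the three sample rows above can be loaded into a DataFrame. This is a minimal sketch that assumes each cell holds an actual Python list of lists (not a string that only looks like one, which can happen after reading a CSV file):

```python
import pandas as pd

# The three sample rows from the question; every cell is a list of lists.
rows = [
    [['Electronics', 'Computers & Accessories', 'Cables & Accessories',
      'Cables & Interconnects', 'USB Cables'], ['Video Games', 'Sony PSP']],
    [['Video Games', 'PlayStation 3', 'Accessories', 'Controllers', 'Gamepads']],
    [['Cell Phones & Accessories', 'Accessories', 'Chargers', 'Travel Chargers'],
     ['Video Games', 'Nintendo DS']],
]
df = pd.DataFrame({"categories": rows})

print(df.shape)                       # (3, 1)
print(type(df.loc[0, "categories"]))  # <class 'list'>
```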

I tried the following code:

df.loc[:,"categories"]=[item for sublist in df.loc[:,"categories"] for item in sublist]

But it gives me an error. Is there another way to do this?

ValueError: Length of values does not match length of index
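The error comes from how the attempted comprehension iterates: "sublist" walks over the cells of the column and "item" over the inner lists of each cell, so it flattens across rows instead of within each row. With the sample data that produces 5 inner lists for a 3-row column. A minimal sketch reproducing the mismatch (it assigns with plain column indexing rather than df.loc, and the exact ValueError wording depends on the pandas version):

```python
import pandas as pd

rows = [
    [['Electronics', 'Computers & Accessories', 'Cables & Accessories',
      'Cables & Interconnects', 'USB Cables'], ['Video Games', 'Sony PSP']],
    [['Video Games', 'PlayStation 3', 'Accessories', 'Controllers', 'Gamepads']],
    [['Cell Phones & Accessories', 'Accessories', 'Chargers', 'Travel Chargers'],
     ['Video Games', 'Nintendo DS']],
]
df = pd.DataFrame({"categories": rows})

# The attempt from the question: "sublist" is each cell, "item" is each inner
# list, so the inner lists of all rows end up in one flat Python list.
flat = [item for sublist in df.loc[:, "categories"] for item in sublist]
print(len(flat), len(df))  # 5 3  (2 + 1 + 2 inner lists vs. 3 rows)

# Assigning 5 values to a 3-row column therefore fails with a length mismatch.
raised = False
try:
    df["categories"] = flat
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```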

Expected column:

'Electronics', 'Computers & Accessories', 'Cables & Accessories', 'Cables & Interconnects', 'USB Cables','Video Games', 'Sony PSP'
'Video Games', 'PlayStation 3', 'Accessories', 'Controllers', 'Gamepads'
'Cell Phones & Accessories', 'Accessories', 'Chargers', 'Travel Chargers','Video Games', 'Nintendo DS'

【Comments】:

  • What is your expected output?
  • Edited the original question with the expected output

Tags: string list pandas dataframe text


【Solution 1】:

Use a nested generator expression with join:

df["categories"]=[', '.join(item for sublist in x for item in sublist) for x in df["categories"]]
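Run end to end on the sample rows, the nested generator produces one string per row, so the result has the same length as the index, which is exactly what the attempted comprehension lacked. A minimal sketch, assuming the sample data shown in the question:

```python
import pandas as pd

rows = [
    [['Electronics', 'Computers & Accessories', 'Cables & Accessories',
      'Cables & Interconnects', 'USB Cables'], ['Video Games', 'Sony PSP']],
    [['Video Games', 'PlayStation 3', 'Accessories', 'Controllers', 'Gamepads']],
    [['Cell Phones & Accessories', 'Accessories', 'Chargers', 'Travel Chargers'],
     ['Video Games', 'Nintendo DS']],
]
df = pd.DataFrame({"categories": rows})

# One string per row: for each cell x, walk every inner list and every item
# inside it, then join everything with ", ".
df["categories"] = [', '.join(item for sublist in x for item in sublist)
                    for x in df["categories"]]

print(len(df))                  # 3, same length as the index
print(df.loc[1, "categories"])  # Video Games, PlayStation 3, Accessories, Controllers, Gamepads
```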

如果性能在较大的DataFrame 中很重要:

from itertools import chain

df["categories"] = [', '.join(chain.from_iterable(x)) for x in df["categories"]]

print(df)
                                          categories
0  Electronics, Computers & Accessories, Cables &...
1  Video Games, PlayStation 3, Accessories, Contr...
2  Cell Phones & Accessories, Accessories, Charge...
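The expected column in the question could also be read as one flat list per row rather than one string. If that is what is wanted (an assumption, not part of this answer), chain.from_iterable can be used without the join. A minimal sketch:

```python
from itertools import chain

import pandas as pd

rows = [
    [['Electronics', 'Computers & Accessories', 'Cables & Accessories',
      'Cables & Interconnects', 'USB Cables'], ['Video Games', 'Sony PSP']],
    [['Video Games', 'PlayStation 3', 'Accessories', 'Controllers', 'Gamepads']],
    [['Cell Phones & Accessories', 'Accessories', 'Chargers', 'Travel Chargers'],
     ['Video Games', 'Nintendo DS']],
]
df = pd.DataFrame({"categories": rows})

# Flatten one level inside each cell and keep the result as a Python list.
df["flat"] = df["categories"].map(lambda x: list(chain.from_iterable(x)))

print(df.loc[2, "flat"])
# ['Cell Phones & Accessories', 'Accessories', 'Chargers', 'Travel Chargers', 'Video Games', 'Nintendo DS']
```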

Timings (your real data will likely behave differently, so it is best to test first):

df = pd.concat([df] * 10000, ignore_index=True)


In [45]: %timeit df["c1"]=[', '.join(item for sublist in x for item in sublist) for x in df["categories"]]
39 ms ± 706 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [46]: %timeit df["c2"]=[', '.join(chain.from_iterable(x)) for x in df["categories"]]
22.1 ms ± 258 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [47]: %timeit df['c3'] = df["categories"].apply(lambda x: ', '.join(str(r) for v in x for r in v))
66.7 ms ± 695 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
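The %timeit lines above need IPython. Outside IPython, a rough equivalent with the standard timeit module could look like the sketch below. The helper names nested_generator, chained and with_apply are hypothetical, the data is the three sample rows repeated 10000 times as above, and the absolute numbers will depend on the machine and pandas version:

```python
import timeit
from itertools import chain

import pandas as pd

rows = [
    [['Electronics', 'Computers & Accessories', 'Cables & Accessories',
      'Cables & Interconnects', 'USB Cables'], ['Video Games', 'Sony PSP']],
    [['Video Games', 'PlayStation 3', 'Accessories', 'Controllers', 'Gamepads']],
    [['Cell Phones & Accessories', 'Accessories', 'Chargers', 'Travel Chargers'],
     ['Video Games', 'Nintendo DS']],
]
df = pd.concat([pd.DataFrame({"categories": rows})] * 10000, ignore_index=True)


def nested_generator(s):
    # Solution 1, first variant: nested generator inside join.
    return [', '.join(item for sublist in x for item in sublist) for x in s]


def chained(s):
    # Solution 1, second variant: itertools.chain.from_iterable.
    return [', '.join(chain.from_iterable(x)) for x in s]


def with_apply(s):
    # Solution 2: Series.apply with str() around every item.
    return s.apply(lambda x: ', '.join(str(r) for v in x for r in v))


# Sanity check: all three approaches produce identical strings.
assert nested_generator(df["categories"]) == chained(df["categories"]) \
    == with_apply(df["categories"]).tolist()

for fn in (nested_generator, chained, with_apply):
    seconds = timeit.timeit(lambda: fn(df["categories"]), number=5) / 5
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```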

【Comments】:

    【Solution 2】:

    Use apply with a nested generator expression:

    df['col'] = df.col.apply(lambda x: ', '.join(str(r) for v in x for r in v))
    

    Output:

        col
    0   Electronics, Computers & Accessories, Cables &...
    1   Video Games, PlayStation 3, Accessories, Contr...
    2   Cell Phones & Accessories, Accessories, Charge...
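The str(r) call inside the join matters when an inner list can contain something other than a string, because str.join only accepts strings. A small sketch with hypothetical data (the integer 3 is made up for illustration and does not appear in the question):

```python
import pandas as pd

# Hypothetical data: the integer 3 stands in for any non-string item.
df = pd.DataFrame({"col": [
    [['Video Games', 'PlayStation', 3]],
    [['Video Games', 'Nintendo DS']],
]})

# Without str(), str.join rejects the int and raises TypeError.
raised = False
try:
    df.col.apply(lambda x: ', '.join(r for v in x for r in v))
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# With str(), every item is converted first, so the join succeeds.
df['col'] = df.col.apply(lambda x: ', '.join(str(r) for v in x for r in v))
print(df['col'].tolist())  # ['Video Games, PlayStation, 3', 'Video Games, Nintendo DS']
```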
    

    【Comments】:
