【Question title】: Split list elements into sub-elements in a pandas DataFrame
【Posted】: 2018-06-27 09:00:40
【Question】:

I have a dataframe:

Filtered_data

['defence possessed russia china','factors driving china modernise']
['force bolster pentagon','strike capabilities pentagon congress detailing china']
['missiles warheads', 'deterrent face continued advances']
......
......

I just want to split each list element into sub-elements (tokenized words). So the output I am looking for is:

Filtered_data

[defence, possessed,russia,factors,driving,china,modernise]
[force,bolster,strike,capabilities,pentagon,congress,detailing,china]
[missiles,warheads, deterrent,face,continued,advances]

This is the code I tried:

for text in df['Filtered_data'].iteritems():
    for i in text.split():
        print (i)

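【Comments】:

  • Why the downvote? I am new to Python. Sorry if I asked a silly question here.
  • The downvote is not because the question is silly (it isn't), but because you do not provide sufficient information. We have to guess your data structure, which makes the question ambiguous.
  • Another reason is that you need to add your code to the question and show what you tried...

Tags: python arrays python-3.x pandas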


【Solution 1】:

Use a list comprehension with split to tokenize each string and flatten the result:

df['Filtered_data'] = df['Filtered_data'].apply(lambda x: [z for y in x for z in y.split()])
print (df)
                                       Filtered_data
0  [defence, possessed, russia, china, factors, d...
1  [force, bolster, pentagon, strike, capabilitie...
2  [missiles, warheads, deterrent, face, continue...
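For readers who want to run the snippet end to end, here is a minimal self-contained sketch. It assumes the column holds real Python lists of strings, as the question's sample suggests; the helper name `split_tokens` is hypothetical and only used for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question; assumes each cell is a list of strings.
df = pd.DataFrame({
    "Filtered_data": [
        ["defence possessed russia china", "factors driving china modernise"],
        ["force bolster pentagon", "strike capabilities pentagon congress detailing china"],
        ["missiles warheads", "deterrent face continued advances"],
    ]
})

def split_tokens(phrases):
    """Split every phrase on whitespace and flatten the words into one list."""
    return [word for phrase in phrases for word in phrase.split()]

# Apply the helper row by row; each cell becomes a flat list of words.
tokens = df["Filtered_data"].apply(split_tokens)
print(tokens.iloc[0])
```

Note that duplicates such as 'china' are kept at this stage; the list is only flattened, not deduplicated.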

Edit:

For unique values, the standard approach is to use sets (note that the resulting order is arbitrary):

df['Filtered_data'] = df['Filtered_data'].apply(lambda x: list(set([z for y in x for z in y.split()])))
print (df)
                                       Filtered_data
0  [russia, factors, defence, driving, china, mod...
1  [capabilities, detailing, china, force, pentag...
2  [deterrent, advances, face, warheads, missiles...
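Because a set has no defined order, the word order printed above can differ between runs. If a reproducible order is enough and the original word order does not matter, sorting the set is one option. The sketch below is plain Python, not part of the original answer, and `unique_sorted_tokens` is a hypothetical name.

```python
def unique_sorted_tokens(phrases):
    """Return the distinct words from all phrases in alphabetical order."""
    # A set comprehension removes duplicates; sorted() makes the order deterministic.
    return sorted({word for phrase in phrases for word in phrase.split()})

row = ["defence possessed russia china", "factors driving china modernise"]
result = unique_sorted_tokens(row)
print(result)
```

The same function could be passed to `df['Filtered_data'].apply(...)` in place of the lambda.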

But if the order of the values matters, use pandas.unique:

df['Filtered_data'] = df['Filtered_data'].apply(lambda x: pd.unique([z for y in x for z in y.split()]).tolist())
print (df)
                                       Filtered_data
0  [defence, possessed, russia, china, factors, d...
1  [force, bolster, pentagon, strike, capabilitie...
2  [missiles, warheads, deterrent, face, continue...

【Comments】:

  • @James - just add set, like list(set([z for y in x for z in y.split()]))
  • I need your help with this: https://stackoverflow.com/questions/51574485/match-keywords-in-pandas-column-with-another-list-of-elements. I could not get the solution mentioned there.
【Solution 2】:

You can use itertools.chain + toolz.unique. The benefit of toolz.unique compared with set is that it preserves order.

import pandas as pd
from itertools import chain
from toolz import unique

df = pd.DataFrame({'strings': [['defence possessed russia china','factors driving china modernise'],
                               ['force bolster pentagon','strike capabilities pentagon congress detailing china'],
                               ['missiles warheads', 'deterrent face continued advances']]})

df['words'] = df['strings'].apply(lambda x: list(unique(chain.from_iterable(i.split() for i in x))))

print(df.iloc[0]['words'])

['defence', 'possessed', 'russia', 'china', 'factors', 'driving', 'modernise']
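If installing toolz is not an option, a similar order-preserving deduplication can be sketched with the standard library alone. This is an alternative technique, not the answer's method: it relies on `dict.fromkeys`, which keeps insertion order in Python 3.7+. The name `unique_tokens_in_order` is hypothetical.

```python
def unique_tokens_in_order(phrases):
    """Return distinct words from all phrases, keeping first-seen order."""
    words = (word for phrase in phrases for word in phrase.split())
    # dict keys are unique and remember insertion order (Python 3.7+).
    return list(dict.fromkeys(words))

row = ["force bolster pentagon", "strike capabilities pentagon congress detailing china"]
result = unique_tokens_in_order(row)
print(result)
```

As with the other helpers, this could be passed to `df['strings'].apply(...)` to build the words column.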

【Comments】: