【Question title】: Python: converting a pandas column to a list of lists?
【Posted】: 2019-03-08 08:31:28
【Question】:

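I have a file in the following format, and I want to put each column into a list of lists, with one inner list per sentence, because the file contains more than one sentence. The output should look like this: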

[['Learning centre of The University of Lahore is established for professional development.'], 
 ['These events, destroyed the bond between them.']]

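The same goes for the Verb column. Here is what I tried, but it puts everything into a single flat structure instead of one list per sentence: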

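import numpy as np
import pandas

train_fn = "/content/data/wiki/wiki1.train.oie"

# Read the tab-separated file; empty fields stay as strings instead of NaN.
dfE = pandas.read_csv(train_fn, sep="\t",
                      header=0,
                      keep_default_na=False)
# Each row of 'word' holds a single token, so this is a flat list of tokens.
train_textEI = dfE['word'].tolist()
train_textEI = [' '.join(t.split()) for t in train_textEI]
# Shape (n_tokens, 1): every token ends up in its own one-item row.
train_textEI = np.array(train_textEI, dtype=object)[:, np.newaxis]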

It outputs every word in its own list:

[['Learning'],['Center'],['of'],['The'],['University'],['of'],
 ['Lahore'],['is'],['established'],['for'],['the'],
 ['professional'],['development'],['.'],['These'],['events'],[','],
 ['destroyed'],['the'],['bond'],['between'],['them'],['.']]
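
The per-word result follows from the shape of the data: each row of the file holds one token, so the list comprehension only normalizes whitespace inside a token, and `[:, np.newaxis]` then turns the n-element array into an n-by-1 column. A minimal sketch, using hypothetical tokens in place of `dfE['word'].tolist()`:

```python
import numpy as np

# Hypothetical tokens standing in for dfE['word'].tolist(); one token per row.
tokens = ['These', 'events', ',', 'destroyed']

# split/join only collapses whitespace inside each token; it never merges rows.
cleaned = [' '.join(t.split()) for t in tokens]

# np.newaxis adds a second axis, so every token becomes its own one-item row.
column = np.array(cleaned, dtype=object)[:, np.newaxis]

print(column.shape)     # (4, 1)
print(column.tolist())  # [['These'], ['events'], [','], ['destroyed']]
```

Getting one list per sentence therefore needs some signal for where a sentence starts, which the token column alone does not carry.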

【Comments】:

  • Do you need df.groupby('Verb')['word'].apply(lambda x: [' '.join(x)]).tolist() ?
  • @jazrael But what if two consecutive sentences have the same verb? I think it would merge the two sentences. I tried to split on wordId=0, but I could not get it to work.
  • So df.groupby(df['word_id'].eq(0).cumsum())['word'].apply(lambda x: [' '.join(x)]).tolist() ?
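
The concern raised in this exchange can be reproduced on a small example. The sketch below uses hypothetical data in which two consecutive sentences share the verb `is`; the column names `word_id`, `word`, and `Verb` are taken from the comments, while the values are made up for illustration:

```python
import pandas as pd

# Hypothetical data: two consecutive sentences that share the verb 'is'.
df = pd.DataFrame({
    'word_id': [0, 1, 2, 0, 1, 2],
    'word': ['It', 'is', 'late', 'He', 'is', 'here'],
    'Verb': ['is'] * 6,
})

# Grouping by the verb merges both sentences into one group.
by_verb = df.groupby('Verb')['word'].apply(lambda x: [' '.join(x)]).tolist()

# Grouping by a running count of word_id == 0 keeps the sentences apart.
sentence_id = df['word_id'].eq(0).cumsum()
by_sentence = df.groupby(sentence_id)['word'].apply(lambda x: [' '.join(x)]).tolist()

print(by_verb)      # [['It is late He is here']]
print(by_sentence)  # [['It is late'], ['He is here']]
```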

Tags: python pandas file nlp


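【Solution 1】:

Create a helper Series by comparing the word_id column with 0 using Series.eq and then taking Series.cumsum, pass it to groupby, join each group's words into a one-item list, and finally convert the output Series to a list: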

import pandas as pd

df = pd.DataFrame({'word_id': [0, 1, 2, 0, 1],
                   'word': ['a s', 'ds d', 'sss dd', 'd', 'sd ds']})

# Group rows by a running count of word_id == 0, then join each group's words.
L = df.groupby(df['word_id'].eq(0).cumsum())['word'].apply(lambda x: [' '.join(x)]).tolist()
print(L)
# [['a s ds d sss dd'], ['d sd ds']]
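
It may help to look at the helper Series step by step. The sketch below uses hypothetical toy data and also reuses the same grouping key for the Verb column. It assumes the Verb column repeats the sentence's verb on every token row, which the question does not confirm, so the `first()` call is only one possible choice:

```python
import pandas as pd

# Hypothetical toy data; the Verb column is assumed to repeat the
# sentence's verb on every token row, which the question does not confirm.
df = pd.DataFrame({
    'word_id': [0, 1, 2, 0, 1],
    'word': ['These', 'events', 'ended', 'Time', 'passed'],
    'Verb': ['ended', 'ended', 'ended', 'passed', 'passed'],
})

starts = df['word_id'].eq(0)   # True on the first token of each sentence
sentence_id = starts.cumsum()  # running count of sentence starts

print(starts.tolist())       # [True, False, False, True, False]
print(sentence_id.tolist())  # [1, 1, 1, 2, 2]

# Reuse the same key for another column; first() keeps one verb per sentence.
verbs = [[v] for v in df.groupby(sentence_id)['Verb'].first()]
print(verbs)  # [['ended'], ['passed']]
```

Because the key is built from word_id rather than from the text, sentences with identical verbs or identical words still land in separate groups.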
