[Question title]: Get elements from column array by index in Dataframe Pandas
[Posted]: 2022-06-27 21:32:42
[Question description]:

I have a dataframe:

import pandas as pd

data = {'id': [1, 2, 3],
        'tokens': [['in', 'the', 'morning', 'cat', 'run', 'today', 'very', 'quick'],
                   ['dog', 'eat', 'meat', 'chicken', 'from', 'bowl'],
                   ['mouse', 'hides', 'from', 'a', 'cat']]}

df = pd.DataFrame(data)

I also have a list of index lists, one per row.

lst_index = [[3, 4, 5], [0, 1, 2], [2, 3, 4]]

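I want to create a new column that holds elements from the lists in the tokens column, where the elements for each row are selected by the matching indices in lst_index. The result should be: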

    id             tokens                                          new
0   1   [in, the, morning, cat, run, today, very, quick]    [cat, run, today]
1   2   [dog, eat, meat, chicken, from, bowl]               [dog, eat, meat]
2   3   [mouse, hides, from, a, cat]                        [from, a, cat]

[Comments]:

    Tags: python pandas dataframe


    [Solution 1]:

    Use a simple list comprehension that pairs each index list with the tokens of the same row:

    lst_index = [[3, 4, 5], [0, 1, 2], [2, 3, 4]]
    
    df['new'] = [[l[i] for i in idx] for idx,l in zip(lst_index, df['tokens'])]
    

    Output:

       id                                            tokens                new
    0   1  [in, the, morning, cat, run, today, very, quick]  [cat, run, today]
    1   2             [dog, eat, meat, chicken, from, bowl]   [dog, eat, meat]
    2   3                      [mouse, hides, from, a, cat]     [from, a, cat]
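
The comprehension above raises an IndexError if any index is past the end of its token list. A minimal sketch of a more forgiving variant follows, assuming that silently skipping invalid positions is acceptable for your data. The helper name `pick` and the out-of-range index 10 are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'tokens': [
        ['in', 'the', 'morning', 'cat', 'run', 'today', 'very', 'quick'],
        ['dog', 'eat', 'meat', 'chicken', 'from', 'bowl'],
        ['mouse', 'hides', 'from', 'a', 'cat'],
    ],
})

# The last index (10) is deliberately out of range to show the skipping behavior.
lst_index = [[3, 4, 5], [0, 1, 2], [2, 3, 10]]


def pick(tokens, indices):
    """Hypothetical helper: return the tokens at the given positions,
    skipping any position that does not exist in the list."""
    return [tokens[i] for i in indices if 0 <= i < len(tokens)]


# Pair each index list with the tokens of the same row, as in the answer above.
df['new'] = [pick(tokens, idx) for idx, tokens in zip(lst_index, df['tokens'])]
print(df['new'].tolist())
```

Whether to skip bad indices or let the error surface is a design choice; failing loudly is often safer when the indices are supposed to be valid.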
    

    [Discussion]:

      [Solution 2]:

      You can loop over the dictionary's token lists and the index list together to build the new column:

      data = {'id': [1, 2, 3],
              'tokens': [['in', 'the', 'morning', 'cat', 'run', 'today', 'very', 'quick'],
                         ['dog', 'eat', 'meat', 'chicken', 'from', 'bowl'],
                         ['mouse', 'hides', 'from', 'a', 'cat']]}
      lst_index = [[3, 4, 5], [0, 1, 2], [2, 3, 4]]
      l = []

      # For each row, collect the tokens at the positions listed in lst_index[i]
      for i in range(len(data["tokens"])):
          l.append([])
          for j in range(len(lst_index[i])):
              l[i].append(data["tokens"][i][lst_index[i][j]])

      # Store the result on the plain dict (not on a DataFrame)
      data["new"] = l
      print(data)
      

      Output:

      {'id': [1, 2, 3], 'tokens': [['in', 'the', 'morning', 'cat', 'run', 'today', 'very', 'quick'], ['dog', 'eat', 'meat', 'chicken', 'from', 'bowl'], ['mouse', 'hides', 'from', 'a', 'cat']], 'new': [['cat', 'run', 'today'], ['dog', 'eat', 'meat'], ['from', 'a', 'cat']]}
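
Note that the loop above fills in the plain dictionary, so a DataFrame still has to be built from it afterwards. A short sketch of the same idea, written with zip instead of index arithmetic and followed by the conversion, might look like this. It reuses the names from the question; nothing here is claimed to be the answer author's intended continuation.

```python
import pandas as pd

data = {
    'id': [1, 2, 3],
    'tokens': [
        ['in', 'the', 'morning', 'cat', 'run', 'today', 'very', 'quick'],
        ['dog', 'eat', 'meat', 'chicken', 'from', 'bowl'],
        ['mouse', 'hides', 'from', 'a', 'cat'],
    ],
}
lst_index = [[3, 4, 5], [0, 1, 2], [2, 3, 4]]

# Same selection as the nested loops, expressed with zip:
# each token list is paired with the index list for the same row.
data['new'] = [[tokens[i] for i in idx]
               for tokens, idx in zip(data['tokens'], lst_index)]

# Build the DataFrame only after the dict holds all three columns.
df = pd.DataFrame(data)
print(df)
```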
      

      [Discussion]:
