【Question title】: python - create new sub-matrix by filtering columns from matrix/bidimensional list
【Posted】: 2015-03-11 07:09:29
【Question】:

Given a matrix such as the following:

matrix = [
    ['month','val1','val2','valn'],
    ['jan','100','200','300'],
    ['feb','101','201','302'],
    ['march','102','202','303'],
    ['april','103','203','303'],
    ['march','104','204','304']
]

I would like to create a new (filtered) matrix based on a list of column indices or column names, so that either of

filter_col_indx = {0,2}
filter_col_name = {'month','val2'}

would produce the same output:

matrix2 = [
    ['month','val2'],
    ['jan','200'],
    ['feb','201'],
    ['march','202'],
    ['april','203'],
    ['march','204']
]

What is the most efficient way to do this for large matrices? The list of columns can vary.

Thanks

【Comments】:

  • I would look at pandas. You can subset the data in two lines: df = pd.DataFrame(matrix[1:], columns=matrix[0]) followed by df[['month', 'val2']]

Tags: python matrix filtering


【Solution 1】:

This can be done with operator.itemgetter:

import operator
matrix = [
    ['month','val1','val2','valn'],
    ['jan','100','200','300'],
    ['feb','101','201','302'],
    ['march','102','202','303'],
    ['april','103','203','303'],
    ['march','104','204','304']
]

filter_col_indx = [0,2]
getter = operator.itemgetter(*filter_col_indx)
matrix2 = [list(getter(row)) for row in matrix]
print(matrix2)

yields

[['month', 'val2'],
 ['jan', '200'],
 ['feb', '201'],
 ['march', '202'],
 ['april', '203'],
 ['march', '204']]

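operator.itemgetter(*filter_col_indx) returns a function that takes a sequence as its argument and returns the items at index 0 and index 2 of that sequence as a tuple, which is why each result is wrapped in list(...). Applying this function to every row selects the desired values from matrix.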
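
The question also asks about filtering by column name. One possible approach, sketched below, is to translate the names into positions using the header row before selecting. This assumes the first row holds unique column names; select_columns is a hypothetical helper name, not part of the original answer. It uses a plain list comprehension rather than itemgetter, because itemgetter called with a single index returns the bare item instead of a tuple.

```python
def select_columns(matrix, names):
    """Return a new list of lists containing only the named columns.

    Hypothetical helper: assumes matrix[0] is a header row with unique names.
    """
    header = matrix[0]
    # Map each column name to its position once, up front.
    position = {name: i for i, name in enumerate(header)}
    # Keep the order given in `names`; an unknown name raises KeyError.
    indices = [position[name] for name in names]
    # Build every output row (header included) from the chosen positions.
    return [[row[i] for i in indices] for row in matrix]


matrix = [
    ['month', 'val1', 'val2', 'valn'],
    ['jan', '100', '200', '300'],
    ['feb', '101', '201', '302'],
]

matrix2 = select_columns(matrix, ['month', 'val2'])
print(matrix2)
```

Passing the names as a list rather than a set keeps the output column order predictable, since sets do not preserve order.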


If you install pandas, you can turn matrix into a DataFrame and select the desired columns like this:

import pandas as pd

matrix = [
    ['month','val1','val2','valn'],
    ['jan','100','200','300'],
    ['feb','101','201','302'],
    ['march','102','202','303'],
    ['april','103','203','303'],
    ['march','104','204','304']
]
df = pd.DataFrame(matrix[1:], columns=matrix[0])
print(df[['month', 'val2']])

yields

   month val2
0    jan  200
1    feb  201
2  march  202
3  april  203
4  march  204

You may enjoy using pandas, since it makes many data-processing operations very easy.
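
If the columns are given by position rather than by name, or if the result is needed as a list of lists again, something along these lines should work. This is a minimal sketch that assumes pandas is installed; the variable names subset and matrix2 are chosen here for illustration only.

```python
import pandas as pd

matrix = [
    ['month', 'val1', 'val2', 'valn'],
    ['jan', '100', '200', '300'],
    ['feb', '101', '201', '302'],
    ['march', '102', '202', '303'],
]
df = pd.DataFrame(matrix[1:], columns=matrix[0])

# Select columns by integer position instead of by name.
subset = df.iloc[:, [0, 2]]

# Rebuild the header-plus-rows layout used in the question.
matrix2 = [subset.columns.tolist()] + subset.to_numpy().tolist()
print(matrix2)
```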

【Comments】:

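  • Thanks for such a good example. One small issue: the example using operator.itemgetter produces a list of tuples rather than a list of lists. Is there a way to convert it back to the desired format: [ ['month' 'val2'], ['jan' '200'] ... ] ?

【Solution 2】: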

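If you are always interested in whole columns, I think it would be appropriate to store the data as a dictionary that holds each column as a list: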

data = {'month': ['jan', 'feb', 'march', 'april', 'march'],
        'val1': [100, 101, 102, 103, 104],
        'val2': [200, 201, 202, 203, 204],
        ...
       }

To retrieve the columns (which I have now written horizontally...), you could do:

{key: data[key] for key in ['month', 'val2']}
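
If the data starts out in the row-oriented layout from the question, it can be converted to such a dictionary and back. The following is a sketch under the assumption that the first row is a header with unique names; the names data, wanted, subset and matrix2 are illustrative.

```python
matrix = [
    ['month', 'val1', 'val2', 'valn'],
    ['jan', '100', '200', '300'],
    ['feb', '101', '201', '302'],
]

header, *rows = matrix

# zip(*rows) transposes the rows into columns; pair each with its name.
data = {name: list(column) for name, column in zip(header, zip(*rows))}

wanted = ['month', 'val2']
subset = {key: data[key] for key in wanted}

# Transpose the selected columns back into rows, header first.
matrix2 = [wanted] + [list(row) for row in zip(*(subset[key] for key in wanted))]
print(matrix2)
```

The conversion costs one pass over the data, so it mainly pays off when many different column selections are made on the same matrix.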

【Comments】:

    【Solution 3】:

    Here is a numpy version:

    import numpy as np
    
    matrix = np.array([
        ['month','val1','val2','valn'],
        ['jan','100','200','300'],
        ['feb','101','201','302'],
        ['march','102','202','303'],
        ['april','103','203','303'],
        ['march','104','204','304']
    ])
    
    search = ['month', 'val2']
    
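    # searchsorted assumes the searched array is sorted; the header row here happens to be
    indexes = matrix[0,:].searchsorted(search) # search only the first row
    # or indexes = [0, 2]
    print(matrix[:,indexes])
    # prints:
    # [['month' 'val2']
    #  ['jan' '200']
    #  ['feb' '201']
    #  ['march' '202']
    #  ['april' '203']
    #  ['march' '204']]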
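
Because searchsorted only gives correct positions when the header row is sorted, a header in arbitrary order needs a different lookup. One possible alternative, sketched here and not part of the original answer, is to look each name up in the header with list.index and then use the same fancy indexing.

```python
import numpy as np

matrix = np.array([
    ['month', 'val1', 'val2', 'valn'],
    ['jan', '100', '200', '300'],
    ['feb', '101', '201', '302'],
    ['march', '102', '202', '303'],
])

search = ['month', 'val2']

# Look up each name in the header; works whether or not the header is sorted.
header = matrix[0].tolist()
indexes = [header.index(name) for name in search]  # ValueError if a name is missing

# Fancy indexing on the column axis returns a new array with those columns.
sub = matrix[:, indexes]
print(sub.tolist())
```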
    

    【Comments】: