【Question title】: Find minimum across multiple worksheets using pandas
【Posted】: 2023-03-10 16:43:01
【Question description】:

How can I find the minimum value for each index across multiple worksheets?

Suppose,

    worksheet 1

    index    A   B   C
        0    2   3   4.28
        1    3   4   5.23

    worksheet 2

    index    A   B   C
        0    9   6   5.9
        1    1   3   4.1

    worksheet 3

    index    A   B   C
        0    9   6   6.0
        1    1   3   4.3
 ...................(Worksheet 4,Worksheet 5)...........
By comparing the C column across all worksheets, I want a dataframe that looks like

index      min(C)
    0       4.28
    1       4.1

【Comments】:

  • Only one answer can be accepted ;)

Tags: python excel pandas min worksheet


【Solution 1】:
from functools import reduce
import numpy as np

# element-wise minimum of the C columns, folded pairwise over the sheets
reduce(np.fmin, [ws1.C, ws2.C, ws3.C])

index
0    4.28
1    4.10
Name: C, dtype: float64
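
A minimal, self-contained sketch of the reduction above. The three frames are rebuilt by hand from the question's sample data (in practice they would be read from the workbook), and the names ws1, ws2, ws3 simply follow the answer. np.fmin is applied pairwise, so the result keeps the shared index; unlike np.minimum, it returns the other value when one of the two operands is NaN.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Sample data copied from the question; real data would come from Excel.
idx = pd.Index([0, 1], name="index")
ws1 = pd.DataFrame({"A": [2, 3], "B": [3, 4], "C": [4.28, 5.23]}, index=idx)
ws2 = pd.DataFrame({"A": [9, 1], "B": [6, 3], "C": [5.9, 4.1]}, index=idx)
ws3 = pd.DataFrame({"A": [9, 1], "B": [6, 3], "C": [6.0, 4.3]}, index=idx)

# Fold np.fmin over the C columns: fmin(fmin(ws1.C, ws2.C), ws3.C).
min_c = reduce(np.fmin, [ws1["C"], ws2["C"], ws3["C"]])
print(min_c)
```

Bracket access (ws1["C"]) is used here instead of attribute access (ws1.C) only because it also works for column names that are not valid Python identifiers.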

This generalizes nicely with a list comprehension

reduce(np.fmin, [w.C for w in [ws1, ws2, ws3, ws4, ws5]])

If you insist on your column name

from functools import reduce

reduce(np.fmin, [ws1.C, ws2.C, ws3.C]).to_frame('min(C)')

       min(C)
index        
0        4.28
1        4.10

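You can also use pd.concat on a dictionary and take the minimum over the second index level. Older pandas versions accepted a level=1 argument to pd.Series.min; that argument was deprecated in pandas 1.3 and removed in 2.0, so group by the level instead

pd.concat(dict(enumerate([w.C for w in [ws1, ws2, ws3]]))).groupby(level=1).min()
# equivalently
# pd.concat(dict(enumerate([w.C for w in [ws1, ws2, ws3]])), axis=1).min(axis=1)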

index
0    4.28
1    4.10
Name: C, dtype: float64

Note:

dict(enumerate([w.C for w in [ws1, ws2, ws3]]))

is another way of writing

{0: ws1.C, 1: ws2.C, 2: ws3.C}
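
As a rough sketch of the concat approach, assuming a recent pandas (where the level argument of Series.min is no longer available, so groupby(level=1) is used instead). The Series below are hand-built from the question's C columns, and the string keys "ws1", "ws2", "ws3" are illustrative stand-ins for the integer keys produced by enumerate.

```python
import pandas as pd

# C columns from the question's three worksheets, keyed by a sheet label.
idx = pd.Index([0, 1], name="index")
columns = {
    "ws1": pd.Series([4.28, 5.23], index=idx, name="C"),
    "ws2": pd.Series([5.9, 4.1], index=idx, name="C"),
    "ws3": pd.Series([6.0, 4.3], index=idx, name="C"),
}

# Long form: the dict keys become the outer level of a MultiIndex,
# so level 1 is the original row index.
stacked = pd.concat(columns)
by_level = stacked.groupby(level=1).min()

# Wide form: one column per sheet, then a row-wise minimum.
wide = pd.concat(columns, axis=1)
by_row = wide.min(axis=1)

# The wide form also shows which sheet supplied each minimum.
source = wide.idxmin(axis=1)
print(by_level, by_row, source, sep="\n\n")
```

Both forms give the same values; the wide form is convenient when the per-sheet columns are worth inspecting side by side.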

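【Discussion】:

    【Solution 2】:

    You need read_excel with the sheet_name=None parameter (spelled sheetname in old pandas versions) to get a dictionary of DataFrames keyed by sheet name; older pandas versions return an OrderedDict, as in the output below. Then use reduce with numpy.fmin over a list comprehension: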

    dfs = pd.read_excel('file.xlsx', sheet_name=None)
    print (dfs)
    OrderedDict([('Sheet1',    A  B     C
    0  2  3  4.28
    1  3  4  5.23), ('Sheet2',    A  B    C
    0  9  6  5.9
    1  1  3  4.1), ('Sheet3',    A  B    C
    0  9  6  6.0
    1  1  3  4.3)])
    
    from functools import reduce
    import numpy as np
    
    # fold the element-wise minimum over the C column of every sheet
    df = reduce(np.fmin, [v['C'] for v in dfs.values()])
    print (df)
    0    4.28
    1    4.10
    Name: C, dtype: float64
    

    Solution with concat:

    df = pd.concat([v['C'] for v in dfs.values()], axis=1).min(axis=1)
    print (df)
    0    4.28
    1    4.10
    dtype: float64
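
One possible way to package the concat solution, sketched under stated assumptions: min_across_sheets is a hypothetical helper (not a pandas function), and the hand-built dictionary stands in for the result of pd.read_excel('file.xlsx', sheet_name=None) so the example runs without a workbook. Skipping sheets that lack the column is a design choice made for this sketch only.

```python
import pandas as pd


def min_across_sheets(dfs, column):
    """Row-wise minimum of `column` over a mapping of sheet name -> DataFrame.

    Hypothetical helper: sheets that do not contain `column` are skipped.
    """
    cols = {name: df[column] for name, df in dfs.items() if column in df.columns}
    if not cols:
        raise KeyError(f"no sheet contains column {column!r}")
    # One column per sheet, aligned on the index, then a row-wise minimum.
    return pd.concat(cols, axis=1).min(axis=1).rename(f"min({column})")


# Stand-in for pd.read_excel('file.xlsx', sheet_name=None).
dfs = {
    "Sheet1": pd.DataFrame({"A": [2, 3], "B": [3, 4], "C": [4.28, 5.23]}),
    "Sheet2": pd.DataFrame({"A": [9, 1], "B": [6, 3], "C": [5.9, 4.1]}),
    "Sheet3": pd.DataFrame({"A": [9, 1], "B": [6, 3], "C": [6.0, 4.3]}),
}

result = min_across_sheets(dfs, "C")
print(result.to_frame())
```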
    

    If you need to define the index in read_excel:

    dfs = pd.read_excel('file.xlsx', sheet_name=None, index_col='index')
    print (dfs)
    OrderedDict([('Sheet1',        A  B     C
    index            
    0      2  3  4.28
    1      3  4  5.23), ('Sheet2',        A  B    C
    index           
    0      9  6  5.9
    1      1  3  4.1), ('Sheet3',        A  B    C
    index           
    0      9  6  6.0
    1      1  3  4.3)])
    
    
    df = pd.concat([v['C'] for k,v in dfs.items()], axis=1).min(axis=1)
    print (df)
    index
    0    4.28
    1    4.10
    dtype: float64
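
Setting the index matters because pd.concat(..., axis=1) aligns rows by index label rather than by position. The following sketch uses hypothetical data (the row labels and the value 7.5 are not from the question) to illustrate what happens when the sheets do not cover the same rows: missing entries become NaN, and min(axis=1) skips them by default.

```python
import pandas as pd

# Hypothetical sheets whose row labels only partly overlap.
s1 = pd.Series([4.28, 5.23], index=pd.Index([0, 1], name="index"), name="C")
s2 = pd.Series([4.1, 7.5], index=pd.Index([1, 2], name="index"), name="C")

# Rows are matched by label; labels missing from a sheet are filled with NaN.
aligned = pd.concat({"Sheet1": s1, "Sheet2": s2}, axis=1)

# NaN entries are ignored by default (skipna=True).
result = aligned.min(axis=1)
print(aligned)
print(result)
```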
    

    【Discussion】:
