Pandas Groupby Multiple Conditions KeyError: Answers

【Question title】: Pandas Groupby Multiple Conditions KeyError
【Posted】: 2021-09-21 15:21:07
【Question】:

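I have a DataFrame called df_out with the column names shown below, but for some reason I can't use groupby on these column headers because it keeps giving me KeyError: 'year'. I've researched this and tried stripping whitespace, resetting the index, allowing for whitespace before my groupby call, etc., but I can't get past this KeyError. df_out looks like this: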

df_out.columns
Out[185]: 
Index(['year', 'month', 'BARTON CHAPEL', 'BARTON I', 'BIG HORN I',
       'BLUE CREEK', 'BUFFALO RIDGE I', 'CAYUGA RIDGE', 'COLORADO GREEN',
       'DESERT WIND', 'DRY LAKE I', 'EL CABO', 'GROTON', 'NEW HARVEST',
       'PENASCAL I', 'RUGBY', 'TULE'],
      dtype='object', name='plant_name')

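However, when I call df_out.head(), the output looks different: there is a leading header labelled 'plant_name', so this may be where the error comes from or is related to. Here is the output of df_out.head():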

df_out.head()
Out[187]: 
plant_name  year  month  BARTON CHAPEL  BARTON I  BIG HORN I  BLUE CREEK  \
0           1991      1       6.432285  7.324126    5.170067    6.736384   
1           1991      2       7.121324  6.973586    4.922693    7.473527   
2           1991      3       8.125793  8.681317    5.796599    8.401855   
3           1991      4       7.454972  8.037764    7.272292    7.961625   
4           1991      5       7.012809  6.530013    6.626949    6.009825   

plant_name  BUFFALO RIDGE I  CAYUGA RIDGE  COLORADO GREEN  DESERT WIND  \
0                  7.163790      7.145323        5.783629     5.682003   
1                  7.595744      7.724717        6.245952     6.269524   
2                  8.111411      9.626075        7.918871     6.657648   
3                  8.807458      8.618806        7.011444     5.848736   
4                  7.734852      6.267097        7.410013     5.099610   

plant_name  DRY LAKE I    EL CABO    GROTON  NEW HARVEST  PENASCAL I  \
0             4.721089  10.747285  7.456640     6.921801    6.296425   
1             5.095923   8.891057  7.239762     7.449122    6.484241   
2             8.409637  12.238508  8.274046     8.824758    8.444960   
3             7.893694  10.837139  6.381736     8.840431    7.282444   
4             8.496976   8.636882  6.856747     7.469825    7.999530   

plant_name     RUGBY       TULE  
0           7.028360   4.110605  
1           6.394687   5.257128  
2           6.859462  10.789516  
3           7.590153   7.425153  
4           7.556546   8.085255

The groupby statement that raises the KeyError is shown below. I'm trying to compute averages by year and month for the subset of df_out's columns listed in west:

west=['BIG HORN I','DRY LAKE I', 'TULE']
westavg = df_out[df_out.columns[df_out.columns.isin(west)]].groupby(['year','month']).mean()

Many thanks,

【Question discussion】:

Tags: pandas keyerror

【Answer 1】:

Your code can be broken down as:

westavg =  (df_out[df_out.columns[df_out.columns.isin(west)]]
                 .groupby(['year','month']).mean()
           )

This doesn't work because 'year' and 'month' are not columns of df_out[df_out.columns[df_out.columns.isin(west)]]: that selection keeps only the columns in west.

Try:

west_cols = [c for c in df_out if c in west]
westavg = df_out.groupby(['year','month'])[west_cols].mean()
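A runnable sketch of this fix on hypothetical toy data. Here some `(year, month)` pairs are deliberately repeated so the averaging is visible; the real `df_out` appears to have one row per year and month.

```python
import pandas as pd

# Hypothetical toy data; (1991, 1) and (1991, 2) each appear twice
df_out = pd.DataFrame(
    {
        "year": [1991, 1991, 1991, 1991],
        "month": [1, 1, 2, 2],
        "BIG HORN I": [5.0, 7.0, 4.0, 6.0],
        "DRY LAKE I": [4.0, 6.0, 5.0, 9.0],
        "TULE": [4.0, 8.0, 5.0, 7.0],
        "RUGBY": [7.0, 6.0, 6.0, 7.0],
    }
)
west = ["BIG HORN I", "DRY LAKE I", "TULE"]

# Columns of df_out that are in `west`, in df_out's column order
west_cols = [c for c in df_out if c in west]

# Group the full frame (which still has 'year' and 'month'), then pick the west columns
westavg = df_out.groupby(["year", "month"])[west_cols].mean()
print(westavg)
```

The result is indexed by a `(year, month)` MultiIndex with one column per plant in `west_cols`.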

【Discussion】:

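OK, yes, the groupby works now, but I'm confused about why it failed before, since 'year' and 'month' do seem to be columns: df_out.year and df_out.month both return the year and month values, and they appear as column headers of df_out.
They are columns of df_out, but not of df_out[df_out.columns[df_out.columns.isin(west)]], which is the frame you were calling groupby on.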
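An equivalent variant, sketched here on hypothetical toy data, is to keep the grouping keys in the column selection so that `groupby` can still find them (the names `keys` and `west_cols` are illustrative):

```python
import pandas as pd

# Hypothetical toy data standing in for df_out
df_out = pd.DataFrame(
    {
        "year": [1991, 1991, 1991, 1991],
        "month": [1, 1, 2, 2],
        "BIG HORN I": [5.0, 7.0, 4.0, 6.0],
        "DRY LAKE I": [4.0, 6.0, 5.0, 9.0],
        "TULE": [4.0, 8.0, 5.0, 7.0],
        "RUGBY": [7.0, 6.0, 6.0, 7.0],
    }
)
west = ["BIG HORN I", "DRY LAKE I", "TULE"]
west_cols = [c for c in df_out if c in west]

# Select the keys together with the west columns, then group on the keys
keys = ["year", "month"]
westavg = df_out[keys + west_cols].groupby(keys).mean()
print(westavg)
```

The grouping keys move into the result's index, so the output columns are just the `west` plants, the same as with `df_out.groupby(keys)[west_cols].mean()`.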

【Answer 2】:

OK, with help from Quang Hoang's answer I understood the problem and came up with this working answer, which I find easier to follow using .intersection:

westavg = df_out[df_out.columns.intersection(west)].mean(axis=1)

# Gives the mean of each row over the subset of columns defined by the list 'west'.
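Note that `.mean(axis=1)` averages across the selected columns within each row, so it returns one value per row of `df_out` rather than grouping by `year` and `month`. A minimal sketch on hypothetical toy data; the `set_index` line is an addition not in the answer, used only to label each row mean with its year and month:

```python
import pandas as pd

# Hypothetical toy data standing in for df_out (one row per year and month)
df_out = pd.DataFrame(
    {
        "year": [1991, 1991],
        "month": [1, 2],
        "BIG HORN I": [5.0, 4.0],
        "DRY LAKE I": [4.0, 5.0],
        "TULE": [6.0, 9.0],
        "RUGBY": [7.0, 6.0],
    }
)
west = ["BIG HORN I", "DRY LAKE I", "TULE"]

# Columns present both in df_out and in `west`
west_cols = df_out.columns.intersection(west)

# Mean across the west plants for each row; indexed 0, 1, ... like df_out
row_avg = df_out[west_cols].mean(axis=1)

# Same row means, labelled by (year, month) instead of the RangeIndex
row_avg_labelled = df_out.set_index(["year", "month"])[west_cols].mean(axis=1)
print(row_avg_labelled)
```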
