【Question Title】: Get mean column based on grouped values of two other columns
【Posted】: 2020-02-08 10:46:26
【Question】:

I have some school data:

import pandas as pd

data = {'name': ['school a', 'school b', 'school c', 'school d', 'school e', 'school f'],
        'type': ['a', 'a', 'b', 'b', 'a', 'b'],
        'location': ['county a', 'county a', 'county b', 'county b', 'county b', 'county a'],
        'avg_score': [9, 7, 5, 7, 6, 8]
       }

df = pd.DataFrame(data)


Out:
    name    type    location    avg_score
0   school a    a   county a    9
1   school b    a   county a    7
2   school c    b   county b    5
3   school d    b   county b    7
4   school e    a   county b    6
5   school f    b   county a    8

I want to compare each school's score with the average score of schools of the same type in the same location.

I can do this with groupby: df.groupby(['type', 'location'])[['avg_score']].mean().round(2)

Out: 

                avg_score
type location   
a   county a    8
    county b    6
b   county a    8
    county b    6
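The grouped result above has one row per (type, location) pair rather than one row per school, so it does not line up with the original rows. A minimal sketch showing that mismatch, rebuilding `df` from the question's data (`grouped` is an illustrative name, not from the question):

```python
import pandas as pd

data = {
    'name': ['school a', 'school b', 'school c', 'school d', 'school e', 'school f'],
    'type': ['a', 'a', 'b', 'b', 'a', 'b'],
    'location': ['county a', 'county a', 'county b', 'county b', 'county b', 'county a'],
    'avg_score': [9, 7, 5, 7, 6, 8],
}
df = pd.DataFrame(data)

# Aggregating collapses each (type, location) pair to a single value,
# indexed by a two-level MultiIndex.
grouped = df.groupby(['type', 'location'])['avg_score'].mean()

print(len(df))       # 6: one row per school
print(len(grouped))  # 4: one row per (type, location) pair

# A single group mean is looked up with a (type, location) tuple.
print(grouped.loc[('a', 'county a')])  # 8.0
```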

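However, instead of a grouped table, I want an additional column on the original rows that holds the average for that school type in that location.

How can I get a compare_score column like this: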

    name       type location avg_score compare_score
0   school a    a   county a    9       8
1   school b    a   county a    7       5
2   school c    b   county b    5       7
3   school d    b   county b    7       7
4   school e    a   county b    6       3
5   school f    b   county a    8       7

I found this question:

Python Pandas average based on condition into new column

and tried to apply some of its solutions to my problem:

for atype, alocation in df.groupby('type'):
    df.loc[df.type == type, 'compare'] = (df.where(df['type' == atype]).where(df['location' == alocation]).mean()).avg_score.round(2)

This raises the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/.local/share/virtualenvs/schule-jwiURUl3/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: False

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-27-4b3cf2b7aaf6> in <module>
      1 for atype, alocation in df.groupby('type'):
----> 2     df.loc[df.type == type, 'compare'] = (df.where(df['type' == atype]).where(df['location' == alocation]).mean()).avg_score.round(2)

~/.local/share/virtualenvs/schule-jwiURUl3/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2925             if self.columns.nlevels > 1:
   2926                 return self._getitem_multilevel(key)
-> 2927             indexer = self.columns.get_loc(key)
   2928             if is_integer(indexer):
   2929                 indexer = [indexer]

~/.local/share/virtualenvs/schule-jwiURUl3/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2657                 return self._engine.get_loc(key)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2661         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: False

Maybe this is not a good approach at all. Do you have any suggestions? Any hints are greatly appreciated.
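The `KeyError: False` comes from expressions such as `df['type' == atype]`: Python evaluates `'type' == atype` first, which is `False`, and `df[False]` then looks for a column literally named `False`. In addition, iterating over `df.groupby('type')` yields `(key, sub_frame)` pairs rather than `(type, location)` pairs, and `df.type == type` compares against Python's built-in `type`. A minimal sketch of the loop rewritten to group on both columns, assuming the question's data (the `compare` column name is taken from the attempt):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['school a', 'school b', 'school c', 'school d', 'school e', 'school f'],
    'type': ['a', 'a', 'b', 'b', 'a', 'b'],
    'location': ['county a', 'county a', 'county b', 'county b', 'county b', 'county a'],
    'avg_score': [9, 7, 5, 7, 6, 8],
})

# Grouping on both columns yields ((type, location), sub_frame) pairs.
for (atype, alocation), group in df.groupby(['type', 'location']):
    # group.index holds this group's row labels in the original df,
    # so the group mean is written back to exactly those rows.
    df.loc[group.index, 'compare'] = round(group['avg_score'].mean(), 2)

print(df)
```

This works, but it performs one assignment per group; a vectorized `transform` call avoids the explicit loop.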

【Discussion】:

  • How do you get the compare score?
  • Updated the last question to make my intent clear. Thanks @ansev

Tags: python-3.x pandas multiple-columns pandas-groupby mean


【Answer 1】:

I think you are looking for transform:

df['compare_score'] = df.groupby(['type', 'location'])['avg_score'].transform('mean').round(2)
print(df)

---------------

       name type  location  avg_score  compare_score
0  school a    a  county a          9              8
1  school b    a  county a          7              8
2  school c    b  county b          5              6
3  school d    b  county b          7              6
4  school e    a  county b          6              6
5  school f    b  county a          8              8
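An equivalent alternative, sketched here with the question's data, is to aggregate once with `groupby(...).mean()` and merge the per-group means back onto the rows on the two key columns. `group_means`, `merged`, `group_mean` and `diff` are illustrative names, not from the answer; the `diff` column is one possible way to do the comparison the question describes.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['school a', 'school b', 'school c', 'school d', 'school e', 'school f'],
    'type': ['a', 'a', 'b', 'b', 'a', 'b'],
    'location': ['county a', 'county a', 'county b', 'county b', 'county b', 'county a'],
    'avg_score': [9, 7, 5, 7, 6, 8],
})

# transform: one mean per row, aligned to df's original index.
df['compare_score'] = df.groupby(['type', 'location'])['avg_score'].transform('mean').round(2)

# Alternative: aggregate to one row per (type, location), then merge back.
group_means = (
    df.groupby(['type', 'location'], as_index=False)['avg_score']
    .mean()
    .rename(columns={'avg_score': 'group_mean'})
)
merged = df.merge(group_means, on=['type', 'location'], how='left')

# A left merge keeps the left frame's row order, so both columns line up.
print((merged['group_mean'] == df['compare_score']).all())  # True

# Each school's score relative to its group mean.
merged['diff'] = merged['avg_score'] - merged['group_mean']
print(merged[['name', 'avg_score', 'group_mean', 'diff']])
```

`transform` is usually the shorter option; the merge form can be handy when the per-group table is also needed on its own.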

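【Comments】:

  • Nice, I hadn't seen this before. A better docs link: pandas.pydata.org/pandas-docs/stable/reference/api/…
  • I'm not sure about the exact reference. If you're sure, I'll change it :)
  • Keep in mind that transform is not being applied to a DataFrame here
  • Fair point. That said, the docs you linked to are empty, so... it's unclear how they would help. I haven't used transform before, so I'm not sure how df.transform and df.groupby.transform differ.
  • Edited the URL in the post to point to the stable docs page for pandas.Series.transform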
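On the question of how `df.transform` and `df.groupby(...).transform` differ: `DataFrame.transform` applies a function to each whole column, while `GroupBy.transform` applies it within each group and broadcasts the result back to the original index. A minimal sketch contrasting the two; `center` is a hypothetical helper that subtracts the mean, chosen only because `transform` requires a same-length result (a plain `'mean'` would not pass `DataFrame.transform`'s length check).

```python
import pandas as pd

df = pd.DataFrame({
    'type': ['a', 'a', 'b', 'b', 'a', 'b'],
    'location': ['county a', 'county a', 'county b', 'county b', 'county b', 'county a'],
    'avg_score': [9, 7, 5, 7, 6, 8],
})

# Hypothetical helper: center values on their mean (same-length output).
def center(s):
    return s - s.mean()

# DataFrame.transform: the mean is taken over all six schools (7.0).
whole = df[['avg_score']].transform(center)

# GroupBy.transform: the mean is taken within each (type, location) group,
# and the result is aligned back to df's original row order.
per_group = df.groupby(['type', 'location'])['avg_score'].transform(center)

print(whole['avg_score'].tolist())  # [2.0, 0.0, -2.0, 0.0, -1.0, 1.0]
print(per_group.tolist())           # [1.0, -1.0, -1.0, 1.0, 0.0, 0.0]
```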