【Question title】: Equivalent of ave in pandas
【Posted】: 2015-03-05 22:08:17
【Question】:

My question is similar to another SO post, equivalent-of-r-function-ave-in-python-pandas, but I am getting an error message.

Suppose:

I have a DataFrame df

     A      B  C    D
0  foo    one -2.0  0.5
1  bar    one -1.5 -1.5
2  foo    two -0.5 -0.8
3  bar  three -0.0  0.7
4  foo    two -1.5  0.9
5  bar    two  1.5  0.6
6  foo    one -0.0 -0.4
7  foo  three  0.5  1.8

I want to create another column E that contains, for each row, the mean of the values in column C within that row's group, when grouped by, say, column A:

     A      B  C    D    E
0  foo    one -2.0  0.5  -0.7
1  bar    one -1.5 -1.5   0.0
2  foo    two -0.5 -0.8  -0.7
3  bar  three -0.0  0.7   0.0
4  foo    two -1.5  0.9  -0.7
5  bar    two  1.5  0.6   0.0
6  foo    one -0.0 -0.4  -0.7
7  foo  three  0.5  1.8  -0.7

I tried the approach from that post, for example:

df['E'] = df.groupby('A').transform(lambda x: pandas.Series(x.C.mean()))

df['E'] = df.groupby('A').transform(lambda x: pandas.Series(x['C'].mean()))

But I got ValueError: Wrong number of items passed 3, placement implies 1

Here is the full traceback:

Traceback (most recent call last):
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 2978, in set
    loc = self.items.get_loc(item)
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\index.py", line 1402, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas\index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas\index.c:3807)
  File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3687)
  File "pandas\hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12310)
  File "pandas\hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12261)
KeyError: 'E'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\IPython\core\interactiveshell.py", line 2883, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-85-36e1c884837f>", line 1, in <module>
    df['E']=df.groupby('A').transform(lambda x: pandas.Series(x.C.max()))
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\frame.py", line 2110, in __setitem__
    self._set_item(key, value)
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\frame.py", line 2188, in _set_item
    NDFrame._set_item(self, key, value)
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\generic.py", line 1179, in _set_item
    self._data.set(key, value)
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 2981, in set
    self.insert(len(self.items), item, value)
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 3080, in insert
    placement=slice(loc, loc+1))
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 2099, in make_block
    placement=placement)
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 1427, in __init__
    placement=placement)
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 76, in __init__
    len(self.values), len(self.mgr_locs)))
ValueError: Wrong number of items passed 3, placement implies 1

What am I doing wrong?

I am using Python 3.4.2 (WinPython 3.4.2.4) and pandas version 0.15.2

【Comments】:

    Tags: python python-3.x pandas


    【Solution 1】:

    I think transform is the right approach, but you need to select the column directly:

    >>> df["E"] = df.groupby("A")["C"].transform("mean")
    >>> df
         A      B    C    D    E
    0  foo    one -2.0  0.5 -0.7
    1  bar    one -1.5 -1.5  0.0
    2  foo    two -0.5 -0.8 -0.7
    3  bar  three -0.0  0.7  0.0
    4  foo    two -1.5  0.9 -0.7
    5  bar    two  1.5  0.6  0.0
    6  foo    one -0.0 -0.4 -0.7
    7  foo  three  0.5  1.8 -0.7
    

    This selects the column the same way you normally would when computing a grouped mean:

    >>> df.groupby("A")["C"].mean()
    A
    bar    0.0
    foo   -0.7
    Name: C, dtype: float64
    

    transform then broadcasts each group's result back to every row of that group, so the output has the same index as df.
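The difference between aggregating and transforming can be sketched as follows. This is a minimal illustration that rebuilds the question's df by hand; the variable names group_means, row_means and multi are made up for the example. The last step is only a plausible reading of the original error: a transform over several columns yields a multi-column DataFrame, which cannot be stored in the single column E.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": [-2.0, -1.5, -0.5, -0.0, -1.5, 1.5, -0.0, 0.5],
    "D": [0.5, -1.5, -0.8, 0.7, 0.9, 0.6, -0.4, 1.8],
})

# Aggregation: one value per group, indexed by the group key.
group_means = df.groupby("A")["C"].mean()

# Transform: one value per original row, aligned with df.index.
row_means = df.groupby("A")["C"].transform("mean")

# Transforming more than one column returns a DataFrame, not a Series,
# so it does not fit into a single new column.
multi = df.groupby("A")[["C", "D"]].transform("mean")

# The single-column result can be assigned directly.
df["E"] = row_means
```

Because row_means shares its index with df, the assignment needs no merge or join.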

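    【Discussion】:

    • One more question: what if the function used in transform is not a standard function like mean but a custom function myfun? Do I need to put myfun in quotes, or use a lambda function?
    • Also, when transform refers to mean, does it use pandas' mean function?
    • @uday: Yes, if you pass a string like "mean", it refers to a built-in pandas function. If you pass your own, you should pass the function itself, not its name.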
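Passing a custom function might look like the sketch below. The body of myfun (the range of each group) and the column names R and R2 are assumptions made for illustration, since the thread never shows what myfun does. The function object is passed unquoted, and a lambda works the same way.

```python
import pandas as pd

# Same data as in the question, limited to the columns used here.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "C": [-2.0, -1.5, -0.5, -0.0, -1.5, 1.5, -0.0, 0.5],
})

def myfun(values):
    # Hypothetical custom function: the range (max - min) of one group.
    return values.max() - values.min()

# Pass the function object itself, without quotes.
df["R"] = df.groupby("A")["C"].transform(myfun)

# An equivalent lambda, also passed as a callable.
df["R2"] = df.groupby("A")["C"].transform(lambda s: s.max() - s.min())
```

When the callable returns a scalar for a group, transform repeats that value for each row of the group, just as it does for "mean".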