You can use the to_frame() method:
In [10]: df.groupby('a').b.sum().to_frame('v').query('v > 3').query('v % 3 == 1')
Out[10]:
   v
a
1  7
If you need the result as a Series:
In [12]: df.groupby('a').b.sum().to_frame('v').query('v > 3').query('v % 3 == 1').v
Out[12]:
a
1    7
Name: v, dtype: int64
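The post does not show how df was built, so here is a minimal, self-contained sketch of the same chain. The data is hypothetical, chosen only so that group 1 sums to 7 as in the output above.

```python
import pandas as pd

# Hypothetical data (not from the original post); group 1 sums to 7.
df = pd.DataFrame({"a": [1, 1, 2, 2, 3], "b": [3, 4, 1, 1, 6]})

# Per-group sums as a Series indexed by 'a': 1 -> 7, 2 -> 2, 3 -> 6.
sums = df.groupby("a").b.sum()

# Series has no query() method, so wrap it in a one-column DataFrame named 'v'.
frame = sums.to_frame("v")

# Chain the filters, then select column 'v' to get a Series back.
result = frame.query("v > 3").query("v % 3 == 1").v
print(result)
```

The column name 'v' is arbitrary. It only has to match the name used inside the query() strings.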
Does to_frame() involve copying the Series?
It involves a call to the DataFrame constructor:
https://github.com/pydata/pandas/blob/master/pandas/core/series.py#L1140:
df = self._constructor_expanddim({name: self})
https://github.com/pydata/pandas/blob/master/pandas/core/series.py#L265:
def _constructor_expanddim(self):
from pandas.core.frame import DataFrame
return DataFrame
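One rough way to check this empirically is sketched below. The data is hypothetical, and whether the underlying buffer ends up shared may depend on the pandas version, so the sketch only reports that value and makes no claim about it.

```python
import numpy as np
import pandas as pd

# Hypothetical data (not from the original post).
df = pd.DataFrame({"a": [1, 1, 2, 2, 3], "b": [3, 4, 1, 1, 6]})
s = df.groupby("a").b.sum()

frame = s.to_frame("v")

# Going by the source lines quoted above, to_frame('v') should give the same
# result as calling the DataFrame constructor with {name: series}.
same_as_constructor = frame.equals(pd.DataFrame({"v": s}))

# Report whether the Series and the new column share memory.
# The outcome is version-dependent, so it is printed, not assumed.
shares = np.shares_memory(s.to_numpy(), frame["v"].to_numpy())
print("matches constructor:", same_as_constructor)
print("shares memory:", shares)
```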
Performance impact (tested on a 600K-row DataFrame):
In [66]: %timeit df.groupby('a').b.sum()
10 loops, best of 3: 46.2 ms per loop
In [67]: %timeit df.groupby('a').b.sum().to_frame('v')
10 loops, best of 3: 49.7 ms per loop
In [68]: 49.7 / 46.2
Out[68]: 1.0757575757575757
Performance impact (tested on a 6M-row DataFrame):
In [69]: df = pd.concat([df] * 10, ignore_index=True)
In [70]: df.shape
Out[70]: (6000000, 2)
In [71]: %timeit df.groupby('a').b.sum()
1 loop, best of 3: 474 ms per loop
In [72]: %timeit df.groupby('a').b.sum().to_frame('v')
1 loop, best of 3: 464 ms per loop
Performance impact (tested on a 60M-row DataFrame):
In [73]: df = pd.concat([df] * 10, ignore_index=True)
In [74]: df.shape
Out[74]: (60000000, 2)
In [75]: %timeit df.groupby('a').b.sum()
1 loop, best of 3: 4.28 s per loop
In [76]: %timeit df.groupby('a').b.sum().to_frame('v')
1 loop, best of 3: 4.3 s per loop
In [77]: 4.3 / 4.28
Out[77]: 1.0046728971962615
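Conclusion: the performance impact appears to be small. The overhead was about 7.6% at 600K rows and about 0.5% at 60M rows, and in the 6M-row run the to_frame() version was marginally faster, which suggests the difference there is within measurement noise.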
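To repeat the comparison outside IPython, something like the following sketch could be used. The synthetic data, the number of groups, and the helper names plain and with_frame are assumptions, since the post does not show how df was created. Timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data; the original df is not shown in the post.
rng = np.random.default_rng(0)
n_rows = 600_000
df = pd.DataFrame({
    "a": rng.integers(0, 1_000, size=n_rows),
    "b": rng.integers(0, 100, size=n_rows),
})

def plain():
    # Baseline: group sums only.
    return df.groupby("a").b.sum()

def with_frame():
    # Same computation plus the to_frame() conversion.
    return df.groupby("a").b.sum().to_frame("v")

# Best of 3 repeats of 5 calls each, similar in spirit to %timeit.
calls = 5
t_plain = min(timeit.repeat(plain, number=calls, repeat=3)) / calls
t_frame = min(timeit.repeat(with_frame, number=calls, repeat=3)) / calls

print(f"plain:         {t_plain * 1e3:.1f} ms per call")
print(f"with to_frame: {t_frame * 1e3:.1f} ms per call")
print(f"ratio:         {t_frame / t_plain:.3f}")
```

Taking the minimum over repeats reduces the influence of background noise, which matters here because the two timings are so close.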