Posted: 2021-03-03 18:58:16
Question:
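I want to aggregate a column with a pandas pivot table, but the custom aggregation should be conditional on a different column in the DataFrame.
See the example below. Suppose that, for each value in the "Newspaper" column, I want to count how many "Number_mentions" values are above a threshold. That is easy with a custom aggfunc. But what if I only want to include the "Number_mentions" values whose row does not have "RU" in the "Country" column? The aggfunc seems to receive only one column, isolated from the others, and I don't see how to pass the whole DataFrame into the aggfunc so I can do the conditional subsetting there.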
import pandas as pd

df = pd.DataFrame({"Number_mentions": [1, 5, 2, 3, 6, 5],
                   "Newspaper": ["Newspaper1", "Newspaper1", "Newspaper2", "Newspaper3", "Newspaper4", "Newspaper5"],
                   "Country": ["US", "US", "CN", "CN", "RU", "RU"]})

def articles_above_thresh_with_condition(input_series, thresh=2):
    # input_series holds only the "Number_mentions" values of one newspaper
    series_bool = input_series > thresh
    # ! add some if condition based on additional column in df:
    # ! only aggregate those values where column "Country" is not "RU".
    # ? code ?
    n_articles_above_thresh = int(series_bool.sum())
    return n_articles_above_thresh

df_piv = pd.pivot_table(df, values=["Number_mentions"],
                        index="Newspaper", columns=None, margins=False,
                        aggfunc=articles_above_thresh_with_condition)
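One possible way around the one-column limitation is sketched below. It assumes a threshold of 2 (the default in the question) and that newspapers whose rows all fail the condition should still show up with a count of 0. The idea is to evaluate the row-wise condition *before* pivoting, while every column is still visible, so the pivot itself only needs a plain `"sum"`. The helper column name `counts_toward_total` and the function `count_above_thresh_excluding_ru` are hypothetical names chosen for illustration. A second variant uses `groupby(...).apply`, which hands the function the whole sub-DataFrame for each group instead of a single column.

```python
import pandas as pd

df = pd.DataFrame({"Number_mentions": [1, 5, 2, 3, 6, 5],
                   "Newspaper": ["Newspaper1", "Newspaper1", "Newspaper2",
                                 "Newspaper3", "Newspaper4", "Newspaper5"],
                   "Country": ["US", "US", "CN", "CN", "RU", "RU"]})

THRESH = 2  # assumed threshold, same default as in the question

# Variant 1: precompute the condition row by row, then pivot with a plain sum.
# 1 where the mention count exceeds the threshold AND the country is not "RU".
df["counts_toward_total"] = (
    (df["Number_mentions"] > THRESH) & (df["Country"] != "RU")
).astype(int)

df_piv = pd.pivot_table(df, values="counts_toward_total",
                        index="Newspaper", aggfunc="sum")

# Variant 2: groupby().apply passes each group's sub-DataFrame, so the
# function can look at several columns at once.
def count_above_thresh_excluding_ru(group, thresh=THRESH):
    mask = (group["Number_mentions"] > thresh) & (group["Country"] != "RU")
    return int(mask.sum())

per_paper = (df.groupby("Newspaper")[["Number_mentions", "Country"]]
               .apply(count_above_thresh_excluding_ru))

print(df_piv)
print(per_paper)
```

Variant 1 keeps the `pivot_table` call, which is convenient if you later add `columns=` or `margins=True`. Variant 2 is closer to "passing the whole DataFrame into the aggfunc", but it returns a Series rather than a pivot table. If rows that fail the condition should disappear entirely instead of counting as 0, filtering `df` first (e.g. `df[df["Country"] != "RU"]`) before pivoting is another option.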
Comments:
Tags: python python-3.x pandas dataframe pivot-table