[Question title]: Pandas pivot_table() aggfunc aggregation conditional on multiple columns?
[Posted]: 2021-03-03 18:58:16
[Question]:

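I want to aggregate a column with a pandas pivot table, but the custom aggregation should be conditional on a different column in the DataFrame.

Consider the example below. Suppose that, for each value in the "Newspaper" column, I want to count the "Number_mentions" values that are above a threshold (by summing a boolean mask). That is easy to do with a custom aggfunc. But what if I only want to count those "Number_mentions" values whose row does not have the value "RU" in the "Country" column? It seems the aggfunc only ever sees one column, isolated from the others, and I don't know how to pass the whole DataFrame into the aggfunc so that I can subset conditionally inside it.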

import pandas as pd

df = pd.DataFrame({"Number_mentions": [1,5,2,3,6,5], 
                   "Newspaper": ["Newspaper1", "Newspaper1", "Newspaper2", "Newspaper3", "Newspaper4", "Newspaper5"], 
                   "Country": ["US", "US", "CN", "CN", "RU", "RU"]})

def articles_above_thresh_with_condition(input_series, thresh=2):
    series_bool = input_series > thresh
    # ! add some if condition based on additional column in df: 
    # ! only aggregate those values where column "Country" is not "RU". 
    # ? code ? 
    n_articles_above_thresh = sum(series_bool)
    return n_articles_above_thresh

df_piv = pd.pivot_table(df, values=["Number_mentions"],
                        index="Newspaper", columns=None, margins=False,
                        aggfunc=articles_above_thresh_with_condition)
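To see why the function above cannot reach "Country", here is a minimal sketch that records what pivot_table actually passes to the aggfunc. The helper name `record_input` is hypothetical. In the pandas versions I'm aware of, every call receives a Series holding one group of a single value column, never the whole DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"Number_mentions": [1, 5, 2, 3, 6, 5],
                   "Newspaper": ["Newspaper1", "Newspaper1", "Newspaper2",
                                 "Newspaper3", "Newspaper4", "Newspaper5"],
                   "Country": ["US", "US", "CN", "CN", "RU", "RU"]})

seen_types = []  # type of every argument pandas hands to the aggfunc


def record_input(values):
    # hypothetical helper: remember the argument's type, then count the rows
    seen_types.append(type(values))
    return len(values)


pd.pivot_table(df, values=["Number_mentions"], index="Newspaper",
               aggfunc=record_input)
print(set(seen_types))
```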

[Discussion]:

    Tags: python python-3.x pandas dataframe pivot-table


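    [Answer 1]:

    You need a different approach, because the aggfunc in pivot_table only ever receives one column at a time (one Series per group), so it cannot combine two columns.

    So first use Series.where to replace the values that don't meet the condition with missing values (NaN), then aggregate this new column: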

    df["Number_mentions1"] = df["Number_mentions"].where(df["Country"].ne('RU'))
    print (df)
       Number_mentions   Newspaper Country  Number_mentions1
    0                1  Newspaper1      US               1.0
    1                5  Newspaper1      US               5.0
    2                2  Newspaper2      CN               2.0
    3                3  Newspaper3      CN               3.0
    4                6  Newspaper4      RU               NaN
    5                5  Newspaper5      RU               NaN
    
    df_piv = pd.pivot_table(df, values=["Number_mentions1"],
                            index="Newspaper", columns=None, margins=False,
                            aggfunc=articles_above_thresh_with_condition)
    print (df_piv)
                Number_mentions1
    Newspaper                   
    Newspaper1               1.0
    Newspaper2               0.0
    Newspaper3               1.0
    Newspaper4               0.0
    Newspaper5               0.0
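A possible workaround that stays inside pivot_table (a sketch, not part of the original answer): the Series passed to the aggfunc keeps the original row labels of `df`, so a function that closes over `df` can look up the matching "Country" values with `df.loc[group.index, "Country"]`. This assumes `df` has a unique index and is not modified in between; the function name `above_thresh_not_ru` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Number_mentions": [1, 5, 2, 3, 6, 5],
                   "Newspaper": ["Newspaper1", "Newspaper1", "Newspaper2",
                                 "Newspaper3", "Newspaper4", "Newspaper5"],
                   "Country": ["US", "US", "CN", "CN", "RU", "RU"]})


def above_thresh_not_ru(group, thresh=2):
    # group is one newspaper's slice of "Number_mentions"; its index labels
    # are df's original row labels, so "Country" can be fetched by label
    country = df.loc[group.index, "Country"]
    mask = (group > thresh) & (country != "RU")
    return int(mask.sum())


df_piv = pd.pivot_table(df, values=["Number_mentions"], index="Newspaper",
                        aggfunc=above_thresh_not_ru)
print(df_piv)
```

The closure couples the aggfunc to one specific DataFrame, which is the trade-off for keeping everything in a single pivot_table call.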
    

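    [Comments]:

    • Thanks, I'm currently also doing the subsetting outside the pivot table, and it works well (I use .groupby() for my use case). I just hoped the subsetting could be done inside the pivot table. If the function can only work with a single column instead of the whole table, I feel that reduces the usefulness of pivot_table() :/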
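For reference, the groupby route mentioned in the comment can hand each group's whole sub-DataFrame to one function, so both columns are visible at once. A minimal sketch; the function name `count_above_thresh` is hypothetical and the threshold of 2 is carried over from the question.

```python
import pandas as pd

df = pd.DataFrame({"Number_mentions": [1, 5, 2, 3, 6, 5],
                   "Newspaper": ["Newspaper1", "Newspaper1", "Newspaper2",
                                 "Newspaper3", "Newspaper4", "Newspaper5"],
                   "Country": ["US", "US", "CN", "CN", "RU", "RU"]})


def count_above_thresh(group, thresh=2):
    # group is the sub-DataFrame for one newspaper, so every selected column is available
    mask = (group["Number_mentions"] > thresh) & (group["Country"] != "RU")
    return int(mask.sum())


# selecting the columns explicitly keeps the grouping key out of each group
result = df.groupby("Newspaper")[["Number_mentions", "Country"]].apply(count_above_thresh)
print(result)
```

When the condition is purely row-wise, precomputing a boolean column and aggregating it with aggfunc="sum" avoids calling a Python function per group.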