【Question title】: Join unique values in a column based on intersection of other columns in pandas
【Posted】: 2023-01-03 14:21:47
【Question】:

Let's say I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large", "large"],
                   "D": [1, 2, 3, 4, 5, 6, 7, 8, 9, 99999]})

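I want to join (concatenate? or merge?) the values in column "D" wherever the values in "A", "B" and "C" intersect. By intersection, I mean rows that share the same combination of values in "A", "B" and "C", so I want this DataFrame: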

     A    B      C        D
0  foo  one  small        1
1  foo  one  large      2,3
2  foo  two  small      4,5
3  bar  one  large        6
4  bar  one  small        7
5  bar  two  small        8
6  bar  two  large  9,99999

There are aggregate functions such as min, max and sum, but I just can't come up with a solution for this.
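For context: `SeriesGroupBy.agg` is not limited to built-in reductions such as min, max and sum; it also accepts any callable that reduces a group to a single value. A minimal sketch on the question's DataFrame, using named aggregation to put a built-in reduction next to a custom one (the output column names `total` and `joined` are illustrative choices, not from the question):

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo"] * 5 + ["bar"] * 5,
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two", "two"],
                   "C": ["small", "large", "large", "small", "small",
                         "large", "small", "small", "large", "large"],
                   "D": [1, 2, 3, 4, 5, 6, 7, 8, 9, 99999]})

# Named aggregation: "total" uses the built-in "sum" reduction, while
# "joined" uses a custom callable that turns each group into one string.
# Both output names are hypothetical, chosen only for this illustration.
out = (df.groupby(["A", "B", "C"], sort=False)["D"]
         .agg(total="sum",
              joined=lambda s: ",".join(s.astype(str)))
         .reset_index())
print(out)
```

The custom callable is what makes the "join" possible: pandas simply calls it once per (A, B, C) group and uses whatever single value it returns.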

【Question comments】:

    Tags: python pandas dataframe


    【Answer 1】:

    Use:

    df1 = (df.assign(D = df.D.astype(str))
            .groupby(['A','B','C'], sort=False)['D']
            .agg(','.join)
            .reset_index())
    print (df1)
         A    B      C        D
    0  foo  one  small        1
    1  foo  one  large      2,3
    2  foo  two  small      4,5
    3  bar  one  large        6
    4  bar  one  small        7
    5  bar  two  small        8
    6  bar  two  large  9,99999
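Two details of the first variant are worth spelling out. `str.join` accepts only strings, so the int64 column `D` has to be converted with `astype(str)` before joining. And `sort=False` keeps the groups in order of first appearance; with the default `sort=True` the groups come back sorted by key. A short sketch showing both (the name `sorted_out` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo"] * 5 + ["bar"] * 5,
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two", "two"],
                   "C": ["small", "large", "large", "small", "small",
                         "large", "small", "small", "large", "large"],
                   "D": [1, 2, 3, 4, 5, 6, 7, 8, 9, 99999]})

# str.join only accepts strings; joining ints raises TypeError,
# which is why D is converted with astype(str) first.
try:
    ",".join([2, 3])
except TypeError as exc:
    print("TypeError:", exc)

# Same pipeline, but with the default sort=True: groups are sorted by
# key ("bar" before "foo") instead of kept in first-appearance order.
sorted_out = (df.assign(D=df.D.astype(str))
                .groupby(["A", "B", "C"])["D"]
                .agg(",".join)
                .reset_index())
print(sorted_out)
```

So `sort=False` is what makes the result match the row order in the desired output.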
    

    Or:

    df1 = (df.groupby(['A','B','C'], sort=False)['D']
            .agg(lambda x: ','.join(x.astype(str)))
            .reset_index())
    print (df1)
         A    B      C        D
    0  foo  one  small        1
    1  foo  one  large      2,3
    2  foo  two  small      4,5
    3  bar  one  large        6
    4  bar  one  small        7
    5  bar  two  small        8
    6  bar  two  large  9,99999
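The two variants should give the same values. The first converts the whole `D` column to strings once before grouping; the second keeps `D` as int64 and converts inside the lambda, group by group. A quick check, comparing plain values rather than dtypes (assuming the exact result dtype may differ between pandas versions):

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo"] * 5 + ["bar"] * 5,
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two", "two"],
                   "C": ["small", "large", "large", "small", "small",
                         "large", "small", "small", "large", "large"],
                   "D": [1, 2, 3, 4, 5, 6, 7, 8, 9, 99999]})
keys = ["A", "B", "C"]

# Variant 1: convert the whole column once, then join per group.
df1 = (df.assign(D=df.D.astype(str))
         .groupby(keys, sort=False)["D"]
         .agg(",".join)
         .reset_index())

# Variant 2: keep D numeric and convert inside the lambda for each group.
df2 = (df.groupby(keys, sort=False)["D"]
         .agg(lambda x: ",".join(x.astype(str)))
         .reset_index())

# Compare row values as plain Python lists.
print(df1.values.tolist() == df2.values.tolist())
```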
    

    If D can contain duplicate values within a group, join only the unique ones:

    df2 = (df.groupby(['A','B','C'], sort=False)['D']
            .agg(lambda x: ','.join(x.astype(str).unique()))
            .reset_index())
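The question's sample data has no repeated values inside a group, so the `unique()` variant looks identical there. A sketch with hypothetical data (not from the question) in which the group ("foo", "one", "large") contains the value 2 twice; `Series.unique()` drops the repeat and keeps values in order of first appearance:

```python
import pandas as pd

# Hypothetical data: the first group repeats the value 2.
dup = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"],
                    "B": ["one", "one", "one", "two"],
                    "C": ["large", "large", "large", "small"],
                    "D": [2, 3, 2, 8]})
keys = ["A", "B", "C"]

# Joining every value keeps the duplicate.
all_vals = (dup.groupby(keys, sort=False)["D"]
              .agg(lambda x: ",".join(x.astype(str)))
              .reset_index())

# unique() removes repeats, preserving first-appearance order.
uniq_vals = (dup.groupby(keys, sort=False)["D"]
               .agg(lambda x: ",".join(x.astype(str).unique()))
               .reset_index())

print(all_vals["D"].tolist())
print(uniq_vals["D"].tolist())
```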
    
