【Question title】: Applying function to column in grouped pandas dataframe and returning output as a new column
【Posted】: 2017-07-29 00:30:11
【Question】:

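I have a weather dataset made up of several columns:

StationID, altitude, datetime, longitude, latitude, rainfall

I have multiple stations, each identified by its own ID. The rainfall column holds accumulated rainfall. For example, for station X over 10 days I could have (in mm/day):

Station X, 0 0 0 1 5 6 6 8 8 15

and for station Y I could have

Station Y, 0 1 14 14 14 15 18 18 18 20

But what I need is the intensity values, i.e. each day's amount minus the previous day's. That would give me the following values for stations X and Y (the first value starts at 0):

Station X, 0 0 0 1 4 1 0 2 0 7

Station Y, 0 1 13 0 0 1 3 0 0 2

I created a function that takes a time series and computes this difference: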

def intensity(ts):
    ts2 = [0]
    for i in range(0,len(ts[:-1])):
        ts2.append((ts[i+1]-ts[i]))
    return ts2

test = [1,2,3,4,5,10,10,10,20,25]
intensity(test)

Now, my question is: how do I apply this function to the "rainfall" column of the dataframe for each station group, i.e.:

dfg = df.groupby('station')

and then assign the output to a new column in the dataframe (e.g. a "rain_intensity" column)?

【Question comments】:

    Tags: python pandas


    【Solution 1】:

    I think you need:

    print (df.groupby('station')['rainfall'].apply(intensity))
    

    But it is better to use diff(), replace the resulting NaN values with 0 using fillna(), and then convert to int if necessary:

    print (df.groupby('StationID')['rainfall'].diff().fillna(0))
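
The reason fillna(0) and the int conversion come up can be seen in a minimal sketch. The small two-station frame below is a hypothetical cut of the question's numbers, not the asker's real data: the first row of every group has no previous value, so diff() leaves NaN there and the column becomes float.

```python
import pandas as pd

# Hypothetical two-station frame built from the first accumulated values in the question.
df = pd.DataFrame({
    'StationID': ['X'] * 4 + ['Y'] * 4,
    'rainfall': [0, 1, 5, 6, 0, 1, 14, 14],
})

# diff() restarts at every group, so the first row of each station is NaN
# and the result has a float dtype.
raw = df.groupby('StationID')['rainfall'].diff()
print(raw)

# Replace the per-group NaN with 0, then cast back to integers.
df['rain_intensity'] = raw.fillna(0).astype(int)
print(df)
```

The cast to int is only safe after fillna(0), because NaN cannot be stored in a plain integer column.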
    

    Example:

    import pandas as pd

    df = pd.DataFrame({'rainfall': [0, 0, 0, 1, 5, 6, 6, 8, 8, 15, 0, 1, 14, 14, 14, 15, 18, 18, 18, 20],
    'StationID': ['station X'] * 10 + ['station Y'] * 10})
    
    print (df)
        StationID  rainfall
    0   station X         0
    1   station X         0
    2   station X         0
    3   station X         1
    4   station X         5
    5   station X         6
    6   station X         6
    7   station X         8
    8   station X         8
    9   station X        15
    10  station Y         0
    11  station Y         1
    12  station Y        14
    13  station Y        14
    14  station Y        14
    15  station Y        15
    16  station Y        18
    17  station Y        18
    18  station Y        18
    19  station Y        20
    
    def intensity(ts):
        ts = ts.tolist()
        ts2 = [0]
        for i in range(0,len(ts[:-1])):
            ts2.append((ts[i+1]-ts[i]))
        return pd.Series(ts2)
    
    df['diff1'] = df.groupby('StationID')['rainfall'].apply(intensity).reset_index(drop=True)
    df['diff2'] = df.groupby('StationID')['rainfall'].diff().fillna(0).astype(int)
    
    print (df)
        StationID  rainfall  diff1  diff2
    0   station X         0      0      0
    1   station X         0      0      0
    2   station X         0      0      0
    3   station X         1      1      1
    4   station X         5      4      4
    5   station X         6      1      1
    6   station X         6      0      0
    7   station X         8      2      2
    8   station X         8      0      0
    9   station X        15      7      7
    10  station Y         0      0      0
    11  station Y         1      1      1
    12  station Y        14     13     13
    13  station Y        14      0      0
    14  station Y        14      0      0
    15  station Y        15      1      1
    16  station Y        18      3      3
    17  station Y        18      0      0
    18  station Y        18      0      0
    19  station Y        20      2      2
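
The diff1 assignment above relies on reset_index(drop=True), which lines up only when the rows are already sorted by station and the frame has a default 0..n-1 index. As a hedged alternative, here is a sketch that uses groupby(...).transform with a variant of intensity that returns a Series carrying the group's own index, so the result should align even when stations are interleaved. The interleaved frame is a made-up illustration, and this variant of intensity is an assumption of this sketch, not the code from the answer.

```python
import pandas as pd

def intensity(ts):
    # Same differencing logic as the loop version, but the returned Series
    # keeps the group's original index so pandas can align it with the frame.
    values = ts.tolist()
    out = [0] + [values[i + 1] - values[i] for i in range(len(values) - 1)]
    return pd.Series(out, index=ts.index)

# Hypothetical frame whose stations are interleaved rather than sorted.
df = pd.DataFrame({
    'StationID': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
    'rainfall': [0, 0, 1, 1, 5, 14],
})

# transform returns a result with the same index as df, so it can be
# assigned directly as a new column.
df['rain_intensity'] = df.groupby('StationID')['rainfall'].transform(intensity)
print(df)
```

For this particular calculation diff().fillna(0) remains the simpler choice; transform is mainly useful when the per-group function cannot be expressed with a built-in method.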
    

    【Comments】:

    • This is great! Thank you so much! I used the diff() option.