【Posted】: 2021-04-28 10:24:57
【Question】:
Dataset (MWE)
location date people_vaccinated people_fully_vaccinated people_vaccinated_per_hundred
AL 12-01-2021 70861 7270 1.45
AL 13-01-2021 74792 9245 1.53
AL 14-01-2021 80480 11366 1.64
AL 15-01-2021 86956 13488 1.77
AL 16-01-2021 93797 14202 1.91
AL 17-01-2021 100638 14917 2.05
AS 22-01-2021 5627 940 10.1
AS 23-01-2021 5881 948 10.56
AS 24-01-2021 7096 948 12.74
AS 25-01-2021 7096 949 12.98
AS 26-01-2021 7230 950 13.23
AS 27-01-2021 8133 950 14.6
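I am trying to replace duplicates in the columns {people_vaccinated, people_fully_vaccinated, people_vaccinated_per_hundred} with NaN, while using groupby() on location. I tried a few solutions I found online but could not get them to work for me, so I switched to the following logic instead: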
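import numpy as np

def remove(df, a):
    # Compare each value with the one in the previous row (location is ignored here).
    df['duplicate'] = df[a].shift(1)
    # Replace a value with NaN when it equals the previous row's value.
    df[a] = df.apply(lambda x: np.nan if x[a] == x['duplicate']
                     else x[a], axis=1)
    df = df.drop('duplicate', axis=1)
    return df

dfn = remove(dfn, 'people_vaccinated')
dfn = remove(dfn, 'people_fully_vaccinated')
dfn = remove(dfn, 'people_vaccinated_per_hundred')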
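The logic above fails when there are consecutive null values (more than 2). I need to replace duplicates with NaN while keeping the first instance. What is the best way to do this? As the snippet above shows, the people_fully_vaccinated column has duplicate values.
Sample output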
location date people_vaccinated people_fully_vaccinated people_vaccinated_per_hundred
AL 12-01-2021 70861 7270 1.45
AL 13-01-2021 74792 9245 1.53
AL 14-01-2021 80480 11366 1.64
AL 15-01-2021 86956 13488 1.77
AL 16-01-2021 93797 14202 1.91
AL 17-01-2021 100638 14917 2.05
AS 22-01-2021 5627 940 10.1
AS 23-01-2021 5881 948 10.56
AS 24-01-2021 7096 NaN 12.74
AS 25-01-2021 NaN 949 12.98
AS 26-01-2021 7230 950 13.23
AS 27-01-2021 8133 NaN 14.6
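For reference, here is a minimal sketch of one way the sample output above could be produced. It is not taken from the original post: the helper name mask_repeats and its group_col parameter are hypothetical, and it assumes "duplicate" means any repeat of a value within the same location, with the first occurrence kept. It relies on DataFrame.duplicated and Series.mask from pandas.

```python
import pandas as pd

# A few rows copied from the dataset above (two AL rows, all AS rows).
dfn = pd.DataFrame({
    "location": ["AL", "AL", "AS", "AS", "AS", "AS", "AS", "AS"],
    "date": ["16-01-2021", "17-01-2021", "22-01-2021", "23-01-2021",
             "24-01-2021", "25-01-2021", "26-01-2021", "27-01-2021"],
    "people_vaccinated": [93797, 100638, 5627, 5881, 7096, 7096, 7230, 8133],
    "people_fully_vaccinated": [14202, 14917, 940, 948, 948, 949, 950, 950],
    "people_vaccinated_per_hundred": [1.91, 2.05, 10.1, 10.56,
                                      12.74, 12.98, 13.23, 14.6],
})

cols = ["people_vaccinated", "people_fully_vaccinated",
        "people_vaccinated_per_hundred"]


def mask_repeats(df, columns, group_col="location"):
    """Hypothetical helper: NaN out repeated values per group, keeping the first."""
    out = df.copy()
    for col in columns:
        # True for every (group, value) pair that has already appeared earlier.
        is_repeat = out.duplicated(subset=[group_col, col], keep="first")
        # mask() replaces flagged entries with NaN (integer columns become float).
        out[col] = out[col].mask(is_repeat)
    return out


result = mask_repeats(dfn, cols)
print(result)
```

If only consecutive repeats should be blanked, the flag could instead be built by comparing each column with its per-group shift, for example out[col].eq(out.groupby(group_col)[col].shift()). For cumulative counts like these, the two approaches should agree.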
【Discussion】:
Tags: python pandas numpy pandas-groupby