【Posted】: 2020-09-05 10:39:27
【Question】:
How do I normalize a pandas crosstab that has MultiIndex columns?
Suppose you have a df like this:
import random

import numpy as np
import pandas as pd

# RANDOM DATA
np.random.seed(2)
Year = [2020, 2019, 2018, 2017] * 1000
col1 = ['A', 'B', 'C', 'D'] * 1000
col2 = np.random.randint(0, 2, 4000)
weight = np.random.randint(1, 3, 4000)
# Shuffle the lists with the stdlib and the NumPy arrays with NumPy
random.shuffle(Year)
random.shuffle(col1)
np.random.shuffle(col2)
np.random.shuffle(weight)
column_names = ['Year', 'weight', 'col1', 'col2']
df = pd.DataFrame(columns=column_names)
df['Year'] = Year
df['col1'] = col1
df['col2'] = col2
df['weight'] = weight
Now you build a crosstab, normalized over each row:
pd.crosstab(index=[df['Year']],
            columns=[df['col1'], df['col2']],
            values=df['weight'],
            aggfunc='sum',
            normalize='index')  # each row sums to 1, as in the output below
col1         A                   B                   C                   D
col2         0         1         0         1         0         1         0         1
Year
2017  0.117962  0.128686  0.128016  0.130697  0.137399  0.122654  0.115282  0.119303
2018  0.116832  0.111551  0.120132  0.118152  0.138614  0.125413  0.131353  0.137954
2019  0.137584  0.126846  0.127517  0.108725  0.114765  0.138255  0.114765  0.131544
2020  0.116356  0.134309  0.113032  0.143617  0.121676  0.118351  0.121676  0.130984
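How can I normalize within each level of the MultiIndex, so that the col2 values under each col1 sum to 1 per year?
My expected output is: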
col1      A      A      B      B      C      C      D      D
col2      0      1      0      1      0      1      0      1
Year
2017  0.478  0.522  0.495  0.505  0.528  0.472  0.491  0.509
2018  0.512  0.488  0.504  0.496  0.525  0.475  0.488  0.512
2019  0.520  0.480  0.540  0.460  0.454  0.546  0.466  0.534
2020  0.464  0.536  0.440  0.560  0.507  0.493  0.482  0.518
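The expected output corresponds to dividing each cell by the total of its col1 group within the same year. The following is only a minimal sketch of that idea, not a definitive answer. It regenerates similar random data with `np.random.default_rng` (an assumption, so the numbers will differ from the tables in the question), and the names `ct`, `group_totals`, and `normalized` are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative data with the same columns as the question's df
# (assumption: generated differently, so values will not match the post).
rng = np.random.default_rng(2)
n = 4000
df = pd.DataFrame({
    "Year": rng.choice([2017, 2018, 2019, 2020], size=n),
    "col1": rng.choice(list("ABCD"), size=n),
    "col2": rng.integers(0, 2, size=n),
    "weight": rng.integers(1, 3, size=n),
})

# Weighted sums per (Year, col1, col2), not normalized yet.
ct = pd.crosstab(index=df["Year"],
                 columns=[df["col1"], df["col2"]],
                 values=df["weight"],
                 aggfunc="sum")

# For every year, total the col2 cells inside each col1 group and
# broadcast that total back to the shape of ct.
group_totals = ct.T.groupby(level="col1").transform("sum").T

# Each (col1, col2) cell becomes its share of the col1 group total.
normalized = ct / group_totals
print(normalized.round(3))
```

With this approach the two col2 cells under each col1 sum to 1 in every row, which is the property the expected output has. Transposing before `groupby` is just one way to group over the column level; other formulations are possible.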
【Comments】:
Tags: python-3.x pandas crosstab