【Posted】:2017-11-09 15:22:05
【Question】:
I have a website traffic dataset covering about 2,000 websites over one month, tabulated by the device type the traffic came from:
In [12]: df.sample(10)
Out[12]:
date device nb_uniq_visitors site_id
11 2017-10-31 Tv 0.0 3331.0
6 2017-10-22 Car browser 0.0 503.0
7 2017-10-22 Camera 0.0 3259.0
7 2017-10-08 Car browser 0.0 630.0
3 2017-10-23 Camera 0.0 118.0
0 2017-10-12 Desktop 1.0 4769.0
11 2017-10-31 Tv 0.0 361.0
5 2017-10-12 Phablet 0.0 2999.0
9 2017-10-17 Portable media player 0.0 1725.0
0 2017-10-13 Desktop 2410.0 1004.0
4 2017-10-13 all 900.0 1271.0
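Note that the all category in the device column is the total across all devices, so it can serve as the denominator when calculating percentages.
I want to see the share of each device type for every website. I imagine the output looking something like this (I calculated the example below by hand):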
nb_uniq_visitors
site_id device
74.0 Camera 0.00
Car browser 0.00
Console 0.00
Desktop 0.56
Feature phone 0.00
Phablet 0.01
Portable media player 0.00
Smart display 0.00
Smartphone 0.37
Tablet 0.05
Tv 0.00
Unknown 0.00
all 1.00
96.0 Camera 0.00
Car browser 0.00
Console 0.00
Desktop 0.64
Feature phone 0.00
Phablet 0.01
Portable media player 0.00
Smart display 0.00
Smartphone 0.29
Tablet 0.06
Tv 0.00
Unknown 0.01
all 1.00
I used groupby to group by site_id and device:
In [23]: sl = df.groupby(['site_id', 'device']).sum()
In [24]: sl.head(25)
Out[24]:
nb_uniq_visitors
site_id device
74.0 Camera 0.0
Car browser 0.0
Console 1.0
Desktop 10534.0
Feature phone 0.0
Phablet 178.0
Portable media player 4.0
Smart display 0.0
Smartphone 6955.0
Tablet 1022.0
Tv 1.0
Unknown 62.0
all 18757.0
96.0 Camera 0.0
Car browser 2.0
Console 6.0
Desktop 118157.0
Feature phone 0.0
Phablet 1061.0
Portable media player 73.0
Smart display 0.0
Smartphone 53292.0
Tablet 11060.0
Tv 2.0
Unknown 1717.0
all 185370.0
How can I convert the above from summed values to percentages? Or is there a better approach altogether?
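One possible direction, offered as a minimal sketch rather than a definitive answer: since each site already has an all row, the summed values can be divided by that row, aligning on the site_id level of the MultiIndex. The frame below contains only a few of the summed values shown above (not the full table), and the names visitors, totals and pct are illustrative.

```python
import pandas as pd

# A small subset of the summed values shown above (not the full table).
df = pd.DataFrame({
    "site_id": [74.0, 74.0, 74.0, 74.0, 96.0, 96.0, 96.0, 96.0],
    "device": ["Desktop", "Smartphone", "Tablet", "all"] * 2,
    "nb_uniq_visitors": [10534.0, 6955.0, 1022.0, 18757.0,
                         118157.0, 53292.0, 11060.0, 185370.0],
})

sl = df.groupby(["site_id", "device"]).sum()
visitors = sl["nb_uniq_visitors"]

# Per-site denominators: the cross-section where the device level is "all".
totals = visitors.xs("all", level="device")

# Divide each row by its own site's total, matching on the site_id level.
pct = visitors.div(totals, level="site_id").round(2)

print(pct)
```

With these inputs the Desktop share for site 74.0 comes out at 0.56 and the all row at 1.00, matching the hand-calculated example. Calling pct.to_frame() would give back a one-column DataFrame if that shape is preferred.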
【Comments】:
- The logic is as follows: 1. Get the sum of nb_uniq_visitors. 2. On each row of the DataFrame's nb_uniq_visitors column, apply a lambda that divides by that total. For example: df['nb_uniq_visitors'] = df['nb_uniq_visitors'].apply(lambda row: row/sum(df['nb_uniq_visitors']))
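A per-site variant of this idea might look like the sketch below. It rests on two assumptions not stated in the comment: the denominator should be each site's own total rather than the sum of the whole column, and the precomputed all rows should be left out of that sum so they are not counted twice. The data is hypothetical toy data, and the names devices, site_total and share are illustrative.

```python
import pandas as pd

# Hypothetical toy data, not taken from the dataset in the question.
df = pd.DataFrame({
    "site_id": [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
    "device": ["Desktop", "Smartphone", "Tablet", "all",
               "Desktop", "Smartphone", "all"],
    "nb_uniq_visitors": [60.0, 30.0, 10.0, 100.0, 50.0, 50.0, 100.0],
})

# Drop the precomputed "all" rows so they are not included in the sum.
devices = df[df["device"] != "all"].copy()

# Per-site total, broadcast back onto every row belonging to that site.
site_total = devices.groupby("site_id")["nb_uniq_visitors"].transform("sum")

# A single vectorized division replaces the row-by-row lambda.
devices["share"] = devices["nb_uniq_visitors"] / site_total

print(devices)
```

Using transform("sum") computes each total once per site, whereas the lambda in the comment re-evaluates the full column sum for every row. A site whose total is zero would yield NaN shares here and may need separate handling.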
Tags: pandas multi-index pandas-groupby