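Edit: changed so that the column names are accessible in the output.
(Note the addition of as_index=False and .reset_index() in [1]; see sources 5 and 6.)
[1] First, groupby on the CVE_ID, VulnName and ServerOwner columns and use size: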
counts = myframe.groupby(['CVE_ID','VulnName','ServerOwner'], as_index=False).size().unstack(fill_value=0).reset_index()
ServerOwner CVE_ID VulnName Alice Bob Carol
0 CVE-2017-1111 Java Update 1 1 1 0
1 CVE-2017-1112 Java Update 2 1 1 0
2 CVE-2017-1113 Java Update 3 1 1 1
3 CVE-2017-1114 Adobe 1 1 1 1
4 CVE-2017-1115 Chrome 1 1 0 1
5 CVE-2017-1116 Chrome 2 0 0 1
6 CVE-2017-1117 Chrome 3 0 0 1
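To try step [1] yourself, here is a minimal sketch. The sample myframe is an assumption, reconstructed from the output above (the original frame is not shown in this answer). It also leaves out as_index=False: as far as I can tell, newer pandas versions return a DataFrame from size() when that flag is set, so the unstack step would no longer behave as shown.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the output table above.
rows = [
    ("CVE-2017-1111", "Java Update 1", "Alice"),
    ("CVE-2017-1111", "Java Update 1", "Bob"),
    ("CVE-2017-1112", "Java Update 2", "Alice"),
    ("CVE-2017-1112", "Java Update 2", "Bob"),
    ("CVE-2017-1113", "Java Update 3", "Alice"),
    ("CVE-2017-1113", "Java Update 3", "Bob"),
    ("CVE-2017-1113", "Java Update 3", "Carol"),
    ("CVE-2017-1114", "Adobe 1", "Alice"),
    ("CVE-2017-1114", "Adobe 1", "Bob"),
    ("CVE-2017-1114", "Adobe 1", "Carol"),
    ("CVE-2017-1115", "Chrome 1", "Alice"),
    ("CVE-2017-1115", "Chrome 1", "Carol"),
    ("CVE-2017-1116", "Chrome 2", "Carol"),
    ("CVE-2017-1117", "Chrome 3", "Carol"),
]
myframe = pd.DataFrame(rows, columns=["CVE_ID", "VulnName", "ServerOwner"])

# size() gives one count per (CVE_ID, VulnName, ServerOwner) group;
# unstack() pivots the innermost level (ServerOwner) into columns,
# filling missing combinations with 0; reset_index() turns CVE_ID and
# VulnName back into ordinary columns.
counts = (
    myframe.groupby(["CVE_ID", "VulnName", "ServerOwner"])
    .size()
    .unstack(fill_value=0)
    .reset_index()
)
print(counts)
```

The ServerOwner label printed above the column headers is the name of the columns index left behind by unstack, not a data column.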
[2] Then sum the Alice, Bob and Carol columns to get:
counts['Count'] = counts[['Alice','Bob','Carol']].sum(axis=1)
ServerOwner CVE_ID VulnName Alice Bob Carol Count
0 CVE-2017-1111 Java Update 1 1 1 0 2
1 CVE-2017-1112 Java Update 2 1 1 0 2
2 CVE-2017-1113 Java Update 3 1 1 1 3
3 CVE-2017-1114 Adobe 1 1 1 1 3
4 CVE-2017-1115 Chrome 1 1 0 1 2
5 CVE-2017-1116 Chrome 2 0 0 1 1
6 CVE-2017-1117 Chrome 3 0 0 1 1
[3] Then use df.drop on the names to remove the name columns:
counts.drop(['Carol','Bob','Alice'],inplace=True,axis=1)
ServerOwner CVE_ID VulnName Count
0 CVE-2017-1111 Java Update 1 2
1 CVE-2017-1112 Java Update 2 2
2 CVE-2017-1113 Java Update 3 3
3 CVE-2017-1114 Adobe 1 3
4 CVE-2017-1115 Chrome 1 2
5 CVE-2017-1116 Chrome 2 1
6 CVE-2017-1117 Chrome 3 1
[4] Then use sort_values on the Count column:
counts.sort_values(by='Count', ascending=False, inplace=True)
ServerOwner CVE_ID VulnName Count
2 CVE-2017-1113 Java Update 3 3
3 CVE-2017-1114 Adobe 1 3
0 CVE-2017-1111 Java Update 1 2
1 CVE-2017-1112 Java Update 2 2
4 CVE-2017-1115 Chrome 1 2
5 CVE-2017-1116 Chrome 2 1
6 CVE-2017-1117 Chrome 3 1
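Several rows tie on Count here. The default sort algorithm for sort_values is quicksort, which is not guaranteed to be stable, so the relative order of tied rows could differ from what is shown. If a reproducible order matters, one option is to add a tie-breaker column. A small sketch, with the frame typed in by hand from the output of step [3]:

```python
import pandas as pd

# Same values as the output of step [3], typed in by hand for illustration.
counts = pd.DataFrame(
    {
        "CVE_ID": [
            "CVE-2017-1111", "CVE-2017-1112", "CVE-2017-1113",
            "CVE-2017-1114", "CVE-2017-1115", "CVE-2017-1116",
            "CVE-2017-1117",
        ],
        "VulnName": [
            "Java Update 1", "Java Update 2", "Java Update 3",
            "Adobe 1", "Chrome 1", "Chrome 2", "Chrome 3",
        ],
        "Count": [2, 2, 3, 3, 2, 1, 1],
    }
)

# Sort by Count (descending), then by CVE_ID (ascending) to break ties.
# This returns a new frame instead of modifying counts in place.
ranked = counts.sort_values(by=["Count", "CVE_ID"], ascending=[False, True])
print(ranked)
```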
Putting it all together:
counts = myframe.groupby(['CVE_ID','VulnName','ServerOwner'], as_index=False).size().unstack(fill_value=0).reset_index()
counts['Count'] = counts[['Alice','Bob','Carol']].sum(axis=1)
counts.drop(['Carol','Bob','Alice'],inplace=True,axis=1)
counts.sort_values(by='Count', ascending=False, inplace=True)
print("The dataframe: \n", myframe)
print("Top 10 offending CVEs, Vulnerability and Count: \n")
print(counts)
Top 10 offending CVEs, Vulnerability and Count:
ServerOwner CVE_ID VulnName Count
2 CVE-2017-1113 Java Update 3 3
3 CVE-2017-1114 Adobe 1 3
0 CVE-2017-1111 Java Update 1 2
1 CVE-2017-1112 Java Update 2 2
4 CVE-2017-1115 Chrome 1 2
5 CVE-2017-1116 Chrome 2 1
6 CVE-2017-1117 Chrome 3 1
If needed, you can reset the index at this point with reset_index().
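Since the per-owner columns are summed and then dropped, the same totals can probably be reached more directly by grouping on CVE_ID and VulnName only. This is a sketch of that alternative, not the method used above; the sample data is hypothetical (reconstructed from the outputs), and head(10) is added only because the printed heading says "Top 10".

```python
import pandas as pd

# Hypothetical sample data: CVE -> (vulnerability name, affected owners).
affected = {
    "CVE-2017-1111": ("Java Update 1", ["Alice", "Bob"]),
    "CVE-2017-1112": ("Java Update 2", ["Alice", "Bob"]),
    "CVE-2017-1113": ("Java Update 3", ["Alice", "Bob", "Carol"]),
    "CVE-2017-1114": ("Adobe 1", ["Alice", "Bob", "Carol"]),
    "CVE-2017-1115": ("Chrome 1", ["Alice", "Carol"]),
    "CVE-2017-1116": ("Chrome 2", ["Carol"]),
    "CVE-2017-1117": ("Chrome 3", ["Carol"]),
}
rows = [
    (cve, vuln, owner)
    for cve, (vuln, owners) in affected.items()
    for owner in owners
]
myframe = pd.DataFrame(rows, columns=["CVE_ID", "VulnName", "ServerOwner"])

# Count rows per (CVE_ID, VulnName), name the result column "Count",
# sort descending with a stable algorithm, and keep at most ten rows.
counts = (
    myframe.groupby(["CVE_ID", "VulnName"])
    .size()
    .reset_index(name="Count")
    .sort_values(by="Count", ascending=False, kind="mergesort")
    .head(10)
)
print(counts)
```

Both versions count rows, so if one owner appears several times for the same CVE, each occurrence is counted.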
Edit: in response to the comment about the ServerOwner index, you can reset the index, drop the old index, and rename the new one:
counts.reset_index(drop=True, inplace = True)
counts.index.names = ['index']
Giving:
ServerOwner CVE_ID VulnName Count
index
0 CVE-2017-1113 Java Update 3 3
1 CVE-2017-1114 Adobe 1 3
2 CVE-2017-1111 Java Update 1 2
3 CVE-2017-1112 Java Update 2 2
4 CVE-2017-1115 Chrome 1 2
5 CVE-2017-1116 Chrome 2 1
6 CVE-2017-1117 Chrome 3 1
(The ServerOwner name remains as a remnant of the original groupby command, indicating which column was used.)
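If you would rather not have that leftover label, it can be cleared by setting the name of the columns index to None. A minimal sketch, using a hand-built three-row frame (an assumption for illustration) that mimics the state after sorting:

```python
import pandas as pd

# Hand-built stand-in for the sorted frame, with the old row labels
# and the leftover "ServerOwner" name on the columns index.
counts = pd.DataFrame(
    {
        "CVE_ID": ["CVE-2017-1113", "CVE-2017-1114", "CVE-2017-1111"],
        "VulnName": ["Java Update 3", "Adobe 1", "Java Update 1"],
        "Count": [3, 3, 2],
    },
    index=[2, 3, 0],
)
counts.columns.name = "ServerOwner"

# Renumber the rows from 0, label the row index, and clear the
# leftover name on the columns index.
tidy = counts.reset_index(drop=True)
tidy.index.name = "index"
tidy.columns.name = None
print(tidy)
```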
Sources for this answer:
[1]
Groupby value counts on the dataframe pandas
[2]
Pandas: sum DataFrame rows for given columns
[3]
Delete column from pandas DataFrame
[4]
python, sort descending dataframe with pandas
[5]
Converting a Pandas GroupBy object to DataFrame
[6]
How to GroupBy a Dataframe in Pandas and keep Columns