【Question title】: How to do waffle charts in Python? (square pie chart)
【Posted】: 2017-05-14 23:38:21
【Question】:

Something like this:

There is a nice package to do it in R. In Python, the best I could come up with is the following, using the squarify package (inspired by a post on how to do treemaps):

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns # just to have better line color and width
import squarify
# for those using Jupyter notebooks, uncomment the next line
# %matplotlib inline


df = pd.DataFrame({
                  'v1': np.ones(100), 
                  'v2': np.random.randint(1, 4, 100)})
df.sort_values(by='v2', inplace=True)

# color scale
cmap = mpl.cm.Accent
mini, maxi = df['v2'].min(), df['v2'].max()
norm = mpl.colors.Normalize(vmin=mini, vmax=maxi)
colors = [cmap(norm(value)) for value in df['v2']]

# figure
fig = plt.figure()
ax = fig.add_subplot(111, aspect="equal")
ax = squarify.plot(df['v1'], color=colors, ax=ax)
ax.set_xticks([])
ax.set_yticks([])
plt.show()

But when I create 200 elements instead of 100 (or any other non-square number), the squares become misaligned.
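With a fixed number of rows, any number of tiles fits a rectangular grid: the column count is the ceiling of n / rows, and only the last column can be partially filled. A minimal sketch of that index arithmetic, filling column by column (the helper `grid_positions` is hypothetical and not part of squarify):

```python
import math

def grid_positions(n_tiles, n_rows):
    """Return (n_cols, [(row, col), ...]) for n_tiles on a grid with n_rows rows.

    Tiles are placed column by column: tile i goes to column i // n_rows
    and row i % n_rows, so only the last column may be incomplete.
    """
    n_cols = math.ceil(n_tiles / n_rows)
    # divmod(i, n_rows) gives (col, row); reverse it to get (row, col).
    positions = [divmod(i, n_rows)[::-1] for i in range(n_tiles)]
    return n_cols, positions
```

For 200 tiles and 10 rows this yields a 10 x 20 grid, so the tile count never has to be a perfect square.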

Another problem is that if I change v2 to a categorical variable (e.g., a hundred As, Bs, Cs and Ds), I get this error:

could not convert string to float: 'a'
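The error arises because `mpl.colors.Normalize` works on numbers, so string labels fail once `norm(value)` is evaluated. One common workaround, sketched here under the assumption that the labels live in `df['v2']` as in the question, is to encode the labels as integer codes with `pd.factorize` and build the colors from those codes instead:

```python
import numpy as np
import pandas as pd

# Hypothetical categorical data standing in for the question's v2 column.
rng = np.random.default_rng(0)
df = pd.DataFrame({'v1': np.ones(100),
                   'v2': rng.choice(['a', 'b', 'c', 'd'], 100)})
df.sort_values(by='v2', inplace=True)

# pd.factorize maps each distinct label to an integer code (0, 1, 2, ...)
# in order of first appearance; after sorting, that order is alphabetical.
codes, uniques = pd.factorize(df['v2'])
df['v2_code'] = codes
```

The numeric `v2_code` column can then replace `df['v2']` in the `Normalize` and `cmap(norm(value))` lines, and `uniques` keeps the original labels for a legend.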

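So, can anyone help me with these two problems:

  • How do I fix the alignment when the number of observations is not a square number?
  • How can I use a categorical variable in v2?

Beyond that, I'm very open to any other Python package that can create waffle charts more efficiently.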

【Comments】:

  • Thanks @not_a_robot, I'll try Bokeh this week.
  • 200 is not a square number
  • True, thanks @JaredGoguen. I've edited my question to ask how to handle non-square numbers.

Tags: python matplotlib seaborn bokeh waffle-chart


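【Answer 1】:

I spent a few days building a more general solution, PyWaffle.

You can install it with

pip install pywaffle

Source code: https://github.com/gyli/PyWaffle

Instead of using matshow(), PyWaffle builds the squares one by one, which makes customization easier. It also provides a custom Figure class that returns a figure object, so by updating the figure's attributes you can control essentially everything in the chart.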
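To illustrate the "one square at a time" idea, here is a minimal sketch in plain matplotlib. It is not PyWaffle's actual implementation; the function `draw_waffle` and its `gap` parameter are hypothetical. Each tile is added as its own `matplotlib.patches.Rectangle`, filling column by column:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

def draw_waffle(counts, colors, rows, gap=0.1):
    """Draw one Rectangle per unit, column by column.

    counts: number of tiles per category; colors: one color per category;
    gap: hypothetical spacing between tiles, in data units.
    """
    fig, ax = plt.subplots()
    i = 0
    for count, color in zip(counts, colors):
        for _ in range(count):
            col, row = divmod(i, rows)
            ax.add_patch(Rectangle((col, row), 1 - gap, 1 - gap, color=color))
            i += 1
    n_cols = -(-i // rows)  # ceiling division: columns actually used
    ax.set_xlim(0, n_cols)
    ax.set_ylim(0, rows)
    ax.set_aspect('equal')
    ax.axis('off')
    return fig, ax
```

Because every tile is a separate patch, per-tile styling (color, alpha, edge) becomes a keyword argument on each `Rectangle`, which is the flexibility the paragraph above describes.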

Some examples:

Colored or transparent background:

import matplotlib.pyplot as plt
from pywaffle import Waffle

data = {'Democratic': 48, 'Republican': 46, 'Libertarian': 3}
fig = plt.figure(
    FigureClass=Waffle, 
    rows=5, 
    values=data, 
    colors=("#983D3D", "#232066", "#DCB732"),
    title={'label': 'Vote Percentage in 2016 US Presidential Election', 'loc': 'left'},
    labels=["{0} ({1}%)".format(k, v) for k, v in data.items()],
    legend={'loc': 'lower left', 'bbox_to_anchor': (0, -0.4), 'ncol': len(data), 'framealpha': 0}
)
fig.gca().set_facecolor('#EEEEEE')
fig.set_facecolor('#EEEEEE')
plt.show()

Icons instead of squares:

import matplotlib.pyplot as plt
from pywaffle import Waffle

data = {'Democratic': 48, 'Republican': 46, 'Libertarian': 3}
fig = plt.figure(
    FigureClass=Waffle,
    rows=5,
    values=data,
    colors=("#232066", "#983D3D", "#DCB732"),
    legend={'loc': 'upper left', 'bbox_to_anchor': (1, 1)},
    icons='child', icon_size=18,  # newer PyWaffle releases may name this parameter font_size
    icon_legend=True
)
plt.show()

Multiple subplots in one chart:

import matplotlib.pyplot as plt
import pandas as pd
from pywaffle import Waffle

data = pd.DataFrame(
    {
        'labels': ['Hillary Clinton', 'Donald Trump', 'Others'],
        'Virginia': [1981473, 1769443, 233715],
        'Maryland': [1677928, 943169, 160349],
        'West Virginia': [188794, 489371, 36258],
    },
).set_index('labels')

fig = plt.figure(
    FigureClass=Waffle,
    plots={
        '311': {
            'values': data['Virginia'] / 30000,
            'labels': ["{0} ({1})".format(n, v) for n, v in data['Virginia'].items()],
            'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 8},
            'title': {'label': '2016 Virginia Presidential Election Results', 'loc': 'left'}
        },
        '312': {
            'values': data['Maryland'] / 30000,
            'labels': ["{0} ({1})".format(n, v) for n, v in data['Maryland'].items()],
            'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.2, 1), 'fontsize': 8},
            'title': {'label': '2016 Maryland Presidential Election Results', 'loc': 'left'}
        },
        '313': {
            'values': data['West Virginia'] / 30000,
            'labels': ["{0} ({1})".format(n, v) for n, v in data['West Virginia'].items()],
            'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.3, 1), 'fontsize': 8},
            'title': {'label': '2016 West Virginia Presidential Election Results', 'loc': 'left'}
        },
    },
    rows=5,
    colors=("#2196f3", "#ff5252", "#999999"),  # Default argument values for subplots
    figsize=(9, 5)  # figsize is a parameter of plt.figure
)
plt.show()

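【Comments】:

  • Awesome! Can this module set different alpha values or darker shades of a color for different cells of a given category (e.g., like what you see in the GitHub profile overview)?
  • Thanks. How do I change the order of the legend/colors, e.g. Others, Donald, Hillary? My plot doesn't match the dict order.
  • @user147529 The legend and colors follow the index order of the data. When you say the plot doesn't match the dict order, I assume you mean the dict used to create the DataFrame. Check that the DataFrame holds the data in the order you expect, or create it from lists, e.g. pd.DataFrame([[1981473, 1677928, 188794], [1769443, 943169, 489371], [233715, 160349, 36258]], index=['Hillary Clinton', 'Donald Trump', 'Others'], columns=['Virginia', 'Maryland', 'West Virginia'])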
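
Since the legend and colors follow the index order, another option is to reorder the rows explicitly with `DataFrame.reindex` before passing the values to PyWaffle. A minimal sketch using one column of the example data (the order in `desired_order` is just an illustration); remember to reorder the `colors` tuple the same way:

```python
import pandas as pd

data = pd.DataFrame(
    {
        'labels': ['Hillary Clinton', 'Donald Trump', 'Others'],
        'Virginia': [1981473, 1769443, 233715],
    },
).set_index('labels')

# Reorder the rows (and therefore the legend and color order) explicitly.
desired_order = ['Others', 'Donald Trump', 'Hillary Clinton']
data = data.reindex(desired_order)
```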
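【Answer 2】:

I've put together a working example below that I think does what you need. Fully generalizing the approach would take some more work, but I think you'll find it a good start. The trick is to use matshow() to solve your non-square problem, and to build a custom legend so the categorical values are easy to interpret.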

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

# Let's make a default data frame with categories and values.
df = pd.DataFrame({'categories': ['cat1', 'cat2', 'cat3', 'cat4'],
                   'values': [84911, 14414, 10062, 8565]})
# Now, we define a desired height and width.
waffle_plot_width = 20
waffle_plot_height = 7

classes = df['categories']
values = df['values']

def waffle_plot(classes, values, height, width, colormap):

    # Compute the portion of the total assigned to each class.
    class_portion = [float(v) / sum(values) for v in values]

    # Compute the number of tiles for each category.
    total_tiles = width * height
    tiles_per_class = [round(p * total_tiles) for p in class_portion]

    # Make a dummy matrix for use in plotting.
    plot_matrix = np.zeros((height, width))

    # Populate the dummy matrix with integer values (1 = first class, 2 = second, ...).
    class_index = 0
    tile_index = 0

    # Iterate over each tile, column by column.
    for col in range(width):
        for row in range(height):
            tile_index += 1

            # If the number of tiles populated is sufficient for this class...
            if tile_index > sum(tiles_per_class[0:class_index]):

                # ...increment to the next class (capped at the last class,
                # in case rounding left a few tiles over).
                class_index = min(class_index + 1, len(classes))

            # Set the class value to an integer, which increases with class.
            plot_matrix[row, col] = class_index

    # Using matshow solves your "non-square" problem. matshow opens its own
    # figure; fixing vmin/vmax pins each class to a known point on the colormap.
    plt.matshow(plot_matrix, cmap=colormap, vmin=1, vmax=len(classes))
    plt.colorbar()

    # Get the axis.
    ax = plt.gca()

    # Minor ticks at the tile boundaries.
    ax.set_xticks(np.arange(-.5, width, 1), minor=True)
    ax.set_yticks(np.arange(-.5, height, 1), minor=True)

    # Gridlines based on minor ticks.
    ax.grid(which='minor', color='w', linestyle='-', linewidth=2)

    # Manually constructing a legend solves your "categorical" problem.
    legend_handles = []
    for i, c in enumerate(classes):
        label_str = c + " (" + str(values[i]) + ")"
        # With vmin=1 and vmax=n, class i+1 is drawn with colormap(i / (n - 1)).
        color_val = colormap(i / max(len(classes) - 1, 1))
        legend_handles.append(mpatches.Patch(color=color_val, label=label_str))

    # Add the legend. Still a bit of work to do here, to perfect centering.
    plt.legend(handles=legend_handles, loc=1, ncol=len(classes),
               bbox_to_anchor=(0., -0.1, 0.95, .10))

    plt.xticks([])
    plt.yticks([])

# Call the plotting function.
waffle_plot(classes, values, waffle_plot_height, waffle_plot_width,
            plt.cm.coolwarm)
plt.show()

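Below is an example of the output this script produces. As you can see, it works well for me and covers everything you asked for. Let me know if it gives you any trouble. Enjoy!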
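
One caveat: rounding each class's share independently (the `round(...)` step that computes `tiles_per_class`) does not guarantee that the counts add up to `width * height`. For three equal classes on 10 tiles, each share is 3.33, which rounds to 3, leaving one tile unassigned. The largest-remainder method is a swapped-in technique, not part of this answer's code, that always allocates exactly the total; a minimal sketch (`allocate_tiles` is a hypothetical helper):

```python
import math

def allocate_tiles(values, total_tiles):
    """Split total_tiles across values proportionally, summing exactly.

    Largest-remainder method: floor every exact share, then give the
    leftover tiles to the classes with the largest fractional parts.
    """
    total = sum(values)
    exact = [v * total_tiles / total for v in values]
    tiles = [math.floor(x) for x in exact]
    leftover = total_tiles - sum(tiles)
    # Class indices sorted by descending fractional remainder (ties keep input order).
    order = sorted(range(len(values)), key=lambda i: exact[i] - tiles[i], reverse=True)
    for i in order[:leftover]:
        tiles[i] += 1
    return tiles
```

Its result could replace `tiles_per_class` so that every cell of the grid belongs to a real class.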

【Comments】:

【Answer 3】:

You can use this function to create a waffle chart automatically from a few simple parameters:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

def create_waffle_chart(categories, values, height, width, colormap, value_sign=''):

    # accept lists, arrays or pandas Series and index them by position
    categories = list(categories)
    values = list(values)

    # compute the proportion of each category with respect to the total
    total_values = sum(values)
    category_proportions = [(float(value) / total_values) for value in values]

    # compute the total number of tiles
    total_num_tiles = width * height
    print('Total number of tiles is', total_num_tiles)

    # compute the number of tiles for each category
    tiles_per_category = [round(proportion * total_num_tiles) for proportion in category_proportions]

    # print out number of tiles per category
    for category, tiles in zip(categories, tiles_per_category):
        print(str(category) + ': ' + str(tiles))

    # initialize the waffle chart as an empty matrix
    waffle_chart = np.zeros((height, width))

    # define indices to loop through waffle chart
    category_index = 0
    tile_index = 0

    # populate the waffle chart, column by column
    for col in range(width):
        for row in range(height):
            tile_index += 1

            # if the number of tiles populated for the current category
            # is equal to its corresponding allocated tiles...
            if tile_index > sum(tiles_per_category[0:category_index]):
                # ...proceed to the next category (capped at the last one,
                # in case rounding left a few tiles over)
                category_index = min(category_index + 1, len(categories))

            # set the class value to an integer, which increases with class
            waffle_chart[row, col] = category_index

    # use matshow to display the waffle chart (matshow opens its own figure);
    # vmin/vmax map the category values 1..n linearly onto the colormap
    plt.matshow(waffle_chart, cmap=colormap, vmin=1, vmax=len(categories))
    plt.colorbar()

    # get the axis
    ax = plt.gca()

    # set minor ticks
    ax.set_xticks(np.arange(-.5, width, 1), minor=True)
    ax.set_yticks(np.arange(-.5, height, 1), minor=True)

    # add gridlines based on minor ticks
    ax.grid(which='minor', color='w', linestyle='-', linewidth=2)

    plt.xticks([])
    plt.yticks([])

    # create legend; category i (0-based) is drawn with colormap(i / (n - 1)),
    # which matches the colors matshow uses for the value i + 1
    legend_handles = []
    n = len(categories)
    for i, category in enumerate(categories):
        if value_sign == '%':
            label_str = str(category) + ' (' + str(values[i]) + value_sign + ')'
        else:
            label_str = str(category) + ' (' + value_sign + str(values[i]) + ')'

        color_val = colormap(i / max(n - 1, 1))
        legend_handles.append(mpatches.Patch(color=color_val, label=label_str))

    # add legend to chart
    plt.legend(
        handles=legend_handles,
        loc='lower center',
        ncol=len(categories),
        bbox_to_anchor=(0., -0.2, 0.95, .1)
    )

# example call
create_waffle_chart(['A', 'B', 'C'], [50, 30, 20], 10, 20, plt.cm.coolwarm)
plt.show()
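
The nested loop in `create_waffle_chart` fills `waffle_chart` column by column with the category numbers 1..n. Assuming the tile counts sum exactly to `height * width`, the same matrix can be sketched without the loop by repeating each category number by its tile count and reshaping in column-major (`order='F'`) order. The helper name `waffle_matrix` is hypothetical:

```python
import numpy as np

def waffle_matrix(tiles_per_category, height, width):
    """Build the height x width matrix of category numbers (1, 2, ...).

    Assumes sum(tiles_per_category) == height * width; otherwise the
    reshape raises ValueError.
    """
    # e.g. [3, 2, 1] -> [1, 1, 1, 2, 2, 3]
    labels = np.repeat(np.arange(1, len(tiles_per_category) + 1), tiles_per_category)
    # order='F' fills down each column first, like the loop over col, then row.
    return labels.reshape((height, width), order='F')
```

For tile counts [3, 2, 1] on a 2 x 3 grid this gives [[1, 1, 2], [1, 2, 3]], the same matrix the loop builds.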
    

【Comments】:

• Running this fails with:
in create_waffle_chart(categories, values, height, width, colormap, value_sign)
     14     # print out number of tiles per category
     15     for i, tiles in enumerate(tiles_per_category):
---> 16         print (df_dsn.index.values[i] + ': ' + str(tiles))
     17 
     18     # initialize the waffle chart as an empty matrix
NameError: name 'df_dsn' is not defined