Create a diverging stacked bar chart in matplotlib: answers

【Question title】: Create a Diverging Stacked Bar Chart in matplotlib
【Posted】: 2014-06-02 06:27:14
【Question】:

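I have lists of data recording responses to Likert questions on a scale from 1 (very unhappy) to 5 (very happy). I would like to create a page of plots showing these lists as skewed stacked horizontal bar charts. The lists of responses can have different sizes (e.g. when someone has opted out of answering a particular question). Here is a minimal example of the data: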

likert1 = [1.0, 2.0, 1.0, 2.0, 1.0, 3.0, 3.0, 4.0, 4.0, 1.0, 1.0]
likert2 = [5.0, 4.0, 5.0, 4.0, 5.0, 3.0]

I would like to be able to plot this with something like:

plot_many_likerts(likert1, likert2)

At the moment I have written a function that iterates over the lists and plots each one as its own subplot on a shared figure in matplotlib:

from collections import Counter

import matplotlib.pyplot as plt


def plot_many_likerts(*lsts):
    # get the figure and the list of axes for this plot;
    # squeeze=False keeps axlst indexable even when only one list is passed
    fig, axlst = plt.subplots(len(lsts), sharex=True, squeeze=False)
    for i in range(len(lsts)):
        likert_horizontal_bar_list(lsts[i], axlst[i, 0], xaxis=[1.0, 2.0, 3.0, 4.0, 5.0])
        axlst[i, 0].axis('off')
    plt.show()

def likert_horizontal_bar_list(lst, ax, xaxis):
    cnt = Counter(lst)
    #del (cnt[None])
    i = 0
    colour_float = 0.00001
    previous_right = 0
    for key in sorted(xaxis):
        # current matplotlib names barh's first parameter y (older releases called it bottom)
        ax.barh(y=0, width=cnt[key], height=0.4, left=previous_right, color=plt.cm.jet(colour_float), label=str(key))
        i += 1
        previous_right = previous_right + cnt[key]
        colour_float = float(i) / float(len(xaxis))

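This works reasonably well and creates stacked bar charts that all have the same representative sizes (i.e. the widths share a common axis scale). Here is a screenshot: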

What is currently Produced http://s7.postimg.org/vh0j816gn/figure_1.jpg

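What I would like is to have these two plots centered on the midpoint of the mode of the datasets (the datasets will have the same range). For instance: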

What I would like to see http://s29.postimg.org/z0qwv4ryr/figure_2.jpg

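Any suggestions on how I might do this?

【Comments】:

Keep adjusting left, so that the second set of bars starts with previous_right aligned at whatever value you want.
I was hoping for a simpler way to do this, because this means I have to keep track of the midpoint value of every bar I create. It feels like I am doing too much bookkeeping myself that matplotlib should be handling for me.
Has anyone solved this? It is called a diverging stacked bar chart. R has a module for it (HH > Likert). I would like to create some too, but want to avoid reinventing the wheel.
Nope, I just cobbled something together until I got something good enough...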
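
The suggestion in the comments (shifting left) needs less bookkeeping than it may seem if the offset is computed once per list. The following is only a minimal sketch, not the asker's final code. It assumes an odd-length scale and centres each bar on the middle of the neutral category (3), which is one possible reading of "centered on the midpoint". The helper name likert_segments and the coolwarm_r colour map are choices made for this illustration.

```python
from collections import Counter

import matplotlib.pyplot as plt


def likert_segments(responses, scale=(1, 2, 3, 4, 5)):
    """Return one (left, width) pair per scale value, with the middle
    category straddling x = 0. Assumes an odd number of scale values."""
    counts = Counter(responses)
    widths = [counts[value] for value in scale]
    middle = len(scale) // 2
    # Everything below the middle category, plus half of the middle
    # category itself, has to end up to the left of zero.
    offset = sum(widths[:middle]) + widths[middle] / 2
    segments = []
    left = -offset
    for width in widths:
        segments.append((left, width))
        left += width
    return segments


def plot_many_likerts(*lists, scale=(1, 2, 3, 4, 5)):
    """Draw one diverging stacked bar per list on a shared x axis."""
    fig, axes = plt.subplots(len(lists), sharex=True, squeeze=False)
    colors = plt.cm.coolwarm_r([i / (len(scale) - 1) for i in range(len(scale))])
    for ax, responses in zip(axes[:, 0], lists):
        segments = likert_segments(responses, scale)
        for (left, width), color, value in zip(segments, colors, scale):
            ax.barh(0, width, height=0.4, left=left, color=color, label=str(value))
        # Dashed reference line through the centre of the neutral category.
        ax.axvline(0, linestyle="--", color="black", alpha=0.5)
        ax.axis("off")
    return fig
```

Calling plot_many_likerts(likert1, likert2) followed by plt.show() should draw both lists with the neutral segment straddling the dashed zero line, whatever the length of each list.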

Tags: python matplotlib plot

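【Solution 1】:

I needed to make a diverging bar chart for some Likert data. I was using pandas, but the approach would probably be similar without it. The key mechanism is to add an invisible buffer at the start of each bar.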

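import matplotlib.pyplot as plt
import pandas as pd

# the first colour ('white') is for the invisible buffer column
likert_colors = ['white', 'firebrick', 'lightcoral', 'gainsboro', 'cornflowerblue', 'darkblue']
dummy = pd.DataFrame([[1, 2, 3, 4, 5], [5, 6, 7, 8, 5], [10, 4, 2, 10, 5]],
                     columns=["SD", "D", "N", "A", "SA"],
                     index=["Key 1", "Key B", "Key III"])
# distance from the start of each row to the middle of its neutral ("N") bar
middles = dummy[["SD", "D"]].sum(axis=1) + dummy["N"] * .5
longest = middles.max()
# pad each row so that all the middles line up at x = longest
dummy.insert(0, '', (middles - longest).abs())
# length of the longest row, buffer included, so that no bar is clipped
complete_longest = dummy.sum(axis=1).max()

dummy.plot.barh(stacked=True, color=likert_colors, edgecolor='none', legend=False)
z = plt.axvline(longest, linestyle='--', color='black', alpha=.5)
z.set_zorder(-1)

plt.xlim(0, complete_longest)
# tick labels are offset so that they read relative to the centre line
xvalues = range(0, int(complete_longest) + 1, 10)
xlabels = [str(x - longest) for x in xvalues]
plt.xticks(xvalues, xlabels)
plt.show()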

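This approach has a number of limitations. First, the bars no longer get a black outline, and the legend would have an extra blank element. I just hid the legend (I suspect there is a way to hide only that single element). I am not sure of a convenient way to give the bars an outline without also adding the outline to the buffer element.

To walk through the code: first, we establish some colors and dummy data. Then we calculate the width of the two leftmost columns plus half of the middle column (which I know to be "SD", "D", and "N", respectively). I find the largest of these widths and use it to calculate the difference needed for the other rows. Next, I insert this new buffer column in the first column position with a blank title (which felt gross, let me tell you). For good measure, I also added a vertical line (axvline) behind the middle of the middle bar, following the advice of [2]. Finally, I adjust the x axis to have the proper scale by offsetting the labels.

You might want more horizontal space on the left; you can easily get it by adding to "longest".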
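
To make the buffer arithmetic concrete, here is a small sketch of the same idea wrapped in a function, without any plotting. The function name add_centering_buffer and the pad parameter are hypothetical names introduced for this illustration; pad is simply the amount added to "longest" to get extra room on the left.

```python
import pandas as pd


def add_centering_buffer(df, left_cols, middle_col, pad=0.0):
    """Return a copy of df with a leading buffer column (blank name) that
    lines up the middle of middle_col in every row, plus the x position
    of that shared centre line."""
    # Distance from the start of each row to the middle of its neutral bar.
    middles = df[left_cols].sum(axis=1) + df[middle_col] * 0.5
    # The widest left half decides where the centre goes; pad adds extra room.
    center = middles.max() + pad
    padded = df.copy()
    padded.insert(0, "", center - middles)
    return padded, center


dummy = pd.DataFrame([[1, 2, 3, 4, 5], [5, 6, 7, 8, 5], [10, 4, 2, 10, 5]],
                     columns=["SD", "D", "N", "A", "SA"],
                     index=["Key 1", "Key B", "Key III"])
padded, center = add_centering_buffer(dummy, ["SD", "D"], "N", pad=2)
print(padded)
print("centre line at x =", center)
```

In every row, the buffer plus the original left half adds up to the same centre value, which is why the neutral bars line up once the padded frame is drawn as a stacked barh plot.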

[2] Heiberger, Richard M., and Naomi B. Robbins. "Design of diverging stacked bar charts for Likert scales and other applications." Journal of Statistical Software 57.5 (2014): 1-32.

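【Comments】:

How do you define complete_longest in this example?
Ah, my mistake, I left that line out. I have edited the code to include its definition. Basically, it is the maximum of the row sums (i.e. the length of the longest row).
Thanks @Austin. As far as I can tell, this is currently the best example of making this kind of plot in Python.

【Solution 2】:

I recently needed to make a diverging bar chart for some Likert data. I took a slightly different approach from @austin-cory-bart.

I modified an example from the gallery and created this: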

import numpy as np
import matplotlib.pyplot as plt


category_names = ['Strongly disagree', 'Disagree',
                  'Neither agree nor disagree', 'Agree', 'Strongly agree']
results = {
    'Question 1': [10, 15, 17, 32, 26],
    'Question 2': [26, 22, 29, 10, 13],
    'Question 3': [35, 37, 7, 2, 19],
    'Question 4': [32, 11, 9, 15, 33],
    'Question 5': [21, 29, 5, 5, 40],
    'Question 6': [8, 19, 5, 30, 38]
}


def survey(results, category_names):
    """
    Parameters
    ----------
    results : dict
        A mapping from question labels to a list of answers per category.
        It is assumed all lists contain the same number of entries and that
        it matches the length of *category_names*. The order is assumed
        to be from 'Strongly disagree' to 'Strongly agree'
    category_names : list of str
        The category labels.
    """
    
    labels = list(results.keys())
    data = np.array(list(results.values()))
    data_cum = data.cumsum(axis=1)
    middle_index = data.shape[1]//2
    offsets = data[:, range(middle_index)].sum(axis=1) + data[:, middle_index]/2
    
    # Color Mapping
    category_colors = plt.get_cmap('coolwarm_r')(
        np.linspace(0.15, 0.85, data.shape[1]))
    
    fig, ax = plt.subplots(figsize=(10, 5))
    
    # Plot Bars
    for i, (colname, color) in enumerate(zip(category_names, category_colors)):
        widths = data[:, i]
        starts = data_cum[:, i] - widths - offsets
        rects = ax.barh(labels, widths, left=starts, height=0.5,
                        label=colname, color=color)
    
    # Add Zero Reference Line
    ax.axvline(0, linestyle='--', color='black', alpha=.25)
    
    # X Axis
    ax.set_xlim(-90, 90)
    ax.set_xticks(np.arange(-90, 91, 10))
    ax.xaxis.set_major_formatter(lambda x, pos: str(abs(int(x))))
    
    # Y Axis
    ax.invert_yaxis()
    
    # Remove spines
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    ax.spines['left'].set_visible(False)
    
    # Legend
    ax.legend(ncol=len(category_names), bbox_to_anchor=(0, 1),
              loc='lower left', fontsize='small')
    
    # Set Background Color
    fig.set_facecolor('#FFFFFF')

    return fig, ax


fig, ax = survey(results, category_names)
plt.show()
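
The centring in the survey function comes down to one offset per question: the sum of the categories left of the middle one, plus half of the middle category. The sketch below isolates that arithmetic with NumPy only, so the numbers can be checked without drawing anything. The function names diverging_starts and symmetric_limit are hypothetical; symmetric_limit is an optional extra that derives a symmetric x limit from the data instead of hard-coding it, and it assumes an odd number of categories, as the code above does.

```python
import numpy as np


def diverging_starts(data):
    """Left edge of every bar segment, shifted so that the middle
    category of each row is centred on x = 0."""
    data = np.asarray(data, dtype=float)
    middle = data.shape[1] // 2
    # Per-row offset: categories left of the middle, plus half the middle.
    offsets = data[:, :middle].sum(axis=1) + data[:, middle] / 2
    # cumsum - data is where each segment would start without centring.
    return data.cumsum(axis=1) - data - offsets[:, np.newaxis]


def symmetric_limit(data, step=10):
    """Smallest multiple of step that covers every bar on both sides."""
    data = np.asarray(data, dtype=float)
    starts = diverging_starts(data)
    ends = starts + data
    extent = max(abs(starts.min()), abs(ends.max()))
    return int(np.ceil(extent / step) * step)


question_1 = [[10, 15, 17, 32, 26]]
print(diverging_starts(question_1))
print(symmetric_limit(question_1))
```

For 'Question 1' the offset is 10 + 15 + 17/2 = 33.5, so the neutral segment runs from -8.5 to 8.5 and is centred on zero. A limit computed this way could replace the fixed -90 and 90 passed to ax.set_xlim if the data change.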

【Comments】: