Add annotation to specific cells in heatmap: answers

[Question title]: Add annotation to specific cells in heatmap
[Posted]: 2020-06-21 23:06:28
[Question]:

I am plotting a seaborn heatmap and want to annotate only specific cells, using custom text.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from io import StringIO

data = StringIO(u'''75,83,41,47,19
                    51,24,100,0,58
                    12,94,63,91,7
                    34,13,86,41,77''')

labels = StringIO(u'''7,8,4,,1
                    5,2,,2,8
                    1,,6,,7
                    3,1,,4,7''')

data = pd.read_csv(data, header=None)
data = data.apply(pd.to_numeric)

labels = pd.read_csv(labels, header=None)
#labels = np.ma.masked_invalid(labels)

fig, ax = plt.subplots()
sns.heatmap(data, annot=labels, ax=ax, vmin=0, vmax=100)
plt.show()

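The code above produces the following heatmap:

[image: heatmap]

Enabling the commented-out line (labels = np.ma.masked_invalid(labels)) produces the following heatmap:

[image: heatmap]

I only want to show the non-NaN (or non-zero) text on the cells. How can I achieve that?

[Comments]:

Tags: python pandas numpy matplotlib seaborn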

[Solution 1]:

Use an array of strings for annot instead of a masked array:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from io import StringIO

data = StringIO(u'''75,83,41,47,19
                    51,24,100,0,58
                    12,94,63,91,7
                    34,13,86,41,77''')

labels = StringIO(u'''7,8,4,,1
                    5,2,,2,8
                    1,,6,,7
                    3,1,,4,7''')

data = pd.read_csv(data, header=None)
data = data.apply(pd.to_numeric)

labels = pd.read_csv(labels, header=None)
#labels = np.ma.masked_invalid(labels)

# Convert everything to strings:
annotations = labels.astype(str)
annotations[np.isnan(labels)] = ""

fig, ax = plt.subplots()
sns.heatmap(data, annot=annotations, fmt="s", ax=ax, vmin=0, vmax=100)
plt.show()
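
One side effect of labels.astype(str) in the code above: columns of labels that contain blanks are parsed as floats, so their annotations read 8.0 rather than 8. The following is a minimal sketch, not part of the original answer, that builds the string array by hand so that whole numbers are shown without a decimal point. The helper names build_annotations and plot_annotated_heatmap are made up for illustration, the CSV text is written without indentation for simplicity, and the sketch assumes every non-missing label is a whole number.

```python
from io import StringIO

import numpy as np
import pandas as pd


def build_annotations(labels: pd.DataFrame) -> np.ndarray:
    """Return a string array: '' for NaN cells, integer text otherwise.

    Hypothetical helper; assumes every non-missing label is a whole number.
    """
    values = labels.to_numpy(dtype=float)
    out = np.full(values.shape, "", dtype=object)
    present = ~np.isnan(values)
    # Boolean indexing walks the cells in row-major order on both sides.
    out[present] = [str(int(v)) for v in values[present]]
    return out


def plot_annotated_heatmap(data, annotations):
    """Draw the heatmap; plotting libraries are imported only when needed."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots()
    # fmt="" makes seaborn use the annotation strings unchanged.
    sns.heatmap(data, annot=annotations, fmt="", ax=ax, vmin=0, vmax=100)
    return ax


labels = pd.read_csv(
    StringIO("7,8,4,,1\n5,2,,2,8\n1,,6,,7\n3,1,,4,7"), header=None
)
annotations = build_annotations(labels)
print(annotations)
```

Calling plot_annotated_heatmap(data, annotations) with the data frame from the answer, followed by plt.show(), should then draw the same heatmap with the blank cells left unannotated.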

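[Comments]:

[Solution 2]:

To add to @mrzo's answer: you can pass na_filter=False to read_csv() so that missing values are kept as empty strings instead of NaN, and then convert the whole frame to strings with pandas.DataFrame.astype() in the same statement: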

# ...
labels = pd.read_csv(labels, header=None, na_filter=False).astype(str)
sns.heatmap(data, annot=labels, fmt='s', ax=ax, vmin=0, vmax=100)
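
A quick way to see what na_filter=False changes is to inspect the parsed frame before plotting. The sketch below uses the same label values, written without indentation, and relies only on pandas.read_csv and DataFrame.astype; the variable names are illustrative.

```python
from io import StringIO

import pandas as pd

csv_text = "7,8,4,,1\n5,2,,2,8\n1,,6,,7\n3,1,,4,7"

# Default parsing: blanks become NaN, so the affected columns are float.
with_nan = pd.read_csv(StringIO(csv_text), header=None)

# na_filter=False: blanks stay as empty strings, nothing is turned into NaN.
as_text = pd.read_csv(
    StringIO(csv_text), header=None, na_filter=False
).astype(str)

print(with_nan.isna().sum().sum())  # number of NaN cells with default parsing
print(as_text.to_numpy())           # strings, with '' where a label is missing
```

Because the values are never converted to float, the labels should come out as 8 rather than 8.0.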

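[Comments]:

[Solution 3]:

Just adding this because it took me a while to work out how to do something similar programmatically for a slightly different application: I wanted to suppress 0 values in the annotations, but those values were the result of a crosstab operation.

There may be a more elegant way to do this, but for me, running it through numpy was very quick and very simple.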

import numpy as np
import pandas as pd
import seaborn as sns

from io import StringIO

data = StringIO(u'''75,83,41,47,19
                    51,24,100,0,58
                    12,94,63,91,7
                    34,13,86,41,77''')

data = pd.read_csv(data, header=None)
data = data.apply(pd.to_numeric)

# For more complex functions you could write a def instead
# of using this simple lambda function
an = np.vectorize(lambda x: '' if x<50 else str(round(x,-1)))(data.to_numpy())

sns.heatmap(
    data=data.to_numpy(), # Note this is now numpy too
    cmap='BuPu',
    annot=an,   # The matching ndarray of annotations
    fmt = '',   # Formats annotations as strings (i.e. no formatting)
    cbar=False, # Seems overkill if you've got annotations
    vmin=0, 
    vmax=data.max().max()
)

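This can make labelling the axes slightly more work, although it is simple enough: ax.set_xticklabels(df.columns.values), where df is your original DataFrame. If your axis labels are in the first column, you will need iloc in your to_numpy call (data.iloc[:,1:]). Combined with a custom colormap (e.g. 0==white), this lets you create heatmaps that are much easier to read.

Obviously the crude rounding is confusing (why do the 80s have different shades?), but you get the idea.

[Comments]: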
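
For the crosstab case described in this answer, one option (an assumption here, not something the answer itself shows) is to keep passing the DataFrame as the data argument and build only the annotations with numpy. seaborn then still takes the tick labels from the frame's index and columns. The sample records, the column names kind and size, and both helper names below are hypothetical.

```python
import numpy as np
import pandas as pd


def zero_suppressed_annotations(table: pd.DataFrame) -> np.ndarray:
    """Return the counts as strings, with '' wherever the count is 0."""
    counts = table.to_numpy()
    return np.where(counts == 0, "", counts.astype(str))


def plot_crosstab(table: pd.DataFrame):
    """Draw the heatmap; seaborn is imported only when plotting."""
    import seaborn as sns

    return sns.heatmap(
        table,                                    # DataFrame keeps axis labels
        annot=zero_suppressed_annotations(table),
        fmt="",                                   # use the strings unchanged
        cmap="BuPu",
        cbar=False,
    )


# Hypothetical sample records, only to produce a crosstab containing a zero.
records = pd.DataFrame(
    {
        "kind": ["a", "a", "b", "c", "c", "c"],
        "size": ["S", "L", "S", "L", "L", "S"],
    }
)
table = pd.crosstab(records["kind"], records["size"])
annotations = zero_suppressed_annotations(table)
print(table)
print(annotations)
```

Calling plot_crosstab(table) and then matplotlib's plt.show() should draw the counts with the zero cell left blank.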