Density histogram in ggplot2: label bar height [duplicate]

【Question title】: density histogram in ggplot2: label bar height [duplicate]
【Posted】: 2016-05-07 17:02:21
【Question】:

I have data that tell me how many minutes it took to complete a task:

dat = data.frame(a = c(5.5,7,4,20,4.75,6,5,8.5,10,10.5,13.5,14,11))

I plotted a density histogram of the data with the ggplot2 package:

p=ggplot(dat, aes(x=a)) + geom_histogram(aes(y=..density..),breaks = seq(4,20,by=2))+xlab("Required Solving Time")

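Now I would like to add a label above each bar showing its density. I tried to do this by adding +geom_text(label=..density..). However, this returns the error

object '..density..' not found

Does anyone know what input geom_text() needs in my case to produce these labels?

A solution without geom_text() is fine too, but I would prefer to stay within the ggplot2 package.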

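【Comments】:

Is this what you want? stackoverflow.com/questions/24198896/…
Yes, I saw that answer when searching Stack Overflow, but in my case it is a density histogram, not bars of absolute frequencies. I could not work out a solution to my problem from that answer...

Tags: r ggplot2 histogram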

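【Solution 1】:

You can label the bars using stat_bin with geom="text". stat_bin computes the counts, and we use ..density.. to convert them to densities, just as geom_histogram does. By setting geom="text", however, those density values are displayed as text. We also need to set the same breaks for geom_histogram and stat_bin so that the density values match. I placed the text labels in the middle of the bars by multiplying ..density.. by 0.5 for the label's y position. You can of course adjust this as you like.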

breaks = seq(4,20,by=2)  

ggplot(dat, aes(x=a)) + 
  geom_histogram(aes(y=..density..), breaks = breaks) + 
  stat_bin(geom="text", aes(label=round(..density..,2), y=0.5*..density..), 
           breaks=breaks, colour="white") +
  xlab("Required Solving Time")

To place the labels above the bars, you can use:

ggplot(dat, aes(x=a)) + 
  geom_histogram(aes(y=..density..), breaks = breaks) + 
  stat_bin(geom="text", aes(label=round(..density..,2), y=..density..),
           breaks=breaks, vjust = -1) +
  xlab("Required Solving Time")
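
In ggplot2 3.4.0 and later the ..density.. notation is deprecated in favor of after_stat(density), which has been available since 3.3.0. The following is a sketch of the same labels-above-bars plot in that spelling, assuming a recent ggplot2 is installed. The name p_top is only for illustration and does not appear in the original answer.

```r
library(ggplot2)

dat <- data.frame(a = c(5.5, 7, 4, 20, 4.75, 6, 5, 8.5, 10, 10.5, 13.5, 14, 11))
breaks <- seq(4, 20, by = 2)

# Same plot as above, with after_stat(density) in place of ..density..
# (assumes ggplot2 >= 3.3.0; p_top is an illustrative name)
p_top <- ggplot(dat, aes(x = a)) +
  geom_histogram(aes(y = after_stat(density)), breaks = breaks) +
  stat_bin(geom = "text",
           aes(label = round(after_stat(density), 2), y = after_stat(density)),
           breaks = breaks, vjust = -1) +
  xlab("Required Solving Time")

p_top
```

Both layers still need the same breaks, exactly as in the dot-dot version, or the labels will not line up with the bars.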

【Comments】:

【Solution 2】:

..density.. comes from the stat, so you need to tell this layer to use the binning statistic as well,

p + geom_text(aes(label=round(..density.., 2), y=..density..), 
              stat="bin", breaks = seq(4,20,by=2), 
              col="white", vjust=1)
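
To make concrete what the binning statistic supplies, here is a minimal base-R sketch of the density arithmetic: each bin's count divided by the total count times the bin width. It assumes right-closed bins with the lowest break included, which matches the counts ggplot2 reports for this data set. The names counts, width and dens are illustrative.

```r
dat <- data.frame(a = c(5.5, 7, 4, 20, 4.75, 6, 5, 8.5, 10, 10.5, 13.5, 14, 11))
breaks <- seq(4, 20, by = 2)

# Count observations per bin: intervals are (lo, hi], except the first, which is [4, 6]
counts <- as.numeric(table(cut(dat$a, breaks = breaks, include.lowest = TRUE)))

# Density = count / (total count * bin width), so the bar areas sum to 1
width <- diff(breaks)
dens <- counts / (sum(counts) * width)

round(dens, 2)
```

Because a plain geom_text() layer uses the identity stat, none of these per-bin values exist for it, which is why the layer has to be told to use stat="bin".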

【Comments】:

【Solution 3】:

You can use ggplot_build():

library(ggplot2)
dat = data.frame(a = c(5.5,7,4,20,4.75,6,5,8.5,10,10.5,13.5,14,11))
p=ggplot(dat, aes(x=a)) + 
   geom_histogram(aes(y=..density..),breaks = seq(4,20,by=2))+xlab("Required Solving Time")

ggplot_build(p)$data
#[[1]]
#          y count  x xmin xmax    density ncount ndensity PANEL group ymin       ymax colour   fill size linetype alpha
#1 0.19230769     5  5    4    6 0.19230769    1.0     26.0     1    -1    0 0.19230769     NA grey35  0.5        1    NA
#2 0.03846154     1  7    6    8 0.03846154    0.2      5.2     1    -1    0 0.03846154     NA grey35  0.5        1    NA
#3 0.07692308     2  9    8   10 0.07692308    0.4     10.4     1    -1    0 0.07692308     NA grey35  0.5        1    NA
#4 0.07692308     2 11   10   12 0.07692308    0.4     10.4     1    -1    0 0.07692308     NA grey35  0.5        1    NA
#5 0.07692308     2 13   12   14 0.07692308    0.4     10.4     1    -1    0 0.07692308     NA grey35  0.5        1    NA
#6 0.00000000     0 15   14   16 0.00000000    0.0      0.0     1    -1    0 0.00000000     NA grey35  0.5        1    NA
#7 0.00000000     0 17   16   18 0.00000000    0.0      0.0     1    -1    0 0.00000000     NA grey35  0.5        1    NA
#8 0.03846154     1 19   18   20 0.03846154    0.2      5.2     1    -1    0 0.03846154     NA grey35  0.5        1    NA


p + geom_text(data = as.data.frame(ggplot_build(p)$data), 
              aes(x=x, y= density , label = round(density,2)), 
              nudge_y = 0.005)
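
A possible variant of the same idea uses layer_data(), the ggplot2 helper that returns the computed data for a single layer, instead of indexing into ggplot_build(p)$data. This is only a sketch: it assumes a ggplot2 version that provides after_stat(), and the names bars and p_lab are illustrative.

```r
library(ggplot2)

dat <- data.frame(a = c(5.5, 7, 4, 20, 4.75, 6, 5, 8.5, 10, 10.5, 13.5, 14, 11))
breaks <- seq(4, 20, by = 2)

p <- ggplot(dat, aes(x = a)) +
  geom_histogram(aes(y = after_stat(density)), breaks = breaks) +
  xlab("Required Solving Time")

# Computed data for layer 1 (the histogram): one row per bin
bars <- layer_data(p, 1)

# Label each bar with its rounded density, nudged slightly above the bar top
p_lab <- p +
  geom_text(data = bars,
            aes(x = x, y = density, label = round(density, 2)),
            nudge_y = 0.005)

p_lab
```

This reads the same computed columns as ggplot_build(), so it is a convenience rather than a different guarantee about which columns are stable.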

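【Comments】:

In the comments on this question, Hadley "strongly recommend[s] against" using internal variables like PANEL, which I see as one of the columns in the ggplot_build() output. Are other variables from ggplot_build() (e.g. density) considered safer to use?
Or is ggplot_build(p)$data$PANEL not the "internal" PANEL, and therefore safe to use? The docs seem to imply that ggplot_build() should be as reliable as anything, since its result is what print.ggplot returns (invisibly). The warning from Hadley that I linked to above is from 2013...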