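Fortunately, in my case, Rorschach's answer worked perfectly. I came here looking for a way to avoid the solution proposed by Megan Halbrook, which I had been using until I realized it is not correct.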
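Adding a density line to a histogram automatically changes the y-axis to frequency density rather than percentage. The frequency density values match the proportion of observations in each bin (the percentage divided by 100) only when binwidth = 1.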
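From a Google search: to draw a histogram, first find the class width of each category. The area of a bar represents the frequency, so to find the height of the bar, divide the frequency by the class width. This is called the frequency density. https://www.bbc.co.uk/bitesize/guides/zc7sb82/revision/9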
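The definition can be checked by hand in base R. This is only a minimal sketch: the variable names (`x`, `bin_w`, `breaks`, `counts`) are illustrative, and the data and bin width are assumed to match the ggplot2 example in this answer. Note that ggplot2's `density` also divides by the total number of observations, so the bar areas sum to 1.

```r
# Toy data and bin width (illustrative, chosen to match the ggplot2 example)
x     <- c(0, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7, 7, 8)
bin_w <- 2

# Right-closed bins: (-1,1], (1,3], (3,5], (5,7], (7,9]
breaks <- seq(-1, 9, by = bin_w)
counts <- as.vector(table(cut(x, breaks = breaks, right = TRUE)))

# Normalised frequency density: count / (n * class width)
freq_density <- counts / (length(x) * bin_w)

# Percentage per bin: density * class width * 100
percent <- freq_density * bin_w * 100

print(rbind(counts, freq_density, percent))
```

With `bin_w` set to 1, `freq_density` and `percent / 100` would coincide, which is the special case mentioned above.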
Below is an example in which the left panel shows percentages and the right panel shows densities on the y-axis.
library(ggplot2)
library(gridExtra)
TABLE <- data.frame(vari = c(0,1,1,2,3,3,3,4,4,4,5,5,6,7,7,8))
## selected binwidth
bw <- 2
## plot using percentages: count / sum(count) * 100
## (after_stat() replaces the deprecated ..count.. notation; needs ggplot2 >= 3.3.0)
plot_count <- ggplot(TABLE, aes(x = vari)) +
  geom_histogram(aes(y = after_stat(count / sum(count) * 100)), binwidth = bw, col = 1)
## plot using density
plot_density <- ggplot(TABLE, aes(x = vari)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = bw, col = 1)
## visualize together
grid.arrange(ncol = 2, grobs = list(plot_count,plot_density))
## visualize the values
data_count <- ggplot_build(plot_count)
data_density <- ggplot_build(plot_density)
## using count / sum(count) * 100, the y-axis values are the same as
## density * binwidth * 100, because density is the "frequency density".
## all.equal() is used instead of == to tolerate floating-point rounding.
all.equal(data_count$data[[1]]$y, data_count$data[[1]]$density * bw * 100)
## using density, the y-axis values are the "frequency densities".
all.equal(data_density$data[[1]]$y, data_density$data[[1]]$density)
## manually calculated percentage for each bin of the histogram. Note that
## geom_histogram uses right-closed intervals by default.
min_range_of_intervals <- data_count$data[[1]]$xmin
for (i in min_range_of_intervals) {
  cat(paste("Values >", i, "and <=", i + bw, "involve a percent of",
            sum(TABLE$vari > i & TABLE$vari <= (i + bw)) / nrow(TABLE) * 100), "\n")
}
# Values > -1 and <= 1 involve a percent of 18.75
# Values > 1 and <= 3 involve a percent of 25
# Values > 3 and <= 5 involve a percent of 31.25
# Values > 5 and <= 7 involve a percent of 18.75
# Values > 7 and <= 9 involve a percent of 6.25
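If a percentage axis and a density line are both wanted, one possible approach is to rescale the density curve by binwidth * 100 so that it sits on the same scale as the percentage bars. The following is a sketch under that assumption, not part of the answers discussed above. The names `bin_w` and `plot_percent_with_line` are illustrative; `bin_w` is used instead of `bw` only to avoid confusion with the `bw` (bandwidth) argument of `geom_density()`.

```r
library(ggplot2)

TABLE <- data.frame(vari = c(0, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7, 7, 8))
bin_w <- 2  # histogram bin width (illustrative name)

# Histogram in percent, with the kernel density curve multiplied by
# bin_w * 100 so that both layers share the percentage axis
plot_percent_with_line <- ggplot(TABLE, aes(x = vari)) +
  geom_histogram(aes(y = after_stat(count / sum(count) * 100)),
                 binwidth = bin_w, col = 1) +
  geom_density(aes(y = after_stat(density * bin_w * 100))) +
  ylab("percent")

plot_percent_with_line
```

The bar heights stay as percentages, while the curve is the usual kernel density estimate multiplied by a constant, so its shape is unchanged.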