Posted: 2016-02-07 15:18:11
Question:
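I'm using ggplot2 to create histograms for two different parameters. My current approach is attached at the end of my question (including a dataset that can be loaded directly from pastebin.com). It creates:
- A histogram showing the frequency distribution of the recorded user data by the "location" attribute ("WITHIN" or "NOT_WITHIN").
- A histogram showing the frequency distribution of the recorded user data by the "context" attribute ("Clicked A" or "Clicked B").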
# Load my example dataset from pastebin
RawDataSet <- read.csv("http://pastebin.com/raw/uKybDy03", sep=";")
# Load packages
library(plyr)
library(dplyr)
library(reshape2)
library(ggplot2)
###### Create Frequency Table for Location-Information
LocationFrequency <- ddply(RawDataSet, .(UserEmail), summarize,
                           All = length(UserEmail),
                           Within_area = sum(location == "WITHIN"),
                           Not_within_area = sum(location == "NOT_WITHIN"))
# Create a column for unique identifiers
LocationFrequency <- mutate(LocationFrequency, id = rownames(LocationFrequency))
# Reorder columns
LocationFrequency <- LocationFrequency[,c(5,1:4)]
# Convert the id column from character (row names) to numeric
LocationFrequency$id <- as.numeric(LocationFrequency$id)
# Melt data
LocationFrequency.m <- melt(LocationFrequency, id.vars = c("UserEmail", "All", "id"))
# Plot data
p <- ggplot(LocationFrequency.m, aes(x = id, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  theme_grey(base_size = 16) +
  labs(title = "Histogram showing the distribution of all spatial information per user.") +
  labs(x = "User", y = "Number of notification interactions within/not within the area") +
  # Use numeric IDs instead of UserEmail on the x-axis
  scale_x_continuous(breaks = 1:30)
# Change legend Title
p + labs(fill = "Type of location")
##### Create Frequency Table for Interaction-Information
InterationFrequency <- ddply(RawDataSet, .(UserEmail), summarize,
                             All = length(UserEmail),
                             Clicked_A = sum(context == "Clicked A"),
                             Clicked_B = sum(context == "Clicked B"))
# Create a column for unique identifiers
InterationFrequency <- mutate(InterationFrequency, id = rownames(InterationFrequency))
# Reorder columns
InterationFrequency <- InterationFrequency[,c(5,1:4)]
# Convert the id column from character (row names) to numeric
InterationFrequency$id <- as.numeric(InterationFrequency$id)
# Melt data
InterationFrequency.m <- melt(InterationFrequency, id.vars = c("UserEmail", "All", "id"))
# Plot data
p <- ggplot(InterationFrequency.m, aes(x = id, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  theme_grey(base_size = 16) +
  labs(title = "Histogram showing the distribution of all interaction types per user.") +
  labs(x = "User", y = "Number of interactions") +
  # Use numeric IDs instead of UserEmail on the x-axis
  scale_x_continuous(breaks = 1:30)
# Change legend Title
p + labs(fill = "Type of interaction")
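For comparison, here is a minimal sketch (not the approach used above) of building the same per-user location counts directly in long format, which skips the manual id column, column reordering and `melt()` step. A small hypothetical data frame with the same column names (`UserEmail`, `location`, `context`) stands in for the pastebin dataset. It assumes that `table()` sorting the e-mail addresses alphabetically yields the same user ids as the `ddply()` version; the same pattern works for `context` instead of `location`.

```r
library(ggplot2)

# Hypothetical mock data with the columns the question's code relies on.
RawDataSet <- data.frame(
  UserEmail = rep(c("a@example.com", "b@example.com", "c@example.com"),
                  times = c(5, 3, 4)),
  location  = c("WITHIN", "WITHIN", "NOT_WITHIN", "WITHIN", "NOT_WITHIN",
                "NOT_WITHIN", "WITHIN", "WITHIN",
                "WITHIN", "NOT_WITHIN", "NOT_WITHIN", "NOT_WITHIN"),
  context   = c("Clicked A", "Clicked B", "Clicked A", "Clicked A", "Clicked B",
                "Clicked B", "Clicked B", "Clicked A",
                "Clicked A", "Clicked A", "Clicked B", "Clicked A"),
  stringsAsFactors = FALSE
)

# One row per user x location combination; the count is in column Freq.
loc_counts <- as.data.frame(table(UserEmail = RawDataSet$UserEmail,
                                  location  = RawDataSet$location))
# table() sorts UserEmail alphabetically, so the factor codes serve as ids.
loc_counts$id <- as.integer(loc_counts$UserEmail)

p <- ggplot(loc_counts, aes(x = id, y = Freq, fill = location)) +
  geom_col() +  # equivalent to geom_bar(stat = "identity")
  scale_x_continuous(breaks = seq_len(nlevels(loc_counts$UserEmail))) +
  labs(x = "User", y = "Number of records", fill = "Type of location")
print(p)
```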
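But what I would like to achieve is this: how can I combine the two histograms into a single plot? And is it possible to show the corresponding percentage on each section? Something like the sketch below, which shows the total number of observations per user (the full height of the bar) and uses different splits to visualize the corresponding data. Each bar would be divided into sections (WITHIN and NOT_WITHIN), and each section would then be divided into two subsections showing the percentage of each interaction type ('Clicked A' or 'Clicked B').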
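One possible way to get such nested bars, sketched under assumptions: count every user × location × context combination, give each location/context pair its own fill colour, and label each segment with `geom_text()` centred by `position_stack(vjust = 0.5)`. The data frame below is hypothetical mock data. "Percentage" is interpreted here as the share of an interaction type within the user's location section; grouping the `ave()` call by `UserEmail` alone would give the share of the user's total instead.

```r
library(ggplot2)

# Hypothetical mock data (same columns as the question's RawDataSet).
RawDataSet <- data.frame(
  UserEmail = rep(c("a@example.com", "b@example.com", "c@example.com"),
                  times = c(5, 3, 4)),
  location  = c("WITHIN", "WITHIN", "NOT_WITHIN", "WITHIN", "NOT_WITHIN",
                "NOT_WITHIN", "WITHIN", "WITHIN",
                "WITHIN", "NOT_WITHIN", "NOT_WITHIN", "NOT_WITHIN"),
  context   = c("Clicked A", "Clicked B", "Clicked A", "Clicked A", "Clicked B",
                "Clicked B", "Clicked B", "Clicked A",
                "Clicked A", "Clicked A", "Clicked B", "Clicked A"),
  stringsAsFactors = FALSE
)

# Count every user x location x context combination and drop empty ones.
counts <- as.data.frame(table(UserEmail = RawDataSet$UserEmail,
                              location  = RawDataSet$location,
                              context   = RawDataSet$context))
counts <- counts[counts$Freq > 0, ]

# Share (in %) of each interaction type within the user's location section.
counts$pct <- 100 * counts$Freq /
  ave(counts$Freq, counts$UserEmail, counts$location, FUN = sum)
counts$id <- as.integer(counts$UserEmail)

# lex.order = TRUE orders the levels by location first, so both interaction
# types of one location sit next to each other in the stack.
counts$segment <- interaction(counts$location, counts$context,
                              sep = " / ", lex.order = TRUE)

p <- ggplot(counts, aes(x = id, y = Freq, fill = segment)) +
  geom_col() +
  geom_text(aes(label = sprintf("%.0f%%", pct)),
            position = position_stack(vjust = 0.5)) +
  scale_x_continuous(breaks = seq_len(nlevels(counts$UserEmail))) +
  labs(x = "User", y = "Number of records", fill = "Location / interaction")
print(p)
```

The full bar height is still the user's total number of records; only the fill splits it into location sections and, inside those, interaction types.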
Discussion:
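- Could you include the data in your post? Relying on external resources to stick around is naive at best. Mock up the data, or use one of the existing datasets that ship with R or with one of the (common) packages.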
- By definition, a histogram can only show one variable (unless you put text labels on it). Are you looking for a mosaic plot?
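- @RomanLuštrik: Sorry. I thought the included pastebin link was the perfect way to keep the question as manageable as possible, since you can easily use my data via the provided link. Anyway... I'll include a snippet of my dataset soon.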
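- @alistaire: Interesting info ;) I searched for "mosaic plot", and that could solve the problem, although I don't know how to visualize the different frequencies.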
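- @Jaap Thanks for your answer. I've updated my question and commented on your answer. Maybe you'll find some time to look at my comment and/or the edit to my question :)?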
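Regarding the mosaic-plot suggestion, a minimal base-R sketch (using `graphics::mosaicplot()` rather than ggplot2) of how the frequencies can be visualized: each tile's area is proportional to the count in the corresponding cell of a contingency table built with `xtabs()`. The data frame is hypothetical mock data with the question's column names.

```r
# Hypothetical mock data (same columns as the question's RawDataSet).
RawDataSet <- data.frame(
  UserEmail = rep(c("a@example.com", "b@example.com", "c@example.com"),
                  times = c(5, 3, 4)),
  location  = c("WITHIN", "WITHIN", "NOT_WITHIN", "WITHIN", "NOT_WITHIN",
                "NOT_WITHIN", "WITHIN", "WITHIN",
                "WITHIN", "NOT_WITHIN", "NOT_WITHIN", "NOT_WITHIN"),
  context   = c("Clicked A", "Clicked B", "Clicked A", "Clicked A", "Clicked B",
                "Clicked B", "Clicked B", "Clicked A",
                "Clicked A", "Clicked A", "Clicked B", "Clicked A"),
  stringsAsFactors = FALSE
)

# Contingency tables: location x interaction overall, and additionally by user.
tab  <- xtabs(~ location + context, data = RawDataSet)
tab3 <- xtabs(~ UserEmail + location + context, data = RawDataSet)

# Tile areas are proportional to the cell counts.
mosaicplot(tab, main = "Location vs. interaction type", color = TRUE)
mosaicplot(tab3, main = "Location and interaction type per user",
           color = TRUE, las = 2)
```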