【Posted】: 2020-03-28 13:08:25
【Question】:
I have a dataset that looks like this:
library(tibble)  # provides tribble()
df <- tribble(
~id, ~price, ~number_of_book,
"1", 10, 3,
"1", 5, 1,
"2", 7, 4,
"2", 6, 2,
"2", 3, 4,
"3", 4, 1,
"4", 5, 1,
"4", 6, 1,
"5", 1, 2,
"5", 9, 3,
)
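As you can see in the dataset, for id "1" there are 3 books that cost $10 each and 1 book that costs $5. Basically, for each id I want the share (%) of books that falls into each price bin. This is the dataset I want: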
desired <- tribble(
~id, ~less_than_three, ~`three-five`, ~`five-six`, ~more_than_six,
"1", "0%", "25%", "0%", "75%",
"2", "0%", "40%", "20%", "40%",
"3", "0%", "100%", "0%", "0%",
"4", "0%", "50%", "50%", "0%",
"5", "40%", "0%", "0%", "60%",
)
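The percentages in this table are shares of books, not of rows: within each id, every row contributes number_of_book divided by that id's total number of books (for id "1": 3/4 = 75% and 1/4 = 25%). A minimal base-R sketch of that per-row weight, assuming a plain data.frame copy of the data above so that no packages are needed:

```r
# Plain data.frame copy of the question's data (no packages needed)
df <- data.frame(
  id             = c("1", "1", "2", "2", "2", "3", "4", "4", "5", "5"),
  price          = c(10, 5, 7, 6, 3, 4, 5, 6, 1, 9),
  number_of_book = c(3, 1, 4, 2, 4, 1, 1, 1, 2, 3)
)

# Total books per id, repeated on every row belonging to that id
total_books <- ave(df$number_of_book, df$id, FUN = sum)

# Each row's share of its id's books; the shares within one id sum to 1
df$share <- df$number_of_book / total_books
df
```

Adding these shares up per id and price bin gives the cells of the desired table.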
So far, I bin the prices first by running the following code:
out <- cut(df$price, breaks = c(0, 3, 5, 6, 10),
           labels = c("<3", "3-5", "5-6", ">6"))
# share of all rows per price bin, pooled across ids (number_of_book is ignored)
out <- table(out) / sum(table(out))
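The cut() call already assigns each row to a bin; what is missing is summing number_of_book per id and bin and then dividing by each id's total. A minimal base-R sketch of that step, using xtabs() with number_of_book as the count variable and prop.table() with margin = 1 for row shares (a swapped-in approach, not code from the question). Note that cut() uses right-closed intervals by default, (0,3], (3,5], (5,6], (6,10], so a price of exactly 3 lands in "<3" and a price above 10 becomes NA. With these breaks, id "2" therefore comes out as 40% "<3" and 0% "3-5", unlike the hand-written table above; the breaks would need adjusting if 3 should count as "3-5".

```r
df <- data.frame(
  id             = c("1", "1", "2", "2", "2", "3", "4", "4", "5", "5"),
  price          = c(10, 5, 7, 6, 3, 4, 5, 6, 1, 9),
  number_of_book = c(3, 1, 4, 2, 4, 1, 1, 1, 2, 3)
)

# Same bins as in the question; right-closed: (0,3], (3,5], (5,6], (6,10]
df$bin <- cut(df$price, breaks = c(0, 3, 5, 6, 10),
              labels = c("<3", "3-5", "5-6", ">6"))

# Books per id x bin (summing number_of_book); empty combinations are 0
counts <- xtabs(number_of_book ~ id + bin, data = df)

# Divide every row by its row total -> share of books per id
shares <- prop.table(counts, margin = 1)

# Percentages: one row per id, one column per bin
round(100 * shares)
```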
Unfortunately, I don't know enough R to get any further. Could you help me get the desired data?
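Since the question is tagged dplyr/tidyverse, the same computation can also be sketched as a pipeline, assuming dplyr ≥ 1.0 and tidyr ≥ 1.0 are installed. count() with wt = number_of_book sums books instead of counting rows, .drop = FALSE keeps empty bins so they appear as 0%, and pivot_wider() spreads the bins into columns. The bins are the same right-closed intervals as in the question's cut() call, and the columns keep the cut() labels rather than the column names used in the desired table.

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- tribble(
  ~id, ~price, ~number_of_book,
  "1", 10, 3,
  "1",  5, 1,
  "2",  7, 4,
  "2",  6, 2,
  "2",  3, 4,
  "3",  4, 1,
  "4",  5, 1,
  "4",  6, 1,
  "5",  1, 2,
  "5",  9, 3
)

result <- df %>%
  # bin prices exactly as in the question's cut() call
  mutate(bin = cut(price, breaks = c(0, 3, 5, 6, 10),
                   labels = c("<3", "3-5", "5-6", ">6"))) %>%
  # books (not rows) per id and bin; .drop = FALSE keeps empty bins as n = 0
  count(id, bin, wt = number_of_book, .drop = FALSE) %>%
  # share of each id's books that falls into each bin
  group_by(id) %>%
  mutate(share = n / sum(n)) %>%
  ungroup() %>%
  select(-n) %>%
  # one column per bin
  pivot_wider(names_from = bin, values_from = share) %>%
  # format as "25%"-style strings like the desired output
  mutate(across(-id, ~ paste0(round(100 * .x), "%")))

result
```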
【Discussion】:
Tags: r dataframe dplyr tidyverse intervals