Top n products sold for each merchant in R: answers

[Question title]: Top n products sold for each merchant in R
[Posted]: 2017-10-22 08:44:38
[Question]:

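I have read the dplyr documentation but still don't really understand how group_by works. I'm trying to find the top 3 products (product_id) sold by each merchant (merchant_id). The code I tried is: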

tmp <- orders %>%
         group_by(product_id, merchant_id) %>%
         summarize(count = n()) %>% 
         top_n(3, wt = count) %>%
         arrange(desc(count))

If a merchant sells fewer than 3 distinct products, I want only that many products to be shown for that merchant.

Input

order_id | product_id | merchant_id |
---------|------------|-------------|
23409    |  131883    |   597       |
23683    |  131885    |   597       |
25325    |  131885    |   597       |
25390    |  131885    |   597       |
25410    |  131888    |   597       |
25325    |  223783    |   613       |
28932    |  223815    |   613       |
38197    |  298483    |   613       |
48728    |  298483    |   613       |

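If I want the top 3 products for each merchant, I expect output in the following format (the counts don't match the input above, because I would have had to create many rows to make them match, but the layout is what I'm after):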

Output

count    | product_id | merchant_id |
---------|------------|-------------|
    5    |  131883    |   597       |
    3    |  131885    |   597       |
    2    |  131888    |   597       |
    4    |  223783    |   613       |
    2    |  223815    |   613       |
    1    |  298483    |   613       |

[Question comments]:

Please provide a reproducible example.

Tags: r dplyr

[Solution 1]:

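You need to modify your code slightly. You want top_n to operate on data grouped by merchant_id, not on data grouped by (product_id, merchant_id). Because summarize() only drops the last grouping variable, your summarized data is still grouped by product_id, so top_n ranks within each product. Ungroup before grouping again by merchant_id. Also, arrange() ignores grouping by default, so if you want it to sort the counts within each merchant_id group, you need to include merchant_id in the sort as well.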
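To see which grouping is left after summarize(), group_vars() can be called on the result. The sketch below uses the question's sample data (integer columns assumed) and also shows that listing merchant_id first in group_by() leaves the summary grouped by merchant_id, which is an alternative to the ungroup()/group_by() step:

```r
library(dplyr)

# Sample input from the question (integer columns assumed)
orders <- data.frame(
  product_id  = c(131883L, 131885L, 131885L, 131885L, 131888L,
                  223783L, 223815L, 298483L, 298483L),
  merchant_id = c(rep(597L, 5), rep(613L, 4))
)

# Grouping order from the question: summarise() drops merchant_id
# (the last grouping variable), so the result stays grouped by product_id
by_product <- orders %>%
  group_by(product_id, merchant_id) %>%
  summarise(count = n())
group_vars(by_product)   # "product_id"

# With merchant_id listed first, the result stays grouped by merchant_id,
# so a following top_n() ranks products within each merchant
by_merchant <- orders %>%
  group_by(merchant_id, product_id) %>%
  summarise(count = n())
group_vars(by_merchant)  # "merchant_id"
```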

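library(dplyr)

orders %>%
  group_by(product_id, merchant_id) %>%
  summarize(count = n()) %>%
  ungroup() %>%                          # drop the leftover product_id grouping
  group_by(merchant_id) %>%
  top_n(3, wt = count) %>%               # top 3 products within each merchant
  arrange(merchant_id, desc(count))      # sort by count within each merchant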

This returns:

product_id merchant_id count
       <int>       <int> <int>
1     131885         597     3
2     131883         597     1
3     131888         597     1
4     298483         613     2
5     223783         613     1
6     223815         613     1

Also note that top_n can return more than n rows per group if there are ties.
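The tie behaviour can be seen with the question's sample data by asking for the top 2 products per merchant: in both merchants the 2nd and 3rd products are tied at a count of 1, so top_n(2) keeps all 3. A sketch, assuming dplyr >= 1.0.0 (needed for slice_max()); with_ties = FALSE is one way to cap the result at exactly n rows per group, but which of the tied rows is kept should not be relied on:

```r
library(dplyr)  # slice_max() requires dplyr >= 1.0.0

# Sample input from the question (integer columns assumed)
orders <- data.frame(
  product_id  = c(131883L, 131885L, 131885L, 131885L, 131888L,
                  223783L, 223815L, 298483L, 298483L),
  merchant_id = c(rep(597L, 5), rep(613L, 4))
)

# One row per (merchant, product) with its number of orders, grouped by merchant
counts <- orders %>%
  count(merchant_id, product_id, name = "count") %>%
  group_by(merchant_id)

# top_n() keeps ties: 3 rows per merchant even though n = 2
with_ties <- counts %>% top_n(2, wt = count)

# slice_max(with_ties = FALSE) returns at most n rows per merchant
no_ties <- counts %>%
  slice_max(count, n = 2, with_ties = FALSE) %>%
  ungroup()
```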

[Comments]: