【Question title】: rowcount based on another column 1 in r [duplicate]
【Posted】: 2018-10-23 05:35:02
【Question description】:

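OK, I have the following data frame with several thousand rows; its output is shown below. The data frame records orders on an e-commerce website and lists the products purchased under each order ID.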

     | order_id| product_id|product_name                     |
     |--------:|----------:|:--------------------------------|
     |  1187899|        196|Soda                             |
     |  1187899|      25133|Organic String Cheese            |
     |  1187899|      38928|0% Greek Strained Yogurt         |
     |  1187899|      26405|XL Pick-A-Size Paper Towel Rolls |
     |  1187899|      39657|Milk Chocolate Almonds           |
     |  1187899|      10258|Pistachios                       |
     |  1187899|      13032|Cinnamon Toast Crunch            |
     |  1187899|      26088|Aged White Cheddar Popcorn       |
     |  1187899|      27845|Organic Whole Milk               |
     |  1187899|      49235|Organic Half & Half              |
     |  1187899|      46149|Zero Calorie Cola                |
     |  1492625|      22963|Organic Roasted Turkey Breast    |
     |  1492625|       7963|Gluten Free Whole Grain Bread    |
     |  1492625|      16589|Plantain Chips                   |
     |  1492625|      32792|Chipotle Beef & Pork Realstick   |

The code used to produce the data frame listing above is:

    temp <- orders %>%
      inner_join(opt, by = "order_id") %>%
      inner_join(products, by = "product_id") %>%
      select(order_id, product_id, product_name)
    kable(head(temp, 15))

I want to count which products were ordered most often. Basically, my output should look like this:

     product_id | Order_Count
        196         10025
        7963        9025
        25133       8903

I am not sure how to do this. I tried the following:

      mutate(prods = count(product_id))

But it does not work; I get an error message: Error in mutate_impl(.data, dots) : Evaluation error: no applicable method for 'groups' applied to an object of class "factor".

Any help would be greatly appreciated!
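
A note on the error, as a minimal sketch: dplyr's count() expects a data frame (or tibble) as its first argument, so calling it on the bare column product_id inside mutate() fails. Calling it on the data frame itself returns one row per product. The small temp tibble below is made-up illustration data, not the real orders, and the name argument assumes dplyr 0.8.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for the joined `temp` data frame from the question
temp <- tibble::tibble(
  order_id   = c(1L, 1L, 2L, 2L, 3L),
  product_id = c(196L, 25133L, 196L, 7963L, 196L)
)

# count() takes the data frame first, then the column(s) to count by;
# sort = TRUE orders by descending count, name sets the count column's name
order_counts <- temp %>%
  count(product_id, sort = TRUE, name = "Order_Count")

print(order_counts)
```

Note that this counts rows per product, which matches the number of orders only when a product appears at most once per order.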

【Comments】:

  • table(temp$product_id)?
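  • Thanks, that was simple and it worked. In the end I used sort(table(temp$product_name), decreasing = TRUE) to sort in descending order; now I am trying to figure out how to use it in ggplot.
  • As for the ggplot2 package, I suggest you ask a separate question. But please post the data using dput(temp) or, if temp is too large, dput(head(temp, 30)).
  • Any particular reason you suggest using dput?
  • @GaurangSwarge dput(temp) outputs your data.frame in full detail, in a format that makes it easy for others to recreate it and offer a solution. Otherwise, who is going to do a lot of unnecessary typing just to help you?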
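
The base R suggestions in the comments above can be sketched roughly as follows. The temp data frame here is a small made-up sample, not the asker's data, and the column name Order_Count is borrowed from the desired output in the question.

```r
# Hypothetical sample standing in for `temp`
temp <- data.frame(
  product_id   = c(196, 25133, 196, 7963, 196, 7963),
  product_name = c("Soda", "Organic String Cheese", "Soda",
                   "Gluten Free Whole Grain Bread", "Soda",
                   "Gluten Free Whole Grain Bread"),
  stringsAsFactors = FALSE
)

# Count occurrences of each product name and sort in descending order
name_counts <- sort(table(temp$product_name), decreasing = TRUE)
print(name_counts)

# as.data.frame() turns the one-way table into a two-column data frame
counts_df <- as.data.frame(name_counts, stringsAsFactors = FALSE)
names(counts_df) <- c("product_name", "Order_Count")
print(counts_df)

# dput() prints R code that recreates the object, ready to paste into a question
dput(head(temp, 30))
```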

Tags: r dataframe dplyr


【Solution 1】:

You can use table() to print a simple table (as Rui Barradas mentioned), or, if you want a data frame with the counts, use dplyr::count().

library(tidyverse)

orders <- tibble::tribble(
  ~order_id, ~product_id, ~product_name,
  "1187899", "196", "Soda",
  "1187899", "25133", "Organic String Cheese",
  "1187899", "38928", "0% Greek Strained Yogurt",
  "1187899", "26405", "XL Pick-A-Size Paper Towel Rolls",
  "1187899", "39657", "Milk Chocolate Almonds",
  "1187899", "10258", "Pistachios",
  "1187899", "10258", "Pistachios",
  "1187899", "10258", "Pistachios",
  "1187899", "13032", "Cinnamon Toast Crunch",
  "1187899", "13032", "Cinnamon Toast Crunch",
  "1187899", "26088", "Aged White Cheddar Popcorn",
  "1187899", "27845", "Organic Whole Milk",
  "1187899", "49235", "Organic Half & Half",
  "1187899", "46149", "Zero Calorie Cola",
  "1492625", "22963", "Organic Roasted Turkey Breast",
  "1492625", "7963", "Gluten Free Whole Grain Bread",
  "1492625", "16589", "Plantain Chips",
  "1492625", "32792", "Chipotle Beef & Pork Realstick"
)

A simple printed table with, for example, the count for each product_id:

table(orders$product_id)

However, if you want a data frame with the counts, for plotting or any other use, then:

orders %>%
  count(product_id, product_name)

> # A tibble: 15 x 3
>    product_id product_name                         n
>    <chr>      <chr>                            <int>
>  1 10258      Pistachios                           3
>  2 13032      Cinnamon Toast Crunch                2
>  3 16589      Plantain Chips                       1
>  4 196        Soda                                 1
>  5 22963      Organic Roasted Turkey Breast        1
>  6 25133      Organic String Cheese                1
>  7 26088      Aged White Cheddar Popcorn           1
>  8 26405      XL Pick-A-Size Paper Towel Rolls     1
>  9 27845      Organic Whole Milk                   1
> 10 32792      Chipotle Beef & Pork Realstick       1
> 11 38928      0% Greek Strained Yogurt             1
> 12 39657      Milk Chocolate Almonds               1
> 13 46149      Zero Calorie Cola                    1
> 14 49235      Organic Half & Half                  1
> 15 7963       Gluten Free Whole Grain Bread        1
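
Because the result of count() is an ordinary data frame, it can be sorted and handed straight to a plotting function. The following is a minimal sketch, assuming ggplot2 is installed; orders_sample is made-up data in the same shape as orders, and the horizontal bar chart is just one possible choice of plot.

```r
library(dplyr)
library(ggplot2)

# Hypothetical sample in the same shape as `orders`
orders_sample <- tibble::tribble(
  ~order_id, ~product_id, ~product_name,
  "1", "10258", "Pistachios",
  "1", "13032", "Cinnamon Toast Crunch",
  "2", "10258", "Pistachios",
  "2", "196",   "Soda",
  "3", "10258", "Pistachios",
  "3", "13032", "Cinnamon Toast Crunch"
)

# sort = TRUE puts the most frequent products first
product_counts <- orders_sample %>%
  count(product_id, product_name, sort = TRUE)

# Horizontal bar chart; reorder() sorts the bars by count
p <- ggplot(product_counts, aes(x = reorder(product_name, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Product", y = "Number of rows")

# Call print(p) to draw the chart on the active graphics device
```

On a large data set, adding something like head(product_counts, 20) before plotting keeps the chart readable.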
