【Question title】: How to group_by multiple columns/variables
【Posted】: 2021-04-12 21:33:03
【Question】:

Here is my code:

  Groupby_sample %>%
  group_by(Network, Merchant, Status) %>%
  summarise(Tranx_count = n())

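I want something that collapses the observations in each group into a single row, like the Excel pivot table I showed below. Python gives me something equivalent, but R is grouping by 1 observation.

Here is a sample dataset: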

Merchant       Recipient   Network  Type     FaceValue  Date             Status
Economy        7012086632  Newest   Airtime  100        02/04/2021 0:05  Transaction Declined
Economy        9013347171  Newest   Airtime  100        02/04/2021 0:06  Transaction Declined
Economy        7083816093  Newest   Airtime  200        02/04/2021 0:08  Transaction Declined
polly          8126029470  Newest   Airtime  2000       02/04/2021 0:09  Transaction Declined
Star           8020391914  Newest   Airtime  200        02/04/2021 0:10  DECLINED
Munifat        7012349167  Newest   Airtime  100        02/04/2021 0:12  DECLINED
Munifat        9078126934  AT AT    Airtime  500        02/04/2021 0:13  DECLINED
polly          9070149314  AT AT    Airtime  100        02/04/2021 0:17  DECLINED
polly          9012964375  AT AT    Airtime  500        02/04/2021 0:18  DECLINED
polly digital  9026410183  AT AT    Airtime  1000       02/04/2021 0:19  DECLINED
Economy        7088794494  AT AT    Airtime  500        02/04/2021 0:23  Transaction Declined
Economy        7082168900  AT AT    Airtime  100        02/04/2021 0:33  Transaction Declined
Economy        9020689920  AT AT    Airtime  100        02/04/2021 3:43  Transaction Declined
polly digital  9049041083  AT AT    Airtime  100        02/04/2021 4:07  FAILED
Star           9019433081  Newest   Airtime  1000       02/04/2021 4:09  FAILED

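Please note that my knowledge is limited.

【Comments】:

  • It is not clear from your question how you plan to use this data further. Are you just trying to produce a formatted table like the one in the screenshot? This question is a good starting point: stackoverflow.com/questions/18622854/…
  • Sharing your data sample with the dput() function would work better.

Tags: r dplyr data-science tidyverse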


【Solution 1】:

Try this, using {tidyverse}:

library(tidyverse)
DF <- 
  structure(list(
    Merchant = c("Economy", "Economy", "Economy", "polly", "Star", "Munifat", "Munifat", "polly", "polly", "polly digital", "Economy", "Economy", "Economy", "polly digital", "Star"), Recipient = c("7012086632", "9013347171", "7083816093", "8126029470", "8020391914", "7012349167", "9078126934", "9070149314", "9012964375", "9026410183", "7088794494", "7082168900", "9020689920", "9049041083", "9019433081"), 
    Network = c("Newest", "Newest", "Newest", "Newest", "Newest", "Newest", "AT AT", "AT AT", "AT AT", "AT AT", "AT AT", "AT AT", "AT AT", "AT AT", "Newest"), 
    Type = c("Airtime", "Airtime", "Airtime", "Airtime", "Airtime", "Airtime", "Airtime", "Airtime", "Airtime", "Airtime", "Airtime", "Airtime", "Airtime", "Airtime", "Airtime"), FaceValue = c(100L, 100L, 200L, 2000L, 200L, 100L, 500L, 100L, 500L, 1000L, 500L, 100L, 100L, 100L, 1000L), 
    Date = structure(c(1617321900, 1617321960, 1617322080, 1617322140, 1617322200, 1617322320, 1617322380, 1617322620, 1617322680, 1617322740, 1617322980, 1617323580, 1617334980, 1617336420, 1617336540), tzone = "UTC", class = c("POSIXct", "POSIXt")),     
    Status = c("Transaction Declined", "Transaction Declined",     "Transaction Declined", "Transaction Declined", "DECLINED",     "DECLINED", "DECLINED", "DECLINED", "DECLINED", "DECLINED",     "Transaction Declined", "Transaction Declined", "Transaction Declined",     "FAILED", "FAILED")), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), 
    row.names = c(NA, -15L))

DF %>% count(Network, Status, Merchant)
# A tibble: 10 x 4
   Network Status               Merchant          n
   <chr>   <chr>                <chr>         <int>
 1 AT AT   DECLINED             Munifat           1
 2 AT AT   DECLINED             polly             2
 3 AT AT   DECLINED             polly digital     1
 4 AT AT   FAILED               polly digital     1
 5 AT AT   Transaction Declined Economy           3
 6 Newest  DECLINED             Munifat           1
 7 Newest  DECLINED             Star              1
 8 Newest  FAILED               Star              1
 9 Newest  Transaction Declined Economy           3
10 Newest  Transaction Declined polly             1
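
If the goal is a layout closer to an Excel pivot table, one possible follow-up is to spread Status across columns after counting. The sketch below is not part of the original answer: it uses a small made-up subset (DF_small, a hypothetical name) rather than the full DF, and assumes a tidyr version that has pivot_wider() with a scalar values_fill (1.1.0 or later). It also shows that group_by() plus summarise(n()) yields the same counts as count(), with .groups = "drop" returning an ungrouped result.

```r
library(dplyr)
library(tidyr)

# Hypothetical six-row subset of the sample data, for illustration only
DF_small <- data.frame(
  Network  = c("Newest", "Newest", "Newest", "AT AT", "AT AT", "AT AT"),
  Merchant = c("Economy", "Economy", "Star", "polly", "polly", "Economy"),
  Status   = c("Transaction Declined", "Transaction Declined", "DECLINED",
               "DECLINED", "DECLINED", "Transaction Declined")
)

# One row per Network/Merchant/Status combination, same counts as count()
counts <- DF_small %>%
  group_by(Network, Merchant, Status) %>%
  summarise(Tranx_count = n(), .groups = "drop")

# One column per Status value; combinations that never occur are filled with 0
pivot <- counts %>%
  pivot_wider(names_from = Status, values_from = Tranx_count, values_fill = 0L)

print(counts)
print(pivot)
```

The long form (counts) is usually easier to keep working with in R; the wide form (pivot) is mainly useful for display.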

【Comments】:
