【Question title】: Summarizing the count of values of one column in 2 different columns
【Posted】: 2021-04-23 03:18:11
【Question】:

I have a df called reviews_gh with the following format:

Date         Market  Positive.or.Negative.
01-01-2020     A              Positive
01-01-2020     A              Positive
01-01-2020     B              Positive
01-01-2020     B              Negative
....

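I am trying to group by Date and Market and create two new columns, positive and negative, that count how many reviews were positive and how many were negative in that market on that day.

This is my current code: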

  reviews_gh_agg <- reviews_gh %>% 
    group_by(Date, Market) %>% 
    summarise(positive = sum(reviews_gh$Positive.or.Negative. == "Positive"),
              negative = sum(reviews_gh$Positive.or.Negative. == "Negative"))

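But the result I get is wrong: the new positive and negative columns show the totals over all observations instead of the counts per date and market.

For the small example above, the result should be: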

    Date         Market  Positive     Negative
01-01-2020     A            2            0
01-01-2020     B            1            1         

Thanks for your help.

【Comments】:

    Tags: r dplyr


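    【Solution 1】:

    I hope this is what you are looking for. I only made a small change to your code: thanks to data masking, you do not need $ to refer to column names inside tidyverse verbs. With reviews_gh$..., summarise() sees the whole ungrouped column, which is why every group showed the overall total.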

    library(dplyr)

    df %>% 
      group_by(Date, Market) %>% 
      summarise(positive = sum(Positive.or.Negative. == "Positive"),
                negative = sum(Positive.or.Negative. == "Negative"))
    
    
    # A tibble: 2 x 4
    # Groups:   Date [1]
      Date       Market positive negative
      <chr>      <chr>     <int>    <int>
    1 01-01-2020 A             2        0
    2 01-01-2020 B             1        1
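
To make the difference concrete, here is a minimal sketch that runs both variants on the sample data from the question. The data frame is rebuilt here for illustration, and the names with_dollar and with_mask are only placeholders. The `$` form reads the full column of the original data frame, while the bare column name is evaluated within each group.

```r
library(dplyr)

# Sample data from the question, rebuilt here for illustration
reviews_gh <- tibble::tribble(
  ~Date,        ~Market, ~Positive.or.Negative.,
  "01-01-2020", "A",     "Positive",
  "01-01-2020", "A",     "Positive",
  "01-01-2020", "B",     "Positive",
  "01-01-2020", "B",     "Negative"
)

# `$` pulls the full, ungrouped column: every group gets the overall total
with_dollar <- reviews_gh %>%
  group_by(Date, Market) %>%
  summarise(positive = sum(reviews_gh$Positive.or.Negative. == "Positive"),
            .groups = "drop")

# The bare column name is resolved per group through data masking
with_mask <- reviews_gh %>%
  group_by(Date, Market) %>%
  summarise(positive = sum(Positive.or.Negative. == "Positive"),
            .groups = "drop")

with_dollar$positive  # 3 3
with_mask$positive    # 2 1
```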
    
    

    Update: another valuable solution, courtesy of @akrun.

    library(tidyr)  # for unnest_wider()

    df %>%
      group_by(Date, Market) %>%
      summarise(out = list(table(Positive.or.Negative.)), .groups = "drop") %>%
      unnest_wider(c(out))
    
    # A tibble: 2 x 4
      Date       Market Positive Negative
      <chr>      <chr>     <int>    <int>
    1 01-01-2020 A             2       NA
    2 01-01-2020 B             1        1
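
Note that this variant reports NA rather than 0 for Market A, because table() only counts the values that occur within each group. If zeros are preferred, one possible follow-up step is to fill the missing counts afterwards. The sketch below works on a hand-built copy of the result above (the names res and filled are placeholders) and assumes the two count columns are named Positive and Negative.

```r
library(dplyr)
library(tidyr)

# Hand-built copy of the result shown above, for illustration only
res <- tibble::tibble(
  Date     = c("01-01-2020", "01-01-2020"),
  Market   = c("A", "B"),
  Positive = c(2L, 1L),
  Negative = c(NA, 1L)
)

# Replace missing counts with 0 in both count columns
filled <- res %>%
  mutate(across(c(Positive, Negative), ~ replace_na(.x, 0L)))

filled
```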
    

    Data

    library(tibble)  # for tribble()

    df <- tribble(
      ~Date,         ~Market,  ~Positive.or.Negative.,
      "01-01-2020",     "A",              "Positive",
      "01-01-2020",     "A",              "Positive",
      "01-01-2020",     "B",              "Positive",
      "01-01-2020",     "B",              "Negative"
    )
    

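    【Comments】:

    • Another option, with table: df %>% group_by(Date, Market) %>% summarise(out = list(table(Positive.or.Negative.)), .groups = 'drop') %>% unnest_wider(c(out))
    • Thank you, dear @akrun. I have never used table inside a call to summarise, but it seems like an interesting idea, since what we are after here is a contingency table of counts. The code above is not mine: I felt the solution was already there and only needed a small adjustment, so all credit goes to the OP. I will add your solution to my post under your name. Thank you very much.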
    【Solution 2】:

    Here is another tidyverse solution, using count and pivot_wider:

    library(tidyverse)
    
    df %>% 
      # Group by Date, Market and Positive/Negative
      group_by(Date, Market, Positive.or.Negative.) %>%
      # Count
      count() %>%
      # Change to wide format, fill NA with 0's
      pivot_wider(names_from = Positive.or.Negative.,
                  values_from = n,
                  values_fill = 0)
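
As a possible shorthand, count() also accepts the grouping columns directly, so the separate group_by() step can be dropped. The sketch below assumes the sample data from the question (rebuilt here so the block runs on its own) and stores the result in a placeholder name, wide. Called this way, count() returns an ungrouped tibble.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt here for illustration
df <- tibble::tribble(
  ~Date,        ~Market, ~Positive.or.Negative.,
  "01-01-2020", "A",     "Positive",
  "01-01-2020", "A",     "Positive",
  "01-01-2020", "B",     "Positive",
  "01-01-2020", "B",     "Negative"
)

wide <- df %>%
  # Count rows per Date, Market and label in one step
  count(Date, Market, Positive.or.Negative.) %>%
  # Spread the labels into columns, using 0 where a label never occurs
  pivot_wider(names_from = Positive.or.Negative.,
              values_from = n,
              values_fill = 0)

wide
```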
    

    【Comments】:

      【Solution 3】:

      You can do this with tidyr::pivot_wider:

      tidyr::pivot_wider(df, names_from = Positive.or.Negative., 
                             values_from = Positive.or.Negative., 
                             values_fn = length, 
                             values_fill = 0)
      
      #  Date       Market Positive Negative
      #  <chr>      <chr>     <int>    <int>
      #1 01-01-2020 A             2        0
      #2 01-01-2020 B             1        1
      

      With data.table:

      library(data.table)
      
      dcast(setDT(df),  Date + Market~Positive.or.Negative., 
            value.var = 'Positive.or.Negative.', fun.aggregate = length)
      

      【Comments】: