Adding totals to a data frame: answers

【Question title】: Adding totals to a data frame
【Posted】: 2020-08-17 10:45:09
【Question】:

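I want to add totals to my data frame, but I'm having difficulties because the data is very messy (as always!): some columns are text, some are dates, some are numeric. I can't post the actual data because it is sensitive, but I'll show a representative example with the same structure (below; the required columns are in yellow). I've been trying to do this with dplyr and pipes, but I keep running into problems because of the mix of text and numbers...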

Data:

date <- c("17/08/2020", "17/08/2020", "17/08/2020", "17/08/2020","18/08/2020", "18/08/2020", "18/08/2020", "18/08/2020")

type <- c("type A", "type B", "type A", "type B","type A", "type B","type A", "type B")

location <- c("USA","USA","India","India","USA","USA","India","India")

value <- c("10","10","frak","frak","15","15","open","open")

df <- data.frame(date, type, location, value)

Basically, I need to summarise by date, type and location. [image: desired output]

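【Comments】:

It's not clear exactly what you want, because the image and the description you gave differ in how they group. Should "frak" and "open" be filtered out, should they be NA, or should they appear in the summarised data?
Text data such as frak and open can't be summed, so "n/a" or a blank would be fine for those... my image

Tags: r dplyr pipe summarize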

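【Solution 1】:

Not sure if this is what you're after.

library(dplyr)

df %>%
  group_by(date, type = "total_type", location) %>%
  # non-numeric values such as "frak" become NA, so their totals are NA
  summarise(value = sum(as.numeric(value), na.rm = FALSE)) %>%
  # back to character so the totals can be stacked on the text column
  mutate(value = as.character(value)) %>%
  bind_rows(df)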

# A tibble: 12 x 4
# Groups:   date, type [6]
   date       type       location value
   <chr>      <chr>      <chr>    <chr>
 1 17/08/2020 total_type India    NA   
 2 17/08/2020 total_type USA      20   
 3 18/08/2020 total_type India    NA   
 4 18/08/2020 total_type USA      30   
 5 17/08/2020 type A     USA      10   
 6 17/08/2020 type B     USA      10   
 7 17/08/2020 type A     India    frak 
 8 17/08/2020 type B     India    frak 
 9 18/08/2020 type A     USA      15   
10 18/08/2020 type B     USA      15   
11 18/08/2020 type A     India    open 
12 18/08/2020 type B     India    open

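Grouping by every column except value would just reproduce your original table, and in your image the summary rows have type = total_type. On the other hand, all the summary rows in your image have location USA, which doesn't make sense either, so I left location as it is.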
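The NA totals for India in the output above follow from how `as.numeric()` and `sum()` treat non-numeric text. The following is a minimal base R sketch of that behaviour, using the `value` vector from the question; the variable names (`num`, `sum_usa_day1`, and so on) are made up for illustration and do not appear in the thread.

```r
# Example values copied from the question
value <- c("10", "10", "frak", "frak", "15", "15", "open", "open")

# as.numeric() turns non-numeric strings into NA and emits a coercion warning;
# the warning is suppressed here only to keep the example quiet
num <- suppressWarnings(as.numeric(value))

# With na.rm = FALSE a single NA makes the whole sum NA
sum_usa_day1   <- sum(num[1:2], na.rm = FALSE)  # both inputs were numeric
sum_india_day1 <- sum(num[3:4], na.rm = FALSE)  # both inputs were text

# With na.rm = TRUE the NA values are dropped, and an empty sum is 0
sum_india_rm <- sum(num[3:4], na.rm = TRUE)
```

Keeping `na.rm = FALSE`, as the answer does, makes text-only groups show up as NA rather than as a misleading total of 0.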

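【Comments】:

This looks good, thanks. But I'd like the new rows to be added to the original data frame?
Write the group_by statement like this: group_by(date, type = "total_type", location) %>% and then use bind_rows to bind df to the data frame with the totals.
@R_debutante Welcome to Stack Overflow! Apologies if you already know this, but if you've found a solution that answers your question, please follow the instructions given here: What should I do when someone answers my question?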

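【Solution 2】:

I would suggest the following approach, which is close to the one proposed by @Humpelstielzchen and close to what you show in your image:

library(dplyr)

df %>% bind_rows(df %>% group_by(date, location) %>%
                   # non-numeric values become NA
                   mutate(value = as.numeric(value)) %>%
                   summarise(value = sum(value, na.rm = FALSE)) %>%
                   # label the totals and convert back to character for binding
                   mutate(type = 'total type', value = as.character(value)))

Output: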

         date       type location value
1  17/08/2020     type A      USA    10
2  17/08/2020     type B      USA    10
3  17/08/2020     type A    India  frak
4  17/08/2020     type B    India  frak
5  18/08/2020     type A      USA    15
6  18/08/2020     type B      USA    15
7  18/08/2020     type A    India  open
8  18/08/2020     type B    India  open
9  17/08/2020 total type    India  <NA>
10 17/08/2020 total type      USA    20
11 18/08/2020 total type    India  <NA>
12 18/08/2020 total type      USA    30

Update: because of the OP's package version issue, here is an approach that can be used instead:

library(dplyr)
#Data
date <- c("17/08/2020", "17/08/2020", "17/08/2020", "17/08/2020","18/08/2020", "18/08/2020", "18/08/2020", "18/08/2020")

type <- c("type A", "type B", "type A", "type B","type A", "type B","type A", "type B")

location <- c("USA","USA","India","India","USA","USA","India","India")

value <- c("10","10","frak","frak","15","15","open","open")

df <- data.frame(date, type, location, value, stringsAsFactors = FALSE)
# Summarise: one total per date and location
df1 <- df %>% group_by(date, location) %>%
  mutate(value = as.numeric(value)) %>%
  summarise(value = sum(value, na.rm = FALSE)) %>%
  mutate(type = 'total type') %>% ungroup()
# Convert back to character outside the pipeline
df1$value <- as.character(df1$value)
# Bind with base rbind() instead of bind_rows()
df2 <- rbind(df, df1)

Output:

         date       type location value
1  17/08/2020     type A      USA    10
2  17/08/2020     type B      USA    10
3  17/08/2020     type A    India  frak
4  17/08/2020     type B    India  frak
5  18/08/2020     type A      USA    15
6  18/08/2020     type B      USA    15
7  18/08/2020     type A    India  open
8  18/08/2020     type B    India  open
9  17/08/2020 total type    India  <NA>
10 17/08/2020 total type      USA    20
11 18/08/2020 total type    India  <NA>
12 18/08/2020 total type      USA    30
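The `stringsAsFactors` argument in the updated code matters on R versions older than 4.0.0, where `data.frame()` converted character vectors to factors by default. The thread does not state which R version the OP runs, so treat this as an assumption. The sketch below shows why a factor column needs care; `value_f`, `codes` and `nums` are illustrative names only.

```r
# Build a factor explicitly to mimic the pre-4.0.0 data.frame() default
value_f <- factor(c("10", "10", "frak", "frak", "15", "15", "open", "open"))

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(value_f)

# Going through as.character() first recovers the intended numbers;
# the text entries still become NA (coercion warning suppressed for brevity)
nums <- suppressWarnings(as.numeric(as.character(value_f)))
```

Summing `codes` would silently produce wrong totals, whereas `nums` behaves like the character column used in the answers.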

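【Comments】:

This looks perfect, but I get an error - Error: Column `value` can't be converted from numeric to character
@R_debutante That's strange. Could you try df %>% bind_rows(df %>% group_by(date,location) %>% mutate(value=as.numeric(value)) %>% summarise(value=sum(value,na.rm=F)) %>% ungroup() %>% mutate(type='total type',value=as.character(value)))
Hmm. Yes, I still get the same error message? Error: Column `value` can't be converted from numeric to character
@R_debutante Could you restart R and see whether the problem persists? I believe it is a package issue, or you may need to update dplyr to the latest version!
I'm running dplyr 0.8, so I think that is the problem - it's a workplace server though, so I can't upgrade...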
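Since the thread ends with a dplyr 0.8 installation that cannot be upgraded, a variant that avoids dplyr altogether may be worth trying. This is a minimal sketch using only base R's `aggregate()` and `rbind()`; it is not a method proposed by either answerer, and the names `tmp`, `totals` and `result` are hypothetical.

```r
# Data from the question
date <- c("17/08/2020", "17/08/2020", "17/08/2020", "17/08/2020",
          "18/08/2020", "18/08/2020", "18/08/2020", "18/08/2020")
type <- c("type A", "type B", "type A", "type B",
          "type A", "type B", "type A", "type B")
location <- c("USA", "USA", "India", "India", "USA", "USA", "India", "India")
value <- c("10", "10", "frak", "frak", "15", "15", "open", "open")
df <- data.frame(date, type, location, value, stringsAsFactors = FALSE)

# Work on a numeric copy; text values become NA (warning suppressed for brevity)
tmp <- df
tmp$value <- suppressWarnings(as.numeric(tmp$value))

# One total per date and location; na.action = na.pass keeps the NA rows,
# so groups containing text end up with an NA total instead of being dropped
totals <- aggregate(value ~ date + location, data = tmp, FUN = sum,
                    na.action = na.pass)
totals$type <- "total type"
totals$value <- as.character(totals$value)

# Append the totals to the original rows, matching the column order of df
result <- rbind(df, totals[, names(df)])
```

The default `na.action` of the formula method is `na.omit`, which would drop the India rows before summing, so `na.pass` is what keeps those groups in the result.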