Sum rows across multiple variables: answers

【Question title】: sum row multiple variables
【Posted】: 2018-08-31 13:36:08
【Question description】:

My dataset has the following structure:

id amount zipcode cat1 cat1_times cat2 cat2_times
1  1000   1001      0       0        1      7
2  2000   1001      0       0        1      7
3  2300   1002      1       6        1      5
4  1500   1002      1       6        1      5
5  2700   1003      1       3        1      5
6  3400   1003      1       3        1      5

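cat1 is a binary variable that equals 1 if the zip code contains at least one building of category 1. cat1_times is the number of category 1 buildings in that zip code. I want to compute the total number of buildings for each row (cat1_times + cat2_times):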

id amount zipcode cat1 cat1_times cat2 cat2_times total_times
1  1000   1001      0       0        1      7          7
2  2000   1001      0       0        1      7          7
3  2300   1002      1       6        1      5          11          
4  1500   1002      1       6        1      5          11
5  2700   1003      1       3        1      5          8
6  3400   1003      1       3        1      5          8

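I tried sum(cat1_times,cat2_times), but the result is the same for every row.

【Comments】:

df$total_times = df$cat1_times + df$cat2_times should work.
Isn't it as simple as df$cat1_times + df$cat2_times?
Thanks everyone, both approaches work. Do you know why the function sum() does not work?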
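
On the sum() question in the last comment: sum() collapses all of its arguments into a single number, and assigning that number to a column repeats it on every row, whereas `+` works element by element. The following is a minimal sketch of that difference. It rebuilds only the two count columns from the question, and the names `grand_total` and `wrong_total` are made up for illustration.

```r
# Rebuild the example data from the question (only the columns needed here).
df <- data.frame(
  id         = 1:6,
  cat1_times = c(0, 0, 6, 6, 3, 3),
  cat2_times = c(7, 7, 5, 5, 5, 5)
)

# sum() adds up every value in all of its arguments and returns one number.
grand_total <- sum(df$cat1_times, df$cat2_times)

# Assigning that single number to a column recycles it down every row,
# which is why every row showed the same result.
df$wrong_total <- grand_total

# `+` is vectorised: it adds the two columns element by element.
df$total_times <- df$cat1_times + df$cat2_times

print(df)
```

Here `wrong_total` is 52 on every row, while `total_times` holds the per-row totals 7, 7, 11, 11, 8, 8.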

Tags: r sum line

【Solution 1】:

Using str_detect from stringr together with rowSums:

library(stringr)
df$Total=rowSums(df[,names(df)[str_detect(names(df),'times')]])
df
  id amount zipcode cat1 cat1_times cat2 cat2_times Total
1  1   1000    1001    0          0    1          7     7
2  2   2000    1001    0          0    1          7     7
3  3   2300    1002    1          6    1          5    11
4  4   1500    1002    1          6    1          5    11
5  5   2700    1003    1          3    1          5     8
6  6   3400    1003    1          3    1          5     8
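
The same column selection can probably be done without stringr. The sketch below uses base R's grepl() and anchors the pattern to the `_times` suffix so that a column that merely contains "times" elsewhere in its name is not picked up. The anchored pattern and the variable name `times_cols` are assumptions added here, not part of the original answer.

```r
# Example data from the question.
df <- data.frame(
  id         = 1:6,
  amount     = c(1000, 2000, 2300, 1500, 2700, 3400),
  zipcode    = c(1001, 1001, 1002, 1002, 1003, 1003),
  cat1       = c(0, 0, 1, 1, 1, 1),
  cat1_times = c(0, 0, 6, 6, 3, 3),
  cat2       = c(1, 1, 1, 1, 1, 1),
  cat2_times = c(7, 7, 5, 5, 5, 5)
)

# Logical vector: TRUE for column names that end in "_times".
times_cols <- grepl("_times$", names(df))

# drop = FALSE keeps a data frame even if only one column matches,
# so rowSums() always receives a two-dimensional object.
df$Total <- rowSums(df[, times_cols, drop = FALSE])

print(df)
```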

【Discussion】:

【Solution 2】:

Alternatively:

library(dplyr)

df1 %>% select(matches("times")) %>% transmute(total_times=rowSums(.)) %>% bind_cols(df1,.)

#  id amount zipcode cat1 cat1_times cat2 cat2_times total_times
#1  1   1000    1001    0          0    1          7           7
#2  2   2000    1001    0          0    1          7           7
#3  3   2300    1002    1          6    1          5          11
#4  4   1500    1002    1          6    1          5          11
#5  5   2700    1003    1          3    1          5           8
#6  6   3400    1003    1          3    1          5           8

【Discussion】:

【Solution 3】:

Or, if you have many columns:

numberOfCategories=2
rowSums(df[,paste0('cat',1:numberOfCategories,'_times')])
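
The snippet above only prints the row sums. A minimal sketch of storing them as a new column follows, assuming the count columns follow the `cat<N>_times` naming shown in the question. The existence check and the name `times_cols` are additions for illustration.

```r
# Only the count columns from the question are needed here.
df <- data.frame(
  cat1_times = c(0, 0, 6, 6, 3, 3),
  cat2_times = c(7, 7, 5, 5, 5, 5)
)

numberOfCategories <- 2

# Build the expected column names: "cat1_times", "cat2_times", ...
times_cols <- paste0("cat", seq_len(numberOfCategories), "_times")

# Stop early with an error if any expected column is missing.
stopifnot(all(times_cols %in% names(df)))

# Store the per-row totals in a new column.
df$total_times <- rowSums(df[, times_cols, drop = FALSE])

print(df)
```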

【Discussion】:

【Solution 4】:

Using base R:

df1$total_times <- Reduce(`+`, df1[grep('cat\\d+_times', names(df1))])
df1$total_times
#[1]  7  7 11 11  8  8
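
One point that may matter when choosing between Reduce(`+`, ...) and rowSums() is missing values. The sketch below uses a small hypothetical data frame with one NA (this NA is not in the question's data) to show that `+` propagates NA, while rowSums() can skip it with na.rm = TRUE.

```r
# Hypothetical data with one missing count, for illustration only.
df1 <- data.frame(
  cat1_times = c(0, NA, 6),
  cat2_times = c(7, 7, 5)
)

# Positions of the columns named like cat<digits>_times.
cols <- grep("cat\\d+_times", names(df1))

# Reduce with `+` adds the columns pairwise; any NA makes that row's total NA.
via_reduce <- Reduce(`+`, df1[cols])

# rowSums() with na.rm = TRUE treats the missing count as absent.
via_rowsums <- rowSums(df1[cols], na.rm = TRUE)

print(via_reduce)   # 7 NA 11
print(via_rowsums)  # 7  7 11 (printed with row names 1, 2, 3)
```

Which behaviour is preferable depends on whether a missing count should be treated as zero or should flag the whole row as unknown.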

【Discussion】: