[Posted]: 2019-09-27 15:15:56
[Question]:
I have a dataset that looks like this:
city period_day date
1 barcelona morning 2017-01-15
2 sao_paulo afternoon 2016-12-07
3 sao_paulo morning 2016-11-16
4 barcelona morning 2016-11-06
5 barcelona afternoon 2016-12-31
6 sao_paulo afternoon 2016-11-30
7 barcelona morning 2016-10-15
8 barcelona afternoon 2016-11-30
9 sao_paulo afternoon 2016-12-24
10 sao_paulo afternoon 2017-02-02
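To make the example reproducible, here is a minimal sketch that rebuilds the sample above as an R data frame. The object name `df` is an assumption, chosen to match the argument of the function further down. The dates are converted to the `Date` class so that comparisons are chronological rather than string-based.

```r
# Rebuild the printed sample as a data frame. The header has one field fewer
# than the data rows, so read.table() uses the first column as row names.
df <- read.table(text = "
city period_day date
1 barcelona morning 2017-01-15
2 sao_paulo afternoon 2016-12-07
3 sao_paulo morning 2016-11-16
4 barcelona morning 2016-11-06
5 barcelona afternoon 2016-12-31
6 sao_paulo afternoon 2016-11-30
7 barcelona morning 2016-10-15
8 barcelona afternoon 2016-11-30
9 sao_paulo afternoon 2016-12-24
10 sao_paulo afternoon 2017-02-02
", header = TRUE, stringsAsFactors = FALSE)

# Store the dates as Date objects so that `>` compares them chronologically.
df$date <- as.Date(df$date)
```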
For each row, I want to count how many rows with the same city and period_day have a date earlier than that row's date. For this example, I want this result:
city period_day date row_count
1 barcelona morning 2017-01-15 2
2 sao_paulo afternoon 2016-12-07 1
3 sao_paulo morning 2016-11-16 0
4 barcelona morning 2016-11-06 1
5 barcelona afternoon 2016-12-31 1
6 sao_paulo afternoon 2016-11-30 0
7 barcelona morning 2016-10-15 0
8 barcelona afternoon 2016-11-30 0
9 sao_paulo afternoon 2016-12-24 2
10 sao_paulo afternoon 2017-02-02 3
A row_count of 0 means the row has the oldest date within its city/period_day combination.
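The counting rule can be written down directly as a brute-force reference. The sketch below (the function name `count_earlier_bruteforce` is hypothetical) compares every pair of rows at once with `outer()` instead of running one pipeline per row. It still builds n-by-n matrices, so it is only meant for checking results on small inputs such as the sample above.

```r
# Brute-force reference for the counting rule (sketch, O(n^2) memory):
# result[i] = number of rows j with the same city and period_day as row i
#             whose date is strictly earlier than date[i].
count_earlier_bruteforce <- function(city, period_day, date) {
  d <- as.numeric(date)
  # same_group[i, j] is TRUE when rows i and j share city and period_day
  same_group <- outer(city, city, "==") & outer(period_day, period_day, "==")
  # earlier[i, j] is TRUE when row j's date is before row i's date
  # (the diagonal is always FALSE, so a row never counts itself)
  earlier <- outer(d, d, ">")
  # Sum over j for each row i
  rowSums(same_group & earlier)
}
```

On the sample data, `count_earlier_bruteforce(df$city, df$period_day, df$date)` should reproduce the row_count column shown above.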
I came up with a solution, but it becomes very slow as the amount of data grows. Here is the code:
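library(dplyr)

# For every row x: look up its city, period_day and date, then count the
# other rows in the same city/period_day whose date is strictly earlier.
get_count_function <- function(df) {
  idx <- seq_len(nrow(df))
  count <- sapply(idx, function(x) {
    name_city <- df %>% select(city) %>% filter(row_number() == x) %>% pull()
    name_period <- df %>% select(period_day) %>% filter(row_number() == x) %>% pull()
    date_row <- df %>% select(date) %>% filter(row_number() == x) %>% pull()
    # Dates of all other rows with the same city and period_day
    date_any_row <- df %>%
      filter(row_number() != x,
             city == name_city,
             period_day == name_period) %>%
      select(date) %>%
      pull()
    # How many of those dates are earlier than this row's date
    how_many <- sum(date_row > date_any_row)
    return(how_many)
  })
  return(count)
}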
How can I make this function more efficient?
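One possible approach, offered as a sketch rather than as the accepted answer: within each city/period_day group, the number of strictly earlier dates equals the date's minimum rank minus one, so the per-row loop can be replaced by a single grouped ranking. The helper name `count_earlier` is hypothetical. Ties are counted as "not earlier", which matches the strict `>` in the original code.

```r
# Vectorized sketch: inside each (city, period_day) group, rank the dates with
# ties.method = "min"; rank - 1 is then the number of strictly earlier dates.
count_earlier <- function(city, period_day, date) {
  d <- as.numeric(date)
  # ave() applies FUN separately to each city/period_day group and returns
  # the results in the original row order
  ave(d, city, period_day, FUN = function(x) rank(x, ties.method = "min") - 1)
}
```

Usage: `df$row_count <- count_earlier(df$city, df$period_day, df$date)`. With dplyr, the same idea should be expressible as `df %>% group_by(city, period_day) %>% mutate(row_count = min_rank(date) - 1L)`. Either way the cost is one sort per group (roughly O(n log n) overall) instead of several filter/select pipelines over the whole data frame for every row.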
[Discussion]:
-
Try this and see if it works - stackoverflow.com/questions/23528862/…
Tags: r loops dplyr multiple-conditions