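【Posted】:2020-09-22 12:24:11
【Question】:
How can I tell R (dplyr) to "reset" a filter, so that I can filter a second time in the same pipeline? Otherwise I will have to write a for-loop for every identifier code. The minimal working example below highlights the problem I am facing.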
library(tidyverse)
data.tibble <- tribble( # sample data
~id,~year, ~identifier, ~items, ~cost,
10, 2018, "aaca" , 10, 25, # "aaca" toy cars
20, 2018, "aaca" , 12, 28, # "aaca" toy cars
10, 2018, "bbda" , 14, 30, # "bbda" pens
20, 2018, "bbda" , 27, 29, # "bbda" pens
)
a <- data.tibble %>% # first block works fine on its own
  group_by(id, year) %>%
  filter(str_detect(identifier, "^a")) %>% # keep identifiers that begin with "a"
  summarise(toycars_sold = sum(items),
            toycars_cost = sum(cost))
a
b <- data.tibble %>% # Second block works fine on its own
group_by(id, year) %>%
filter(str_detect(identifier,"^b")) %>%
summarise(pens_sold=sum(items),
pens_cost=sum(cost))
b
I run into trouble when I ask dplyr to filter on a different identifier later in the same pipeline; this produces an error message:
data.tibble %>%
group_by(id, year) %>%
filter(str_detect(identifier, "^a")) %>%
summarise(toycars_sold=sum(items),
toycars_cost=sum(cost)) %>%
filter(str_detect(identifier,"^b")) %>%
summarise(pens_sold=sum(items),
pens_cost=sum(cost))
What I would like to end up with is:
c <- full_join(a,b)
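There are a myriad of codes ("identifiers") that I will have to go through (sometimes there is more than one identifier for a single item).
R then tells me that the object 'identifier' cannot be found.
Any help is greatly appreciated.
Old question, somewhat hard to understand
I do have a problem that I cannot seem to solve. Here it is: after calling the first summarize() function, how do I tell the tidyverse to reset the filter? Otherwise I will have to create a for-loop for every "id-code" (I believe regular expression is the correct term) that I want to filter on.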
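The error arises because summarise() returns only the grouping columns plus the newly computed summaries, so `identifier` no longer exists when the second filter() runs. A minimal sketch of one possible workaround, assuming dplyr 1.0.0 or later (for the `.groups` argument) and the sample data above: subset inside a single summarise() rather than filtering twice. The name `c_alt` is hypothetical and used only for illustration.

```r
library(tidyverse)

# Sample data copied from the question
data.tibble <- tribble(
  ~id, ~year, ~identifier, ~items, ~cost,
  10, 2018, "aaca", 10, 25,
  20, 2018, "aaca", 12, 28,
  10, 2018, "bbda", 14, 30,
  20, 2018, "bbda", 27, 29
)

# After summarise(), only id, year and the two summary columns remain,
# which is why a later filter() on `identifier` cannot find it.
a <- data.tibble %>%
  group_by(id, year) %>%
  filter(str_detect(identifier, "^a")) %>%
  summarise(toycars_sold = sum(items),
            toycars_cost = sum(cost),
            .groups = "drop")
names(a)

# Sketch: no filter() at all; each summary subsets its own rows.
# `c_alt` is a hypothetical name, not from the original post.
c_alt <- data.tibble %>%
  group_by(id, year) %>%
  summarise(
    toycars_sold = sum(items[str_detect(identifier, "^a")]),
    toycars_cost = sum(cost[str_detect(identifier, "^a")]),
    pens_sold    = sum(items[str_detect(identifier, "^b")]),
    pens_cost    = sum(cost[str_detect(identifier, "^b")]),
    .groups = "drop"
  )
c_alt
```

One difference from full_join(a, b): a group with no rows matching a pattern gets 0 here, whereas the join would leave NA.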
output <- vector("list") # list to store the output in
for (i in seq_along(object18)) { # object18: list to loop over (items of stores in year 18)
  output[[i]] <- object18[[i]] %>%
    group_by(storeid, month, year, quarter) %>% # variables to group over
    filter(str_detect(itemcode, "^CODE")) %>% # CODE stands for some identifier number (a string)
    summarize(toys = sum(items),
              max.items.sold = max(items)) %>%
    filter(str_detect(itemcode, "^NEWCODE")) %>% # possibly multiple codes; FILTERING ON A NEW CODE DOESN'T WORK
    summarize(toys2 = sum(items),
              itemstoy2 = max(items))
}
Does anyone know how I can achieve this?
Please don't be harsh on me, I am very new to R.
Thanks in advance, David.
【Comments】:
-
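Maybe you need group_by to handle each item code? It would help if you provided sample data (dput(head(x)) in a code block) and the expected output for that sample data.
-
+1 to @r2evans's comment. My immediate reaction is that if you are filtering vectors in the middle of a loop, your problem most likely lies in how your data is formatted. Sample input data and the desired output would be very helpful.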
-
Do you know the %T>% pipe from magrittr? I don't have a solution, but I think using it might be a step in the direction you want... you can read a bit about it here: r4ds.had.co.nz/pipes.html#
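
The group_by suggestion in the first comment could look roughly like the sketch below. It assumes the sample data from the question, dplyr 1.0.0 or later, and tidyr 1.0.0 or later (for pivot_wider() with `names_glue`); the `item` column and the name `wide` are hypothetical, introduced only for illustration. Each identifier pattern is mapped to a label, the label is added to the grouping, and the result is spread back into one row per id and year.

```r
library(tidyverse)

# Sample data copied from the question
data.tibble <- tribble(
  ~id, ~year, ~identifier, ~items, ~cost,
  10, 2018, "aaca", 10, 25,
  20, 2018, "aaca", 12, 28,
  10, 2018, "bbda", 14, 30,
  20, 2018, "bbda", 27, 29
)

wide <- data.tibble %>%
  # Hypothetical helper column: one label per identifier pattern
  mutate(item = case_when(
    str_detect(identifier, "^a") ~ "toycars",
    str_detect(identifier, "^b") ~ "pens"
  )) %>%
  group_by(id, year, item) %>%
  summarise(sold = sum(items), cost = sum(cost), .groups = "drop") %>%
  # Spread the labels into columns such as toycars_sold and pens_cost
  pivot_wider(names_from = item,
              values_from = c(sold, cost),
              names_glue = "{item}_{.value}")
wide
```

With many identifiers, only the case_when() mapping grows; the rest of the pipeline stays the same, and several patterns can point to the same label when one item has more than one identifier.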