Question title: Count observations with a certain value in a group?
Posted: 2020-12-20 17:05:17
Question:

I'm working with the following data frame:

Year  Month      Day   X      Y
2018  January    1     4.5    6
2018  January    4     3.2    8.1
2018  January    11    1.1    2.3
2018  February   7     5.4    2.2
2018  February   15    1.5    4.4
2019  January    3     8.6    2.3
2019  January    22    1.1    2.5
2019  January    23    5.5    7.8
2019  February   5     6.9    1.1
2019  February   10    1.8    1.3

I'd like to create a new column giving, for each month of each year, the number of observations where X is greater than Y:

Year  Month      Day   X      Y       XGreaterThanYCount
2018  January    1     4.5    6             0
2018  January    4     3.2    8.1           0
2018  January    11    1.1    2.3           0
2018  February   7     5.4    2.2           1
2018  February   15    1.5    4.4           1
2019  January    3     8.6    2.3           1
2019  January    22    1.1    2.5           1
2019  January    23    5.5    7.8           1
2019  February   5     6.9    1.1           2
2019  February   10    1.8    1.3           2
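To make the target concrete: within each Year/Month group, X > Y yields a logical vector, and summing it counts the TRUE values. The following minimal base-R sketch computes only the per-group totals (one number per group rather than one per row). It assumes the data frame is called df and rebuilds it from the table above.

```r
# Rebuild the example data from the question (assumed name: df)
df <- data.frame(
  Year  = c(2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019),
  Month = c("January", "January", "January", "February", "February",
            "January", "January", "January", "February", "February"),
  Day   = c(1, 4, 11, 7, 15, 3, 22, 23, 5, 10),
  X     = c(4.5, 3.2, 1.1, 5.4, 1.5, 8.6, 1.1, 5.5, 6.9, 1.8),
  Y     = c(6, 8.1, 2.3, 2.2, 4.4, 2.3, 2.5, 7.8, 1.1, 1.3)
)

# Row-wise comparison: TRUE where X exceeds Y
gt <- df$X > df$Y

# Sum the logical vector within each Year x Month cell (TRUE counts as 1)
counts <- tapply(gt, list(df$Year, df$Month), sum)
print(counts)
# 2018: January 0, February 1; 2019: January 1, February 2
```

The desired column is these totals repeated on every row of the corresponding group, which is what the answers below produce.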

I tried creating a logical test column with df$XYTest <- df$X > df$Y and then using it in mutate:

df <- df %>%
  group_by(Year, Month) %>%
  mutate(XGreaterThanYCount = count(XYTest = TRUE))

But I can't get it to work, and I'm not sure this is a good approach anyway.
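The attempt fails because dplyr's count() is a verb that takes a whole data frame as its first argument, not a vector inside mutate(), and XYTest = TRUE names an argument rather than testing a value. Since XYTest is already a logical column, summing it within each group gives the count. A minimal sketch, assuming dplyr is installed and the data frame is named df:

```r
library(dplyr)

# Example data from the question (assumed name: df)
df <- data.frame(
  Year  = c(2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019),
  Month = c("January", "January", "January", "February", "February",
            "January", "January", "January", "February", "February"),
  X     = c(4.5, 3.2, 1.1, 5.4, 1.5, 8.6, 1.1, 5.5, 6.9, 1.8),
  Y     = c(6, 8.1, 2.3, 2.2, 4.4, 2.3, 2.5, 7.8, 1.1, 1.3)
)

# Logical helper column, as in the question
df$XYTest <- df$X > df$Y

df <- df %>%
  group_by(Year, Month) %>%
  # sum() of a logical column counts its TRUE values within each group
  mutate(XGreaterThanYCount = sum(XYTest)) %>%
  ungroup()

print(df)
```

The helper column is optional: writing sum(X > Y) inside mutate() does the same thing in one step.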

Comments:

  • mutate(XGreaterThanYCount = sum(X > Y))

Tags: r dataframe


Answer 1:
library(dplyr)

df <- df %>%
  group_by(Year, Month) %>%
  # X > Y is TRUE/FALSE per row; sum() counts the TRUEs within each group
  mutate(XGreaterThanYCount = sum(X > Y))
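This works because sum() coerces a logical vector to integers (TRUE is 1, FALSE is 0), so the sum is the number of TRUE values. One caveat worth knowing: if X or Y can be missing, the comparison yields NA and sum() returns NA for the whole group unless na.rm = TRUE is given. A small base-R illustration; the NA below is added for demonstration only, since the question's data has none:

```r
# A comparison yields a logical vector; the NA is hypothetical
x <- c(5.4, 1.5, NA)
y <- c(2.2, 4.4, 1.0)
cmp <- x > y               # TRUE FALSE NA

total_plain <- sum(cmp)                # NA: one missing comparison makes the total NA
total_narm  <- sum(cmp, na.rm = TRUE)  # 1: counts only the TRUE values

print(total_plain)
print(total_narm)
```

In the dplyr answer the same option applies: sum(X > Y, na.rm = TRUE).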

Comments:

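  • Isn't last(cumsum(...)) redundant when sum(...) would do?
  • @AnilGoyal This works perfectly! Thanks! I had to change a few things in the question, so I posted another, similar question in case you'd like to look at that one too: stackoverflow.com/questions/65383809/…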
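On the first comment's point: the last element of a cumulative sum is the full sum, so last(cumsum(...)) gives the same group total as sum(...) with an extra step. A quick base-R check, using tail(..., 1) as a stand-in for dplyr::last():

```r
# Logical comparison for one group (2018 February from the question)
cmp <- c(5.4, 1.5) > c(2.2, 4.4)     # TRUE FALSE

running <- cumsum(cmp)               # running count of TRUEs: 1 1
via_cumsum <- tail(running, 1)       # last running value
via_sum    <- sum(cmp)               # direct total

print(identical(via_cumsum, via_sum))
```

cumsum() is only worth using when you want the running count at each row rather than the group total.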
Answer 2:

Using ave from base R:

dat <- transform(dat, XgreaterY=ave(X > Y, Year, Month, FUN=sum))
dat
#    Year    Month Day   X   Y XgreaterY
# 1  2018  January   1 4.5 6.0         0
# 2  2018  January   4 3.2 8.1         0
# 3  2018  January  11 1.1 2.3         0
# 4  2018 February   7 5.4 2.2         1
# 5  2018 February  15 1.5 4.4         1
# 6  2019  January   3 8.6 2.3         1
# 7  2019  January  22 1.1 2.5         1
# 8  2019  January  23 5.5 7.8         1
# 9  2019 February   5 6.9 1.1         2
# 10 2019 February  10 1.8 1.3         2

Data:

dat <- structure(list(Year = c(2018L, 2018L, 2018L, 2018L, 2018L, 2019L, 
2019L, 2019L, 2019L, 2019L), Month = c("January", "January", 
"January", "February", "February", "January", "January", "January", 
"February", "February"), Day = c(1L, 4L, 11L, 7L, 15L, 3L, 22L, 
23L, 5L, 10L), X = c(4.5, 3.2, 1.1, 5.4, 1.5, 8.6, 1.1, 5.5, 
6.9, 1.8), Y = c(6, 8.1, 2.3, 2.2, 4.4, 2.3, 2.5, 7.8, 1.1, 1.3
)), class = "data.frame", row.names = c(NA, -10L))
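ave() applies FUN within each combination of the grouping variables and returns a vector as long as its input, which is why the per-group count can be stored directly as a column. The sketch below reproduces roughly the same idea by hand with split() and unsplit(), for illustration only; the helper names gt, g, per_group and manual are hypothetical:

```r
# Same data as above (only the columns needed here)
dat <- data.frame(
  Year  = c(2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019),
  Month = c("January", "January", "January", "February", "February",
            "January", "January", "January", "February", "February"),
  X     = c(4.5, 3.2, 1.1, 5.4, 1.5, 8.6, 1.1, 5.5, 6.9, 1.8),
  Y     = c(6, 8.1, 2.3, 2.2, 4.4, 2.3, 2.5, 7.8, 1.1, 1.3)
)

gt <- dat$X > dat$Y
g  <- interaction(dat$Year, dat$Month)   # one level per Year.Month combination

# Sum within each group, then repeat the total once per row of that group
per_group <- lapply(split(gt, g), function(v) rep(sum(v), length(v)))

# Put the repeated totals back in the original row order
manual <- unsplit(per_group, g)

via_ave <- ave(gt, dat$Year, dat$Month, FUN = sum)
print(manual)
```

Both vectors should hold 0 0 0 1 1 1 1 1 2 2, matching the XgreaterY column shown above.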

Comments:
