【Question title】: Creating a column based on row conditions
【Posted】: 2017-12-15 09:45:42
【Question description】:

I have a dataset that looks like this:

user_id  Gap itr    visit_no.(desired column)
      a  0.3   1            1
      a  0.5   1            1
      a  1.5   1            1
      a  0.9   1            2
      a  2.6   1            2
      a 0.34   1            3
      a  0.8   2            1
      a 0.34   2            1
      b  1.6   1            1
      b  0.7   1            2
      b  0.8   1            2
      b  0.7   1            2
      b  4.8   2            1
      b 0.39   2            2
      b 0.38   2            2
      b 0.89   2            2

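I want to create the column visit_no. Whenever gap is greater than 1, visit_no should increase by 1 starting on the next row, and it then stays the same until another gap > 1 is found, so the numbers are assigned in increasing order. If gap is less than 1, the row gets the same visit_no as the previous row. visit_no always starts at 1 for each user and each itr; in other words, the visit_no column is computed per group of user_id and itr.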

Here is the data frame:

df<-data.frame(user=c("a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b")
                    , gap=c(0.3,0.5,1.5,0.9,2.6,0.34,0.8,0.34,1.6,0.7,0.8,0.7,4.8,0.39,0.76,0.72),
                     itr=c(1,1,1,1,1,1,2,2,1,1,1,1,2,2,2,2))

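【Question comments】:

  • Could you revise your desired column? Include all the values and make sure they are correct.
  • The desired output is not well formatted... Is the gap in row 6 0.034? And what is going on with all the a's and b's in the first column? It is not entirely clear to me what you want to do.
  • Yes, the gap in row 6 is 0.34 ..... the data is formatted correctly.

Tags: r multiple-columns dplyr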


【Solution 1】:
library(dplyr)

df<-data.frame(user=c("a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b")
               , gap=c(0.3,0.5,1.5,0.9,2.6,0.34,0.8,0.34,1.6,0.7,0.8,0.7,4.8,0.39,0.76,0.72),
               itr=c(1,1,1,1,1,1,2,2,1,1,1,1,2,2,2,2))

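df %>%
  group_by(user, itr) %>%
  # a new visit starts on the row after a gap > 1;
  # default = 2 makes the first row of each group count as visit 1
  mutate(visit_no = cumsum(ifelse(lag(gap, default = 2) > 1, 1, 0))) %>%
  ungroup()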

# # A tibble: 16 x 4
#     user   gap   itr visit_no
#   <fctr> <dbl> <dbl>    <dbl>
# 1      a  0.30     1        1
# 2      a  0.50     1        1
# 3      a  1.50     1        1
# 4      a  0.90     1        2
# 5      a  2.60     1        2
# 6      a  0.34     1        3
# 7      a  0.80     2        1
# 8      a  0.34     2        1
# 9      b  1.60     1        1
# 10     b  0.70     1        2
# 11     b  0.80     1        2
# 12     b  0.70     1        2
# 13     b  4.80     2        1
# 14     b  0.39     2        2
# 15     b  0.76     2        2
# 16     b  0.72     2        2
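
To make the lag-and-cumsum idea easier to follow, here is a minimal base R sketch of the same logic without dplyr. The helper name `visit_counter` is hypothetical and not part of the original answer. The sketch assumes, like the answer above, that a gap of exactly 1 does not start a new visit (the question only covers gaps greater than or less than 1).

```r
# Hypothetical helper (not from the original answer): numbers the visits
# within a single user/itr group.
visit_counter <- function(gap) {
  # Previous row's gap; the first row is padded with 2 (any value > 1)
  # so that the counter starts at 1 in every group.
  prev_gap <- c(2, gap)[seq_along(gap)]
  # Each TRUE marks the first row of a new visit; cumsum turns marks into numbers.
  cumsum(prev_gap > 1)
}

df <- data.frame(
  user = c("a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b"),
  gap  = c(0.3,0.5,1.5,0.9,2.6,0.34,0.8,0.34,1.6,0.7,0.8,0.7,4.8,0.39,0.76,0.72),
  itr  = c(1,1,1,1,1,1,2,2,1,1,1,1,2,2,2,2)
)

# ave() applies the helper within each user/itr combination and
# returns the results in the original row order.
df$visit_no <- ave(df$gap, df$user, df$itr, FUN = visit_counter)
df
```

On the question's data frame this should give the same visit_no column as the dplyr output above.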

【Comments】:

    【Solution 2】:

    This is almost the same as AntoniosK's answer, but in data.table, without the pipe operator, and using data.table's shift function.

    library(data.table)
    dt <- data.table(df)
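    # fill = 0 pads the first row of each group with a value that is never > 1,
    # so the counter starts at 0 and the trailing + 1 moves it to 1
    dt[, visit_no := cumsum(ifelse(shift(gap, n = 1, type = "lag", fill = 0)>1,1,0)) + 1, by = c("user", "itr")]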
    dt
    #    user  gap itr visit_no
    # 1:    a 0.30   1        1
    # 2:    a 0.50   1        1
    # 3:    a 1.50   1        1
    # 4:    a 0.90   1        2
    # 5:    a 2.60   1        2
    # 6:    a 0.34   1        3
    # 7:    a 0.80   2        1
    # 8:    a 0.34   2        1
    # 9:    b 1.60   1        1
    #10:    b 0.70   1        2
    #11:    b 0.80   1        2
    #12:    b 0.70   1        2
    #13:    b 4.80   2        1
    #14:    b 0.39   2        2
    #15:    b 0.76   2        2
    #16:    b 0.72   2        2
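
The two answers differ only in how they pad the first lagged value: Solution 1 pads with 2, while this one pads with 0 and adds 1 afterwards. The following base R sketch, which assumes a single group (user "a", itr 1 from the question) and uses illustrative variable names, shows that both variants produce the same numbering.

```r
gap <- c(0.3, 0.5, 1.5, 0.9, 2.6, 0.34)  # user "a", itr 1 from the question

# Variant 1: pad the lag with 0 (never > 1), then shift the counter up by 1.
lag_fill0 <- c(0, gap)[seq_along(gap)]
v1 <- cumsum(lag_fill0 > 1) + 1

# Variant 2: pad the lag with 2 (always > 1), so the first row itself counts as 1.
lag_fill2 <- c(2, gap)[seq_along(gap)]
v2 <- cumsum(lag_fill2 > 1)

# Compare by value; v1 is double and v2 is integer, so identical() would be FALSE.
all(v1 == v2)
```

Either padding works as long as the offset matches it, so the choice is mostly a matter of taste.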
    

    【Comments】:
