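【Question title】: Label unique values by multiple groups in dataframe
【Posted】: 2020-08-07 00:51:39
【Question】:

I have a large data frame in R in which users were tasked with describing objects in scenes. I need 3 unique users per scene, but some scenes were described more than 3 times. I am trying to keep the first 3 unique users for each scene and drop the rest.

Toy data (the real dataset has many more rows and columns)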

user <- c("A", "A", "A", "B", "B", "C", "C", "D", "E", "E", "F", "F", "F")
scene <- c("library", "library", "library", "park", "park", "library", "library", "park", "library", "library", "library", "library", "library")
object <- c("book", "book", "lamp", "dog", "cat", "book", "lamp", "dog", "desk", "desk", "book", "lamp", "lamp")
index <- c(1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2)
dat <- data.frame(user, scene, object, index)

user     scene      object      index
A        library    book        1
A        library    book        2
A        library    lamp        1
B        park       dog         1
B        park       cat         1
C        library    book        1
C        library    lamp        1
D        park       dog         1
E        library    desk        1
E        library    desk        2
F        library    book        1
F        library    lamp        1
F        library    lamp        2
...      ...        ...         ...

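For example, here A, C, and E are the first three users to describe the scene library, so F's descriptions are no longer needed. My main problem is that, although I can get the total number of unique users, I don't know how to label them 1, 2, 3, and so on, so that I can cut off anything above 3.

Desired output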

user     scene      object      index   count
A        library    book        1       1
A        library    book        2       1
A        library    lamp        1       1
B        park       dog         1       1
B        park       cat         1       1
C        library    book        1       2
C        library    lamp        1       2
D        park       dog         1       2
E        library    desk        1       3
E        library    desk        2       3

This was helpful, but it only groups by one column, so I couldn't apply it here: R - Group by variable and then assign a unique ID

【Comments】:

    Tags: r dataframe data-wrangling


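    【Solution 1】:

    Within each scene, you can use match to create a count variable that numbers the users in order of first appearance, then use filter to keep only the rows where count <= 3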

    library(dplyr)
    
    dat %>%
      group_by(scene) %>%
      mutate(count = match(user, unique(user))) %>%
      filter(count <= 3)
    
    #   user  scene   object index count
    #   <chr> <chr>   <chr>  <dbl> <int>
    # 1 A     library book       1     1
    # 2 A     library book       2     1
    # 3 A     library lamp       1     1
    # 4 B     park    dog        1     1
    # 5 B     park    cat        1     1
    # 6 C     library book       1     2
    # 7 C     library lamp       1     2
    # 8 D     park    dog        1     2
    # 9 E     library desk       1     3
    #10 E     library desk       2     3
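
To see why `match(user, unique(user))` gives the labels, here is a minimal sketch on a plain vector. The `users` vector is hypothetical and stands in for the `user` column of a single scene.

```r
# Hypothetical users for one scene, in row order
users <- c("A", "A", "C", "E", "C", "F")

# unique() keeps values in order of first appearance
first_seen <- unique(users)          # "A" "C" "E" "F"

# match() returns each user's position in that vector,
# so every row of the same user gets the same label
label <- match(users, first_seen)    # 1 1 2 3 2 4

# Keep only the rows of the first three distinct users
users[label <= 3]                    # "A" "A" "C" "E" "C"
```

Grouping by `scene` simply repeats this calculation separately for each scene.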
    

    The same in data.table:

    library(data.table)
    # setDT() converts dat to a data.table by reference; := adds count within each scene
    setDT(dat)[, count := match(user, unique(user)), by = scene]
    dat[count <= 3]
    

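    And in base R:

    # ave() returns the same type as its first argument (character here),
    # so convert the labels to integer before comparing them with 3
    dat$count <- as.integer(with(dat, ave(as.character(user), scene, FUN = function(x) match(x, unique(x)))))
    subset(dat, count <= 3)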
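
One detail worth knowing when `ave()` is used for labelling like this: it returns a vector of the same type as its first argument, so passing a character `user` column yields character labels. With fewer than ten users per scene the comparison with 3 still behaves, but with more it can let extra users through. The sketch below uses made-up data (`user`, `scene`, `lab_chr`, and `lab_int` are hypothetical names) to show the effect.

```r
# Hypothetical data: one scene described by 12 distinct users
user  <- sprintf("u%02d", 1:12)
scene <- rep("library", 12)

# ave() writes the results back into a copy of `user`,
# so the labels come back as character strings
lab_chr <- ave(user, scene, FUN = function(x) match(x, unique(x)))
class(lab_chr)                 # "character"

# Comparing strings with a number coerces the number to "3",
# so "10", "11" and "12" also count as <= "3"
sum(lab_chr <= 3)              # more than 3 users pass

# Converting to integer restores a numeric comparison
lab_int <- as.integer(lab_chr)
sum(lab_int <= 3)              # exactly 3 users pass
```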
    

    【Comments】:

    • What would this look like in data.table?