【Question Title】:Update column values randomly based on value in other column in R
【Posted】:2020-07-03 06:45:13
【Question】:

I want to add a new column, SubCategory, whose values are filled in randomly based on the value of the Category column. Details below:

Sub_Hair = c("Shampoo", "Conditioner", "Gel", "HairOil", "Dye")
Sub_Beauty = c("Face", "Eye", "Lips")
Sub_Nail= c("NailPolish", "NailPolishRemover", "NailArtKit", "ManiPadiKit")
Sub_Others = c("Electric", "NonElectric")

> product_data_1[1:10, c("Pcode", "Category", "MRP")]
    Pcode Category    MRP
1  16156L   Beauty  $8.88
2  16162M   Others $21.27
3  16168M   Others  $2.98
4  16169E     Nail $26.64
5  16207A     Hair  $6.38
6  17012B   Beauty $33.03
7  17012C   Beauty $20.58
8  17012F   Beauty $36.29
9  17091A     Nail $20.55
10 17107D     Nail $28.20

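I am trying the code below. However, the rows are updated with only one subcategory per category. For example, every row with Category "Beauty" gets SubCategory "Eye" instead of a value picked at random from "Face", "Eye", and "Lips". Here is the code and the output: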

product_data_1 = within(product_data_1, SubCategory[Category == "Beauty"] <- sample(Sub_Beauty, 1))
product_data_1 = within(product_data_1, SubCategory[Category == "Hair"] <- sample(Sub_Hair, 1))
product_data_1 = within(product_data_1, SubCategory[Category == "Nail"] <- sample(Sub_Nail, 1))
product_data_1 = within(product_data_1, SubCategory[Category == "Others"] <- sample(Sub_Others, 1))

> product_data_1[1:10, c("Pcode", "Category", "MRP", "SubCategory")]
    Pcode Category    MRP SubCategory
1  16156L   Beauty  $8.88         Eye
2  16162M   Others $21.27    Electric
3  16168M   Others  $2.98    Electric
4  16169E     Nail $26.64  NailPolish
5  16207A     Hair  $6.38         Gel
6  17012B   Beauty $33.03         Eye
7  17012C   Beauty $20.58         Eye
8  17012F   Beauty $36.29         Eye
9  17091A     Nail $20.55  NailPolish
10 17107D     Nail $28.20  NailPolish
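
The behaviour shown above can be reproduced in isolation. The following is a minimal sketch, not part of the original post: `sample(x, 1)` draws a single value, and the subset assignment recycles that one value into every selected row. The `Category` vector and the helper names `is_beauty`, `one_draw`, and `per_row` are hypothetical stand-ins for illustration.

```r
# Subcategory pool copied from the question.
Sub_Beauty <- c("Face", "Eye", "Lips")

# Hypothetical stand-in for the Category column.
Category <- c("Beauty", "Others", "Beauty", "Beauty", "Nail")
SubCategory <- rep(NA_character_, length(Category))

is_beauty <- Category == "Beauty"

# One draw: the single sampled value is recycled into every selected position.
set.seed(1)
SubCategory[is_beauty] <- sample(Sub_Beauty, 1)
one_draw <- SubCategory[is_beauty]

# One draw per selected row: the sample size matches the number of rows filled.
SubCategory[is_beauty] <- sample(Sub_Beauty, sum(is_beauty), replace = TRUE)
per_row <- SubCategory[is_beauty]
```

With `replace = TRUE`, repeated values are still possible, but each row now gets its own independent draw.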

【Comments】:

    Tags: r replace conditional-statements


    【Solution 1】:

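    Put your subcategory values in a named list such as subcat_list <- list(Hair = Sub_Hair, Beauty = Sub_Beauty, Nail = Sub_Nail, Others = Sub_Others). You can then index subcat_list by product_data_1$Category and use sapply to call sample on each element of the resulting list of vectors.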

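    set.seed(323)  # make the random draws reproducible
    # as.character() ensures the list is indexed by name even if Category is a factor
    product_data_1$SubCategory <- sapply(subcat_list[as.character(product_data_1$Category)], sample, 1)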
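
To make the indexing step concrete, here is a self-contained sketch, offered as an illustration rather than as the answerer's exact code. The `category` vector is a hypothetical stand-in for `product_data_1$Category`, and `vapply` is swapped in for `sapply` only to make the return type explicit.

```r
# Subcategory pools copied from the question.
Sub_Hair   <- c("Shampoo", "Conditioner", "Gel", "HairOil", "Dye")
Sub_Beauty <- c("Face", "Eye", "Lips")
Sub_Nail   <- c("NailPolish", "NailPolishRemover", "NailArtKit", "ManiPadiKit")
Sub_Others <- c("Electric", "NonElectric")

subcat_list <- list(Hair = Sub_Hair, Beauty = Sub_Beauty,
                    Nail = Sub_Nail, Others = Sub_Others)

# Hypothetical stand-in for product_data_1$Category.
category <- c("Beauty", "Others", "Others", "Nail", "Hair")

# Indexing a named list with a character vector returns one pool per row.
pools <- subcat_list[category]

# Draw one value from each pool; every row gets its own independent draw.
set.seed(323)
picked <- vapply(pools, sample, character(1), size = 1, USE.NAMES = FALSE)

# Caveat: a factor index is treated as its integer codes, not its labels.
by_factor <- subcat_list[factor(category)]
names(by_factor)[1]  # "Hair", although category[1] is "Beauty"
```

The last two lines show why converting a factor column with `as.character()` before indexing matters: the factor levels are sorted alphabetically, so code 1 ("Beauty") selects the first list element, which here is the Hair pool.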
    

    You could also try a slightly different approach with dplyr + purrr:

    library(tidyverse)
    product_data_1 %>% 
        mutate(SubCategory = map_chr(Category, ~ sample(subcat_list[[.]], 1)))
    

    Example output:

        Pcode Category    MRP SubCategory
    1  16156L   Beauty  $8.88         Eye
    2  16162M   Others $21.27    Electric
    3  16168M   Others  $2.98    Electric
    4  16169E     Nail $26.64  NailPolish
    5  16207A     Hair  $6.38         Gel
    6  17012B   Beauty $33.03         Eye
    7  17012C   Beauty $20.58        Lips
    8  17012F   Beauty $36.29        Face
    9  17091A     Nail $20.55 ManiPadiKit
    10 17107D     Nail $28.20  NailArtKit
    

    【Comments】:

    • Thanks for providing multiple options. I had some trouble installing tidyverse, but that is fixed now. Both options work fine.
    【Solution 2】:

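    Here is a base R solution. It uses the split/apply/combine strategy explained by Hadley Wickham in a JSS article.

    I would put the Sub_* vectors in a list, Sub_list. Note that split orders its result by the levels of Category (alphabetical here), so the vectors in Sub_list must be in that same order.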

    Sub_list <- list(Sub_Beauty, Sub_Hair, Sub_Nail, Sub_Others)
    sp <- split(product_data_1, product_data_1$Category)
    
    set.seed(1234)
    sp <- lapply(seq_along(sp), function(i){
      sp[[i]]$SubCategory <- sample(Sub_list[[i]], nrow(sp[[i]]), replace = TRUE)
      sp[[i]]
    })
    result <- do.call(rbind, sp)
    result <- result[order(as.integer(row.names(result))), ]
    result
    #    Pcode Category    MRP       SubCategory
    #1  16156L   Beauty  $8.88               Eye
    #2  16162M   Others $21.27       NonElectric
    #3  16168M   Others  $2.98       NonElectric
    #4  16169E     Nail $26.64        NailPolish
    #5  16207A     Hair  $6.38           Shampoo
    #6  17012B   Beauty $33.03               Eye
    #7  17012C   Beauty $20.58              Face
    #8  17012F   Beauty $36.29              Lips
    #9  17091A     Nail $20.55 NailPolishRemover
    #10 17107D     Nail $28.20       ManiPadiKit
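
The dependence on list order can be avoided by looking pools up by name instead of by position. The following is a sketch of that variant under stated assumptions, not the answerer's code: `Sub_list` is given names matching the Category values, and the data frame reuses only the Pcode and Category columns from the question.

```r
# Pools named after the Category values, so their order in the list is irrelevant.
Sub_list <- list(
  Others = c("Electric", "NonElectric"),
  Hair   = c("Shampoo", "Conditioner", "Gel", "HairOil", "Dye"),
  Nail   = c("NailPolish", "NailPolishRemover", "NailArtKit", "ManiPadiKit"),
  Beauty = c("Face", "Eye", "Lips")
)

product_data_1 <- data.frame(
  Pcode    = c("16156L", "16162M", "16168M", "16169E", "16207A",
               "17012B", "17012C", "17012F", "17091A", "17107D"),
  Category = c("Beauty", "Others", "Others", "Nail", "Hair",
               "Beauty", "Beauty", "Beauty", "Nail", "Nail"),
  stringsAsFactors = FALSE
)

# Split: one data frame per category, named by category.
sp <- split(product_data_1, product_data_1$Category)

# Apply: draw one subcategory per row, looking up the pool by group name.
set.seed(1234)
sp <- lapply(names(sp), function(nm) {
  grp <- sp[[nm]]
  grp$SubCategory <- sample(Sub_list[[nm]], nrow(grp), replace = TRUE)
  grp
})

# Combine: bind the groups and restore the original row order.
result <- do.call(rbind, sp)
result <- result[order(as.integer(row.names(result))), ]
```

Because `lapply` over `names(sp)` returns an unnamed list, `rbind` keeps the original row names, which is what the final reordering step relies on.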
    

    Final cleanup.

    rm(Sub_list)
    

    Data

    product_data_1 <- read.table(text = "
        Pcode Category    MRP
    1  16156L   Beauty  $8.88
    2  16162M   Others $21.27
    3  16168M   Others  $2.98
    4  16169E     Nail $26.64
    5  16207A     Hair  $6.38
    6  17012B   Beauty $33.03
    7  17012C   Beauty $20.58
    8  17012F   Beauty $36.29
    9  17091A     Nail $20.55
    10 17107D     Nail $28.20
    ", header = TRUE)
    

    【Comments】:

    • Works fine. Thanks.