【Question title】: Generate new column based on values of other columns in R
【Posted】: 2021-04-29 13:44:19
【Question description】:

I have the following table:

|  | Red | Green | Blue | Yellow | Brown | Purple | Black |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Apple | A | B | D | D | C | F | E |
| Pear | A | B | C | B | C | F | B |
| Orange | A | B | C | B | C | F | B |
| Strawberry | A | C | D | D | C | F | D |
| Lemon | E | C | D | D | C | F | D |

Based on sample data like this:

Input data

ID Colour Fruit
1 Red Apple
2 Red Orange
3 Green Lemon
4 Brown Strawberry
...
1000 Brown Strawberry

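I want to generate an additional column (Group) in the input data that holds the matching value from the table above, so that the output looks like this:

Output data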

ID Colour Fruit Group
1 Red Apple A
2 Red Orange A
3 Green Lemon C
4 Brown Strawberry F
...
1000 Brown Strawberry F

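I have looked at this question: Generate new column values based on comparison of two other columns in R, but it is an oversimplification of my example and relies on ifelse() statements.

Is there another way to do this across thousands of rows and many possible Colour/Fruit combinations, without extending the ifelse() statements?

The dplyr package has the mutate and filter functions, but I don't know how to combine them for this example.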

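【Comments】:

  • The number of entries differs between rows of the table you gave, e.g. the Strawberry row has 6 while another has 7. Could you provide the sample data in a proper format?
  • The table has been corrected so that every row has 7 entries.
  • Didn't that serve your purpose? I worked through exactly the same example as the one you shared!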

Tags: r filter dplyr


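【Solution 1】:

You should use the approach I suggested earlier. In R, an Excel-style lookup is done with a dplyr join: reshape the lookup table to long format so that each Fruit/Color pair is one row, then join it onto the input data.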

table <- data.frame(
  stringsAsFactors = FALSE,
  Fruit  = c("Apple", "Pear", "Orange", "Strawberry", "Lemon"),
  Red    = c("A", "A", "A", "A", "E"),
  Green  = c("B", "B", "B", "C", "C"),
  Blue   = c("D", "C", "C", "D", "D"),
  Yellow = c("D", "B", "B", "D", "D"),
  Brown  = c("C", "C", "C", "C", "C"),
  Purple = c("F", "F", "F", "F", "F"),
  Black  = c("E", "B", "B", "D", "D")
)
table
#>        Fruit Red Green Blue Yellow Brown Purple Black
#> 1      Apple   A     B    D      D     C      F     E
#> 2       Pear   A     B    C      B     C      F     B
#> 3     Orange   A     B    C      B     C      F     B
#> 4 Strawberry   A     C    D      D     C      F     D
#> 5      Lemon   E     C    D      D     C      F     D

colors <- c("Red", "Green", "Blue", "Yellow", "Brown", "Purple", "Black")
fruits <- c("Apple", "Pear", "Orange", "Strawberry", "Lemon")

set.seed(1)
input_data <- data.frame(ID = 1:1000,
                         Color = sample(colors, 1000, replace = TRUE),
                         Fruit = sample(fruits, 1000, replace = TRUE))

head(input_data)
#>   ID  Color  Fruit
#> 1  1    Red  Lemon
#> 2  2 Yellow Orange
#> 3  3  Black  Lemon
#> 4  4    Red  Apple
#> 5  5  Green   Pear
#> 6  6  Brown Orange
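library(dplyr)
library(tidyr)

# Reshape the lookup table to long format (one row per Fruit/Color pair),
# then join it onto the input data by the columns the two tables share
output <- input_data %>%
  left_join(table %>% pivot_longer(!Fruit, names_to = "Color", values_to = "Code"))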
#> Joining, by = c("Color", "Fruit")

head(output)
#>   ID  Color  Fruit Code
#> 1  1    Red  Lemon    E
#> 2  2 Yellow Orange    B
#> 3  3  Black  Lemon    D
#> 4  4    Red  Apple    A
#> 5  5  Green   Pear    B
#> 6  6  Brown Orange    C

tail(output)
#>        ID  Color  Fruit Code
#> 995   995   Blue Orange    C
#> 996   996    Red Orange    A
#> 997   997 Yellow   Pear    B
#> 998   998    Red  Apple    A
#> 999   999   Blue   Pear    C
#> 1000 1000 Purple  Apple    F

Created on 2021-04-30 by the reprex package (v2.0.0)
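The join above lets dplyr pick the shared columns (`Color` and `Fruit`) on its own. As a minimal sketch, not part of the original answer, the keys can be named explicitly and the result checked for rows that found no match. The miniature `lookup` and `input_small` data frames and the "Kiwi" row are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of the lookup table (two fruits, two colours)
lookup <- data.frame(
  Fruit = c("Apple", "Lemon"),
  Red   = c("A", "E"),
  Green = c("B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical input; "Kiwi" is deliberately absent from the lookup table
input_small <- data.frame(
  ID    = 1:3,
  Color = c("Red", "Green", "Red"),
  Fruit = c("Apple", "Lemon", "Kiwi"),
  stringsAsFactors = FALSE
)

# Reshape to one row per (Fruit, Color) pair
lookup_long <- lookup %>%
  pivot_longer(!Fruit, names_to = "Color", values_to = "Code")

# Join on explicitly named keys instead of relying on the automatic choice
result <- input_small %>%
  left_join(lookup_long, by = c("Fruit", "Color"))

# Rows without a match in the lookup table get NA in Code
unmatched <- result %>% filter(is.na(Code))
```

If the input column is spelled `Colour`, as in the question, while the reshaped table uses `Color`, a named vector such as `by = c("Fruit", "Colour" = "Color")` should map one onto the other.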

【Discussion】:
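A possible alternative that avoids the join altogether is base R matrix indexing. This is a different technique from the one in the answer. The sketch below assumes the lookup table can be stored as a character matrix with fruits as row names and colours as column names; the miniature `lookup` and `input_small` objects are made up for illustration.

```r
# Hypothetical miniature of the lookup table as a character matrix:
# rows are fruits, columns are colours
lookup <- matrix(
  c("A", "B",
    "E", "C"),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Apple", "Lemon"), c("Red", "Green"))
)

# Hypothetical input data using the column names from the question
input_small <- data.frame(
  ID     = 1:3,
  Colour = c("Red", "Green", "Red"),
  Fruit  = c("Apple", "Lemon", "Lemon"),
  stringsAsFactors = FALSE
)

# A two-column character matrix selects one cell per row,
# matching (row name, column name) against the dimnames of lookup
input_small$Group <- lookup[cbind(input_small$Fruit, input_small$Colour)]
```

Unlike `left_join`, this form does not return `NA` for a Fruit or Colour missing from the matrix; as far as I know it stops with a subscript error, so it fits best when every combination is known to exist.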
