【Question title】: Create data.frame based on unique column values in R?
【Posted】: 2020-05-21 15:52:02
【Question description】:

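I have a data.frame of observations with metadata columns, and I would like to create a new data.frame with the same columns in which each row is one combination of the unique values found in each column. Here is an example: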

# what I have
df <- data.frame("Color" = c("Red", "Blue", "Green", "Green"), 
                 "Size" = c("Large", "Large", "Large", "Small"), 
                 "Value" = c(0, 1, 1, 1))
> df
  Color  Size Value
1   Red Large     0
2  Blue Large     1
3 Green Large     1
4 Green Small     1

# what I want
ideal_df <- data.frame("Color" = c("Red", "Red", "Red", "Red", "Blue", "Blue", "Blue", "Blue", "Green", "Green", "Green", "Green"), 
                       "Size" = c("Large", "Large", "Small", "Small", "Large", "Large", "Small", "Small", "Large", "Large", "Small", "Small"), 
                       "Value" = c(0,1,0,1,0,1,0,1,0,1,0,1))
> ideal_df
   Color  Size Value
1    Red Large     0
2    Red Large     1
3    Red Small     0
4    Red Small     1
5   Blue Large     0
6   Blue Large     1
7   Blue Small     0
8   Blue Small     1
9  Green Large     0
10 Green Large     1
11 Green Small     0
12 Green Small     1

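I tried using for loops, but my data is much larger than this example and the loop hangs. I searched for this problem but could not find anything similar. If this has already been answered, I would be happy to be pointed to the other thread! Thank you for your time.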

【Comments】:

    Tags: r dataframe unique


    【Solution 1】:

    This is a job for expand() from the tidyr package:

    library(tidyr)
    
    new_df <- df %>% expand(Color, Size, Value)
    
    

    【Comments】:

      【Solution 2】:

      Just adding a base R solution:

      new_df <- expand.grid(Color = unique(df$Color)
                         , Size = unique(df$Size)
                         , Value = unique(df$Value))
      
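The base R approach above can be generalized so the column names do not have to be typed out. The following is a minimal sketch, not part of the original answer: `all_combinations()` is a hypothetical helper name, and passing `stringsAsFactors = FALSE` is an assumption made here to keep the text columns as character vectors. Because `expand.grid()` varies its first argument fastest, the sketch feeds the columns in reverse order so that the rows come out in the same order as `ideal_df` in the question.

```r
# Example data from the question.
df <- data.frame(Color = c("Red", "Blue", "Green", "Green"),
                 Size  = c("Large", "Large", "Large", "Small"),
                 Value = c(0, 1, 1, 1))

# all_combinations() is a hypothetical helper name, not a base R function.
all_combinations <- function(data) {
  # Unique values of every column, taken in reverse column order so that
  # the last column varies fastest in the result.
  values <- lapply(data[rev(names(data))], unique)
  grid <- do.call(expand.grid, c(values, list(stringsAsFactors = FALSE)))
  # Put the columns back in their original order.
  grid[names(data)]
}

new_df <- all_combinations(df)
```

If the row order does not matter, calling `expand.grid()` directly as in the answer is simpler.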

      If performance is a concern, here is a benchmark comparison:

      sandy <- function(){
        expand(df, Color, Size, Value)
      }
      
      cj <- function(){
        expand.grid(Color = unique(df$Color)
                    , Size = unique(df$Size)
                    , Value = unique(df$Value))
      }
      
      library(microbenchmark)
      microbenchmark(sandy(), cj())
      Unit: microseconds
          expr      min       lq      mean   median       uq      max neval
       sandy() 1382.524 1494.675 1693.1749 1562.084 1736.524 7352.916   100
          cj()  138.914  152.746  204.8588  173.321  191.910 2889.398   100
      
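The timings above were measured on the four-row example. For much larger data, the size of the result matters as well: the grid has one row per combination, so its row count is the product of the number of unique values in each column. A small sketch for checking that count before building the grid (the names `n_unique` and `n_rows` are illustrative only):

```r
# Example data from the question.
df <- data.frame(Color = c("Red", "Blue", "Green", "Green"),
                 Size  = c("Large", "Large", "Large", "Small"),
                 Value = c(0, 1, 1, 1))

# Number of distinct values in each column.
n_unique <- vapply(df, function(col) length(unique(col)), integer(1))

# Rows the full grid would contain: the product of the distinct counts.
n_rows <- prod(n_unique)
```

For the example data this is 3 * 2 * 2 = 12, matching the 12 rows of `ideal_df`.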

      【Comments】:
