【Question title】: Create columns of data frame based on rows from another data frame
【Posted】: 2015-10-29 14:34:25
【Question】:

As the title says, I want to create a data frame. Here is the matrix that will serve as the input:

structure(c("2", "3", "8", "8", "10", "10", "11", "11", "11", 
            "11", "Frank", "Mark", "Greg", "Mati", "Paul", 
            "Cyntha", "Marcus", "Pablo", "Maggy", "Trist"
), .Dim = c(10L, 2L), .Dimnames = list(NULL, c("i", "vec_names"
)))
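
The answers refer to this matrix as m (Solution 1) and mymat (Solution 2) without defining it. A minimal setup sketch, assuming both names are simply aliases for the matrix above:

```r
# The matrix from the question, stored under the name Solution 1 uses
m <- structure(c("2", "3", "8", "8", "10", "10", "11", "11", "11",
                 "11", "Frank", "Mark", "Greg", "Mati", "Paul",
                 "Cyntha", "Marcus", "Pablo", "Maggy", "Trist"),
               .Dim = c(10L, 2L),
               .Dimnames = list(NULL, c("i", "vec_names")))

# Solution 2 calls the same object mymat (assumed to be identical)
mymat <- m

# Number of names per value of i; the largest count is the number of
# rows the reshaped result needs
group_sizes <- table(m[, "i"])
max(group_sizes)
```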

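I want to create columns based on the values in column i. If several rows share the same number in column i, the names found in the next column (vec_names) should be stored together in one column of the new data frame.

This means the columns will have different lengths, so the missing strings can be filled with NA.

Desired output: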

2     3    8    10     11
Frank Mark Greg Paul   Marcus
           Mati Cyntha Pablo 
                       Maggy
                       Trist

【Comments】:

    Tags: r


    【Solution 1】:

    You can use dcast from reshape2 to reshape to wide format:

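    library(reshape2)

    # m is the matrix from the question
    DF <- data.frame(m, stringsAsFactors = FALSE)
    # s numbers the rows within each value of i (1, 2, ...)
    DF$s <- ave(seq_along(DF$i), DF$i, FUN = seq_along)
    res  <- dcast(DF, s ~ i, value.var = "vec_names")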
    
      s     10     11     2    3    8
    1 1   Paul Marcus Frank Mark Greg
    2 2 Cyntha  Pablo  <NA> <NA> Mati
    3 3   <NA>  Maggy  <NA> <NA> <NA>
    4 4   <NA>  Trist  <NA> <NA> <NA>
    

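    Unfortunately, this leaves you with an unwanted column s, and the other columns are sorted lexicographically rather than numerically. If you want to fix that: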

    res$s <- NULL
    res[order(as.integer(names(res)))]
    
          2    3    8     10     11
    1 Frank Mark Greg   Paul Marcus
    2  <NA> <NA> Mati Cyntha  Pablo
    3  <NA> <NA> <NA>   <NA>  Maggy
    4  <NA> <NA> <NA>   <NA>  Trist
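
The reordering step is needed because the column names are character strings, which sort as text rather than as numbers. A small base R illustration, where the vector nms is a made-up stand-in for names(res):

```r
# Stand-in for names(res) after dropping the s column (assumed values)
nms <- c("10", "11", "2", "3", "8")

# Converting to integer first gives the positions in numeric order
idx <- order(as.integer(nms))

# Indexing with those positions puts the names in numeric order
num_order <- nms[idx]
num_order
```

Indexing a data frame with the same idx vector, as res[order(as.integer(names(res)))] does, reorders its columns in the same way.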
    

    【Comments】:

      【Solution 2】:

      In base R, after first converting the matrix (mymat) to a data.frame, you can try the following:

      df <- as.data.frame(mymat, stringsAsFactors=FALSE) # convert the matrix to a data.frame
      sp_df <- split(df, df$i) # split it according to "i"
      nb_row <- sapply(sp_df, nrow) # compute the number of rows in each so you can complete with NAs
      mapply(function(x, y) c(x$vec_names, rep(NA, max(nb_row)-y)), 
             x=sp_df, 
             y=nb_row) [, order(as.numeric(names(sp_df)))] # complete with NA when needed and keep only the second column. Finally, reorder the columns.
      

      Edit

      Thanks to @Frank, here is a simpler way that splits only the vector of names (after converting to a data.frame):

      sp_nm = split(df$vec_names, df$i)
      do.call(cbind, lapply(sp_nm, `length<-`, max(lengths(sp_nm))))[, order(as.numeric(names(sp_nm)))]
      

      Both approaches give the following output:

      #    2       3      8      10       11      
      #[1,] "Frank" "Mark" "Greg" "Paul"   "Marcus"
      #[2,] NA      NA     "Mati" "Cyntha" "Pablo" 
      #[3,] NA      NA     NA     NA       "Maggy" 
      #[4,] NA      NA     NA     NA       "Trist"
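
The shorter version relies on `length<-` padding a vector with NA when its length is increased. A self-contained sketch of the same idea, wrapped in a helper whose name (rows_to_columns) and arguments are made up for illustration; it also converts the result to a data frame, which is what the question asked for:

```r
# Hypothetical helper, not part of the original answer
rows_to_columns <- function(mat, key = "i", value = "vec_names") {
  # One character vector of names per distinct key
  groups <- split(mat[, value], mat[, key])
  # split() orders the keys as text, so reorder them numerically
  groups <- groups[order(as.numeric(names(groups)))]
  # Extending a vector with `length<-` fills the new slots with NA
  n <- max(lengths(groups))
  padded <- lapply(groups, `length<-`, n)
  # check.names = FALSE keeps column names such as "2" unchanged
  data.frame(do.call(cbind, padded),
             check.names = FALSE, stringsAsFactors = FALSE)
}

mymat <- structure(c("2", "3", "8", "8", "10", "10", "11", "11", "11",
                     "11", "Frank", "Mark", "Greg", "Mati", "Paul",
                     "Cyntha", "Marcus", "Pablo", "Maggy", "Trist"),
                   .Dim = c(10L, 2L),
                   .Dimnames = list(NULL, c("i", "vec_names")))

out <- rows_to_columns(mymat)
out
```

Reordering the groups before binding, rather than indexing the matrix afterwards, is only a design choice here; either order of operations yields the same columns.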
      

      【Comments】:

        【Solution 3】:

        Try the spread function from the tidyr package. This gets close to what you expect.

        library(tidyr)

        spread(data.frame(
          structure(c("2", "3", "8", "8", "10", "10", "11", "11", "11",
                      "11", "Frank", "Mark", "Greg", "Mati", "Paul",
                      "Cyntha", "Marcus", "Pablo", "Maggy", "Trist"),
                    .Dim = c(10L, 2L), .Dimnames = list(NULL, c("i", "vec_names")))),
          "i", "vec_names")
        
                       10     11     2    3    8
                1    <NA>   <NA> Frank <NA> <NA>
                2    <NA>   <NA>  <NA> Mark <NA>
                3    <NA>   <NA>  <NA> <NA> Greg
                4    <NA>   <NA>  <NA> <NA> Mati
                5    Paul   <NA>  <NA> <NA> <NA>
                6  Cyntha   <NA>  <NA> <NA> <NA>
                7    <NA> Marcus  <NA> <NA> <NA>
                8    <NA>  Pablo  <NA> <NA> <NA>
                9    <NA>  Maggy  <NA> <NA> <NA>
                10   <NA>  Trist  <NA> <NA> <NA>
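
One likely reason the result stays spread over ten rows is that nothing tells spread which names belong on the same output row. A hedged sketch that adds a within-group counter first (the column name s is borrowed from Solution 1, and mat is an assumed name for the question's matrix); it assumes an installed tidyr version that still provides spread():

```r
library(tidyr)

mat <- structure(c("2", "3", "8", "8", "10", "10", "11", "11", "11",
                   "11", "Frank", "Mark", "Greg", "Mati", "Paul",
                   "Cyntha", "Marcus", "Pablo", "Maggy", "Trist"),
                 .Dim = c(10L, 2L),
                 .Dimnames = list(NULL, c("i", "vec_names")))

df <- data.frame(mat, stringsAsFactors = FALSE)

# s is the position of each name within its group of i
df$s <- ave(seq_along(df$i), df$i, FUN = seq_along)

# With s identifying the output row, each value of i becomes a column
wide <- spread(df, i, vec_names)

# Drop the helper column and put the columns in numeric order
wide$s <- NULL
wide <- wide[order(as.integer(names(wide)))]
wide
```

In newer tidyr releases spread() is superseded by pivot_wider(), so the same counter column could be used with that function instead.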
        

        【Comments】:

        • This does not look very close to the OP's desired output.