【Question title】: List of Vectors and sequence padding
【Posted】: 2018-11-07 00:52:16
【Question】:

I have a data frame with three columns:

SentenceID = c(1,1,1,1,2,2,2,3,3,3,3,3,3,3,3)
Tokens = c("I","went","to","school","nobody","can","find","some","people","know","what","they","are","doing","now")
WordIndex = c(3,4,7,8,9,10,12,54,34,66,33,89,87,23,22)

df = data.frame(SentenceID, Tokens, WordIndex)

Desired result:

I need to go through each SentenceID and build a list X of vectors, like this:

X           
[[1]]   3 4 7 8     
[[2]]   9 10 12     
[[3]]   54 34 66 33 89 87 23 22 

Then I need to pad each vector with 0s to a length of 10:

X           
[[1]]   3 4 7 8 0 0 0 0 0 0
[[2]]   9 10 12 0 0 0 0 0 0 0   
[[3]]   54 34 66 33 89 87 23 22 0 0 

How can I do this?

【Comments on the question】:

    Tags: r vector sequence padding


    【Solution 1】:

    Here is one approach:

    > lapply(split(df$WordIndex, df$SentenceID), function(x) c(x, rep(0, pmax(10 - length(x), 0))))
    $`1`
     [1] 3 4 7 8 0 0 0 0 0 0
    
    $`2`
     [1]  9 10 12  0  0  0  0  0  0  0
    
    $`3`
     [1] 54 34 66 33 89 87 23 22  0  0
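
The one-liner above does two things: `split()` groups `WordIndex` by `SentenceID`, and the anonymous function appends zeros. A minimal sketch that separates the two steps is shown below; the variable names `groups`, `width`, and `padded` are illustrative and not part of the original answer. Note that with this approach a vector already longer than `width` is returned unchanged rather than truncated.

```r
# Rebuild the relevant columns from the question
SentenceID <- c(1,1,1,1,2,2,2,3,3,3,3,3,3,3,3)
WordIndex  <- c(3,4,7,8,9,10,12,54,34,66,33,89,87,23,22)
df <- data.frame(SentenceID, WordIndex)

# Step 1: one vector of word indices per sentence, as a named list
groups <- split(df$WordIndex, df$SentenceID)

# Step 2: append zeros until each vector reaches `width` (assumed target length)
width  <- 10
padded <- lapply(groups, function(x) c(x, rep(0, max(width - length(x), 0))))

# Check the resulting lengths
lengths(padded)
```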
    

    【Comments】:

      【Solution 2】:

      A base R solution using aggregate:

      lapply(aggregate(WordIndex, list(SentenceID), c)$x, 
          function(X) head(c(X, rep(0,10)), 10))
      $`1`
       [1] 3 4 7 8 0 0 0 0 0 0
      $`2`
       [1]  9 10 12  0  0  0  0  0  0  0
      $`3`
       [1] 54 34 66 33 89 87 23 22  0  0
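
Once every vector has the same length, the list can also be stacked into a matrix, which may be convenient if the padded sequences are meant as rectangular input for something downstream. This is only a sketch under that assumption; the question itself asks for a list, and the names `padded` and `m` are illustrative.

```r
SentenceID <- c(1,1,1,1,2,2,2,3,3,3,3,3,3,3,3)
WordIndex  <- c(3,4,7,8,9,10,12,54,34,66,33,89,87,23,22)

# Pad (and cut) each sentence to exactly 10 entries, as in the answer above
padded <- lapply(split(WordIndex, SentenceID),
                 function(x) head(c(x, rep(0, 10)), 10))

# Bind the equal-length vectors row-wise: one row per SentenceID
m <- do.call(rbind, padded)
dim(m)
```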
      

      【Comments】:

        【Solution 3】:

        You can try the tidyverse, using purrr's map function:

        library(tidyverse)
        df %>% 
          split(.$SentenceID) %>% 
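          # append 10 zeros, then keep the first 10 values; this also works for
          # sentences longer than 10, where rep(0, 10 - length(.)) would fail
          map(~ .x$WordIndex %>% c(rep(0, 10)) %>% head(10))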
        $`1`
         [1] 3 4 7 8 0 0 0 0 0 0
        
        $`2`
         [1]  9 10 12  0  0  0  0  0  0  0
        
        $`3`
         [1] 54 34 66 33 89 87 23 22  0  0
        

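        【Comments】:

        • If a SentenceID has more than 10 words, how can I put only the first 10 words into the vector? If it has fewer than 10, pad as shown above. This forum is a blessing. Thanks Jimbou, www, mt1022.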
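
One possible way to handle the case raised in this comment is to always append `width` padding values and then keep only the first `width` elements, which is the same `head()` idea used in Solution 2. The helper name `pad_or_truncate` and its parameters are hypothetical, introduced here only for illustration.

```r
# Hypothetical helper: returns exactly `width` values,
# cutting long vectors and zero-padding short ones
pad_or_truncate <- function(x, width = 10, pad = 0) {
  head(c(x, rep(pad, width)), width)
}

long_sentence  <- 1:12          # more than 10 entries: gets cut to the first 10
short_sentence <- c(9, 10, 12)  # fewer than 10 entries: gets zero-padded

pad_or_truncate(long_sentence)
pad_or_truncate(short_sentence)
```

Applied to the data frame from the question, this could be used as `lapply(split(df$WordIndex, df$SentenceID), pad_or_truncate)`.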