【Question title】: K-Skip-N-Gram: generalization of for-loops in R
【Posted】: 2013-08-15 18:21:58
【Question】:

I have an R function that generates k-skip-n-grams.
The full function is available on GitHub.

My code does generate the required k-skip-n-grams correctly:

> kSkipNgram("Lorem ipsum dolor sit amet, consectetur adipiscing elit.", n=2, skip=1)
 [1] "Lorem dolor"            "Lorem ipsum"            "ipsum sit"             
 [4] "ipsum dolor"            "dolor amet"             "dolor sit"             
 [7] "sit consectetur"        "sit amet"               "amet adipiscing"       
[10] "amet consectetur"       "consectetur elit"       "consectetur adipiscing"
[13] "adipiscing elit"       

But I would like to generalize/simplify the following switch statement made of nested for loops:

# x - should be text, a sentence
# n - n-gram
# skip - number of skips
###################################
  switch(as.character(n),
         "0" = {ngram<-c(ngram, paste(x[i]))},
         "1" = {for(j in skip:1)
                  {
                    if (i+j <= length(x)) 
                      {ngram<-c(ngram, paste(x[i],x[i+j]))}
                  }
                },
         "2" = {for(j in skip:1)
                  {for (k in skip:1)
                    {
                      if (i+j <= length(x) && i+j+k <= length(x)) 
                        {ngram<-c(ngram, paste(x[i],x[i+j],x[i+j+k]))}
                    }
                  }
                },
         "3" = {for(j in skip:1)
                  {for (k in skip:1)
                    {for (l in skip:1)
                      {
                      if (i+j <= length(x) && i+j+k <= length(x) && i+j+k+l <= length(x)) 
                          {ngram<-c(ngram, paste(x[i],x[i+j],x[i+j+k],x[i+j+k+l]))}
                      }
                    }
                  }
                },
         "4" = {for(j in skip:1)
                  {for (k in skip:1)
                      {for (l in skip:1)
                        {for (m in skip:1)
                            {
                            if (i+j <= length(x) && i+j+k <= length(x) && i+j+k+l <= length(x) && i+j+k+l+m <= length(x)) 
                                  {ngram<-c(ngram, paste(x[i],x[i+j],x[i+j+k],x[i+j+k+l],x[i+j+k+l+m]))}
                            }
                        }
                      }
                    }
                  }
        )
  }
}
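
One way to read the switch above: every case is the same loop repeated a fixed number of times, with each offset running from `skip` down to 1. The sketch below, in Python rather than R, uses `itertools.product` to stand in for the nested loops. The names `skip_ngrams_at` and `depth` are made up for illustration, `x` is assumed to be a list of tokens with punctuation already stripped, and indices are 0-based. Judging only from the printed output, the call with n=2, skip=1 appears to correspond to one loop level with offsets 2 and 1; that mapping is an inference, since the fragment does not show how `n` and `skip` are adjusted before the switch.

```python
from itertools import product


def skip_ngrams_at(x, i, depth, skip):
    """Grams starting at x[i], built like `depth` nested loops over skip..1."""
    ngrams = []
    # product(..., repeat=depth) enumerates the same offset tuples, in the
    # same order, as `depth` nested for-loops would.
    for offsets in product(range(skip, 0, -1), repeat=depth):
        positions = [i]
        for step in offsets:
            positions.append(positions[-1] + step)
        # Offsets are positive, so checking the last index covers all of them.
        if positions[-1] < len(x):
            ngrams.append(" ".join(x[p] for p in positions))
    return ngrams


# Assumed tokenization of the example sentence (punctuation removed).
tokens = ["Lorem", "ipsum", "dolor", "sit", "amet",
          "consectetur", "adipiscing", "elit"]
# Assumed mapping of n=2, skip=1 to depth=1 with offsets 2 and 1.
all_grams = [g for i in range(len(tokens))
             for g in skip_ngrams_at(tokens, i, depth=1, skip=2)]
print(all_grams)
```

With `depth=0` the product yields a single empty tuple, so the function returns just the token itself, which matches the "0" case. In R, a recursive helper or `expand.grid` could play a similar role to `product`; that translation is not shown here.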

【Comments】:

    Tags: r for-loop switch-statement n-gram


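    【Solution 1】:

    I used a recursive solution for general k-skip-n-grams. I have written it in Python; I have no experience with R, but hopefully you can translate it. I used the definition from this paper: http://homepages.inf.ed.ac.uk/ballison/pdf/lrec_skipgrams.pdf

    If you are going to use this on long sentences, it should probably be optimized with some dynamic programming, because it currently does a lot of redundant computation (the same sub-grams are computed repeatedly). I also have not tested it thoroughly, so there may be edge cases.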

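    def kskipngrams(sentence, k, n):
        """Return all k-skip-n-grams of a sentence.

        Assumes the sentence is already tokenized into a list.
        """
        if n <= 0 or len(sentence) == 0:
            return []  # an empty list is easier for callers to handle than None
        grams = []
        # Every position that leaves room for n tokens can start a gram.
        for i in range(len(sentence) - n + 1):
            grams.extend(initial_kskipngrams(sentence[i:], k, n))
        return grams

    def initial_kskipngrams(sentence, k, n):
        """Return the k-skip-n-grams that begin with sentence[0]."""
        if n == 1:
            return [[sentence[0]]]
        grams = []
        # j is the number of tokens skipped right after sentence[0];
        # the remaining k - j skips are passed on to the rest of the gram.
        for j in range(min(k + 1, len(sentence) - 1)):
            for gram in initial_kskipngrams(sentence[j + 1:], k - j, n - 1):
                grams.append([sentence[0]] + gram)
        return grams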
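
The dynamic-programming idea mentioned above can be sketched by caching sub-results on start index, remaining skips and remaining length, instead of re-slicing the sentence at every call. This is only one possible approach, not part of the original answer: the names `kskipngrams_memo` and `starting_at` are hypothetical, and `functools.lru_cache` is used as the cache. As in the answer, the skip budget `k` is shared across the whole gram, whereas the nested loops in the question allow up to `skip` at each gap; for bigrams the two give the same set.

```python
from functools import lru_cache


def kskipngrams_memo(sentence, k, n):
    """k-skip-n-grams of a token list, caching results by (start, k, n)."""
    tokens = tuple(sentence)
    if n <= 0 or not tokens:
        return []

    @lru_cache(maxsize=None)
    def starting_at(start, k_left, n_left):
        # All n_left-grams whose first token is tokens[start], using at
        # most k_left skips in total. Tuples are returned so that cached
        # values cannot be mutated by callers.
        if n_left == 1:
            return ((tokens[start],),)
        grams = []
        for j in range(k_left + 1):
            nxt = start + j + 1
            if nxt >= len(tokens):
                break  # no token left to continue the gram
            for tail in starting_at(nxt, k_left - j, n_left - 1):
                grams.append((tokens[start],) + tail)
        return tuple(grams)

    result = []
    for i in range(len(tokens) - n + 1):
        result.extend(list(g) for g in starting_at(i, k, n))
    return result


# Assumed tokenization of the sentence from the question.
tokens = ["Lorem", "ipsum", "dolor", "sit", "amet",
          "consectetur", "adipiscing", "elit"]
bigrams = [" ".join(g) for g in kskipngrams_memo(tokens, k=1, n=2)]
print(bigrams)
```

For `k=1, n=2` this produces the same 13 bigrams as the listing in the question, in a different order. The cache only reduces repeated work; the number of grams in the output still grows quickly with `k` and `n`.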
    

    【Comments】:
