【Question Title】: Stanford Training Lambda Too Big
【Posted】: 2016-07-11 16:59:48
【Question】:

I am using the Stanford POS Tagger to train a model on my corpus. I prepared the settings file "Prop", formatted the data, and started training.

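After that, I began getting messages such as "Lambda Too Big", and they kept appearing until training finished. I then tried the resulting model, and it kept giving me an "Out Of Memory Exception". I tried the model on an HPC machine with more than 40 GB of RAM and increased the Java heap space to 25 GB, but the same problem persists.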
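
One thing worth ruling out is whether the heap setting actually reached the JVM that runs the tagger. The following is a minimal sketch, not part of the Stanford tools: the class name `HeapCheck` is made up for illustration, and it only reports what the standard `Runtime.maxMemory()` call returns. Launch it with the same heap flag you pass when tagging (for example `java -Xmx25g HeapCheck`) and compare the number.

```java
public class HeapCheck {
    /** Returns the maximum heap this JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Hypothetical helper: run with the same -Xmx value used for the tagger.
        System.out.println("Max heap available to this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

If the reported value is far below 25 GB, the flag is probably being dropped somewhere, for instance by a wrapper script or a job scheduler.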

The corpus I am using has about 6,000 sentences; the shortest sentence has 3 words and the longest has 128. A word can carry up to three tags at once, for example {p1}{p2}.
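
Figures like these can be double-checked directly against the training file. The sketch below is only an illustration: the class name `SentenceLengths` is hypothetical, and it assumes the TSV file has one token per line with a blank line between sentences, which may not match how your file is actually laid out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.IntSummaryStatistics;
import java.util.List;

public class SentenceLengths {
    /** Counts tokens per sentence, assuming blank lines separate sentences. */
    static IntSummaryStatistics summarize(List<String> lines) {
        IntSummaryStatistics stats = new IntSummaryStatistics();
        int tokens = 0;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                // A blank line closes the current sentence, if there is one.
                if (tokens > 0) {
                    stats.accept(tokens);
                    tokens = 0;
                }
            } else {
                tokens++;
            }
        }
        // The last sentence may not be followed by a blank line.
        if (tokens > 0) {
            stats.accept(tokens);
        }
        return stats;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the training file, e.g. Train.txt
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        IntSummaryStatistics s = summarize(lines);
        System.out.println("sentences=" + s.getCount()
                + " min=" + s.getMin() + " max=" + s.getMax());
    }
}
```

If the sentence count or the maximum length comes out very different from what you expect, the file is probably not being split into sentences the way you intended.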

Here is the log from when I start training:

 pcond initialized
 zlambda initialized
 ftildeArr initialized
QNMinimizer called on double function of 337720 variables, using M = 10.

Iter. 0: neg. log cond. likelihood = 821394.2976644086 [1 calls to valueAt]
          An explanation of the output:
Iter           The number of iterations
evals          The number of function evaluations
SCALING        <D> Diagonal scaling was used; <I> Scaled Identity
LINESEARCH     [## M steplength]  Minpack linesearch
                   1-Function value was too high
                   2-Value ok, gradient positive, positive curvature
                   3-Value ok, gradient negative, positive curvature
                   4-Value ok, gradient negative, negative curvature
               [.. B]  Backtracking
VALUE          The current function value
TIME           Total elapsed time
|GNORM|        The current norm of the gradient
{RELNORM}      The ratio of the current to initial gradient norms
AVEIMPROVE     The average improvement / current value
EVALSCORE      The last available eval score

Iter ## evals ## <SCALING> [LINESEARCH] VALUE TIME |GNORM| {RELNORM} AVEIMPROVE
EVALSCORE

Iter 1 evals 1 <D> [lambda 5525 too big: 623.532051211901
lambda 28341 too big: 623.5660256059567
lambda 153849 too big: 623.5660256059567

Also, here are the settings used in the prop file:

## tagger training invoked at Thu Mar 03 01:31:10 AST 2016 with arguments:
                   model = arabic.New.tagger
                    arch = words(-2,2),order(1),prefix(6),suffix(6),unicodeshapes(1)
            wordFunction = 
               trainFile = format=TSV,Train.txt
         closedClassTags = 
 closedClassTagThreshold = 40
 curWordMinFeatureThresh = 1
                   debug = false
             debugPrefix = 
            tagSeparator = /
                encoding = UTF-8
              iterations = 100
                    lang = arabic
    learnClosedClassTags = false
        minFeatureThresh = 3
           openClassTags = 
rareWordMinFeatureThresh = 3
          rareWordThresh = 5
                  search = qn
                    sgml = false
            sigmaSquared = 0.0
                   regL1 = 0.75
               tagInside = 
                tokenize = false
        tokenizerFactory = edu.stanford.nlp.process.WhitespaceTokenizer
        tokenizerOptions = 
                 verbose = false
          verboseResults = true
    veryCommonWordThresh = 250
                xmlInput = 
              outputFile = 
            outputFormat = slashTags
     outputFormatOptions = 
                nthreads = 1

Can anyone tell me what I am doing wrong?

【Comments】:

    Tags: lambda stanford-nlp part-of-speech


    【Solution 1】:

    Regarding the Lambda size messages, you can find the answer here: Lambda Size is Too Big

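    As for the Out Of Memory Exception, please state the size of the file you are trying to tag. In any case, I suspect you are passing the tagger one large file. Try a 100 KB file as a test. If the file is small, I don't think you will get the error. If the Out Of Memory Exception still appears, you can ask on the java-nlp-user mailing list: java-nlp-user. Note that you must subscribe to the list before posting any question.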
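
    To build such a test file, something like the sketch below could work. It is only an illustration under stated assumptions: the class name `SampleFile` and the two command-line arguments (input path, output path) are made up, and it simply copies whole lines from the start of the file until the next line would push the UTF-8 size past roughly 100 KB.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class SampleFile {
    /** Keeps whole lines from the start until the next one would exceed maxBytes (UTF-8). */
    static List<String> head(Iterable<String> lines, int maxBytes) {
        List<String> kept = new ArrayList<>();
        int used = 0;
        for (String line : lines) {
            // +1 accounts for the line terminator written after each line.
            int size = line.getBytes(StandardCharsets.UTF_8).length + 1;
            if (used + size > maxBytes) {
                break;
            }
            kept.add(line);
            used += size;
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args[0]);   // hypothetical: the large file to be tagged
        Path out = Paths.get(args[1]);  // hypothetical: where to write the small sample
        // Files.lines reads lazily, so the large input is not loaded in full.
        try (Stream<String> lines = Files.lines(in, StandardCharsets.UTF_8)) {
            Files.write(out, head(lines::iterator, 100 * 1024), StandardCharsets.UTF_8);
        }
    }
}
```

    Cutting at line boundaries keeps each line intact. If the small sample tags without error and the full file does not, splitting the input into several pieces is the next thing to try.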

    Hope this helps!

    【Discussion】:

    • I will try your suggestion and report back... Thanks.