GC overhead limit exceeded error when reading a text file (answers)

【Question title】: GC overhead limit exceeded error when reading a text file
【Posted】: 2011-07-29 03:17:14
【Question description】:

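I get java.lang.OutOfMemoryError: GC overhead limit exceeded when reading from a text file, and I'm not sure what is going wrong. I'm running the program on a cluster with plenty of memory. The outer loop iterates 16,000 times, and for each outer iteration the inner loop iterates about 300,000 times. The error is thrown when the code tries to read a line in the inner loop. Any suggestions would be greatly appreciated. Here is my code snippet: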

//Read from the test data output file till not equals null
//Reads a single line at a time from the test data
while((line=br.readLine())!=null)
{
    //Clears the hashmap
    leastFive.clear();

    //Clears the arraylist
    fiveTrainURLs.clear();
    try
    {
        StringTokenizer st=new StringTokenizer(line," ");
        while(st.hasMoreTokens())
        {
            String currentToken=st.nextToken();

            if(currentToken.contains("File"))
            {
                testDataFileNo=st.nextToken();
                String tok="";
                while((tok=st.nextToken())!=null)
                {
                    if (tok==null) break;

                    int topic_no=Integer.parseInt(tok);
                    topic_no=Integer.parseInt(tok);
                    String prob=st.nextToken();

                    //Obtains the double value of the probability
                    double double_prob=Double.parseDouble(prob);
                    p1[topic_no]=double_prob;

                }
                break;
            }
        }
    }
    catch(Exception e)
    {
    }

    //Used to read over all the training data file
    FileReader fr1=new FileReader("/homes/output_train_2000.txt");

    BufferedReader br1=new BufferedReader(fr1);
    String line1="";

    //Reads the training data output file,one row at a time
    //This is the line on which an exception occurs!
    while((line1=br1.readLine())!=null)
    {
        try
        {
            StringTokenizer st=new StringTokenizer(line1," ");

            while(st.hasMoreTokens())
            {
                String currentToken=st.nextToken();

                if(currentToken.contains("File"))
                {
                    trainDataFileNo=st.nextToken();
                    String tok="";
                    while((tok=st.nextToken())!=null)
                    {
                        if(tok==null)
                            break;

                        int topic_no=Integer.parseInt(tok);
                        topic_no=Integer.parseInt(tok);
                        String prob=st.nextToken();

                        double double_prob=Double.parseDouble(prob);

                        //p2 will contain the probability values of each of the topics based on the indices
                        p2[topic_no]=double_prob;

                    }
                    break;
                }
            }
        }
        catch(Exception e)
        {
            double result=klDivergence(p1,p2);

            leastFive.put(trainDataFileNo,result);
        }
    }
}
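Two details in the snippet are worth ruling out alongside heap size. StringTokenizer.nextToken() never returns null: when no tokens remain it throws NoSuchElementException, so both while((tok=st.nextToken())!=null) loops only end by throwing, and in the inner loop klDivergence(p1,p2) and leastFive.put(...) run only because that exception lands in the catch block. That means one exception object per training line, i.e. a large amount of short-lived garbage. Also, br1 is opened once per outer iteration and never closed. Below is a minimal sketch of the parsing with both issues removed. The line format is inferred from the original tokenizing code; TopicLineParser and parseLine are hypothetical names; clearing the probability array before each line is an assumption (the original never resets p2); and a StringReader stands in for the training file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.StringTokenizer;

public class TopicLineParser {

    /**
     * Parses a line of the (assumed) form "... File <fileNo> <topic> <prob> <topic> <prob> ...".
     * Writes each probability into probs[topic] and returns fileNo,
     * or null when the line has no "File" token.
     */
    public static String parseLine(String line, double[] probs) {
        // Assumption: topics missing from this line count as 0, so clear
        // values left over from the previous line.
        Arrays.fill(probs, 0.0);
        StringTokenizer st = new StringTokenizer(line, " ");
        while (st.hasMoreTokens()) {
            if (!st.nextToken().contains("File")) {
                continue;
            }
            if (!st.hasMoreTokens()) {
                return null; // "File" marker with no file number after it
            }
            String fileNo = st.nextToken();
            // nextToken() throws NoSuchElementException when nothing is left,
            // so test hasMoreTokens() before every call instead of comparing to null.
            while (st.hasMoreTokens()) {
                int topic = Integer.parseInt(st.nextToken());
                if (!st.hasMoreTokens()) {
                    throw new IllegalArgumentException("topic " + topic + " has no probability");
                }
                probs[topic] = Double.parseDouble(st.nextToken());
            }
            return fileNo;
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // StringReader stands in for new FileReader("/homes/output_train_2000.txt").
        String data = "File 7 0 0.25 2 0.75\nno marker here\n";
        double[] p2 = new double[3];
        // try-with-resources closes the reader on every pass, even if parsing throws.
        try (BufferedReader br1 = new BufferedReader(new StringReader(data))) {
            String line1;
            while ((line1 = br1.readLine()) != null) {
                String trainDataFileNo = parseLine(line1, p2);
                if (trainDataFileNo == null) {
                    continue; // no "File" token on this line
                }
                // In the original, klDivergence(p1, p2) and leastFive.put(...)
                // would go here, in normal control flow rather than in a catch block.
                System.out.println(trainDataFileNo + " " + Arrays.toString(p2)); // prints "7 [0.25, 0.0, 0.75]"
            }
        }
    }
}
```

This removes the exception-driven control flow and the unclosed readers. It does not by itself reduce the 16,000 full passes over the training file.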

【Question discussion】:

Tags: java garbage-collection out-of-memory

【Solution 1】:

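16,000 * 300,000 = 4.8 billion. If each token took only 6 bytes, that alone would be about 28.8 GB, well over 24 GB. When the garbage collector finally kicks in on a heap that size, it runs for a very long time. It looks like you need to break the work into smaller chunks. You could also cap the application's memory at a reasonable value, such as 1 GB (for example with -Xmx1g), so that the GC starts earlier and actually gets something done in the time it has to do its work.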
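The estimate above is easy to reproduce, with one catch: 16000 * 300000 does not fit in a Java int, so the multiplication has to be done in long. A small sketch; the 6-bytes-per-token figure is the answer's assumption, and GcEstimate and estimateBytes are illustrative names.

```java
public class GcEstimate {
    /** Rough size estimate: outer * inner iterations times an assumed per-token cost. */
    public static long estimateBytes(long outer, long inner, long bytesPerToken) {
        return outer * inner * bytesPerToken;
    }

    public static void main(String[] args) {
        // With int operands the product silently overflows (wraps modulo 2^32).
        int overflowed = 16_000 * 300_000;
        System.out.println("int product  = " + overflowed);   // 505032704

        long tokens = 16_000L * 300_000L;                      // 4800000000
        long bytes = estimateBytes(16_000L, 300_000L, 6L);     // 28800000000
        System.out.println("long product = " + tokens);
        System.out.println("estimate     = " + bytes + " bytes");
    }
}
```

To try the heap cap the answer suggests, start the JVM with a maximum heap size, e.g. java -Xmx1g GcEstimate (substitute the real main class).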

【Discussion】:

Also, I believe that on Windows a maximum VM heap size above about 1.2 GB is ignored.
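Whether a given -Xmx value is honored on a particular machine can be checked from inside the program instead of assumed. A minimal sketch using the standard Runtime methods and the os.arch system property; HeapCheck and toMiB are illustrative names. Run it with the same -Xmx flag as the real job and compare the reported maximum with what was requested.

```java
public class HeapCheck {
    /** Converts a byte count to whole mebibytes (truncating). */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap the JVM will attempt to use
        // (Long.MAX_VALUE if there is no inherent limit).
        System.out.println("max heap   = " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory(): heap currently reserved; freeMemory(): unused part of it.
        System.out.println("total heap = " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("free heap  = " + toMiB(rt.freeMemory()) + " MiB");
        // Architecture the JVM reports, e.g. amd64 or x86.
        System.out.println("os.arch    = " + System.getProperty("os.arch"));
    }
}
```

For example, java -Xmx1g HeapCheck should report a max heap close to 1024 MiB if the setting was applied.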