Most Efficient Way to Check a File for a List of Words: Answers

【Question title】: Most Efficient Way to Check File for List of Words
【Posted】: 2011-08-13 13:40:42
【Question description】:

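I just finished a homework assignment that asked me to add all of the Java keywords to a HashSet, then read in a .java file and count how many times any keyword appears in that file.

The route I took: I created a String[] array containing all of the keywords, created a HashSet, and used Collections.addAll to add the array to the HashSet. Then, as I iterated through the text file, I checked each word with HashSet.contains(currentWordFromFile);

Someone suggested using a HashTable for this. Then I saw a similar example that used a TreeSet. I'm just curious: what is the recommended approach?

(Full code here: http://pastebin.com/GdDmCWj0)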
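
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code behind the pastebin link: the class name `KeywordSetSketch`, the helper `countKeywords`, and the shortened keyword list are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the HashSet approach; all names here are illustrative.
class KeywordSetSketch {

    // Only a handful of keywords, to keep the example short.
    static final String[] KEYWORDS = { "class", "int", "public", "return", "static", "void" };

    // Counts how many of the given words are keywords.
    static int countKeywords(List<String> words) {
        Set<String> keywordSet = new HashSet<String>();
        Collections.addAll(keywordSet, KEYWORDS);

        int count = 0;
        for (String word : words) {
            // contains() on a HashSet is an expected constant-time lookup
            if (keywordSet.contains(word)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("public", "static", "int", "total", "return", "total");
        System.out.println(countKeywords(words)); // prints 4
    }
}
```

A TreeSet could be swapped in through the same Set interface, but its contains runs in O(log n) and it keeps the elements sorted, which this task does not need. Hashtable is a synchronized map rather than a set, so it mainly makes sense when a value per keyword has to be stored.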

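【Question discussion】:

Tags: java hashtable hashset treeset

【Solution 1】:

You said you "had a homework assignment", so I'm assuming you have already finished it.

I would do things a bit differently. First, I think some of the keywords in your String array are incorrect. According to Wikipedia and Oracle, Java has 50 keywords. Anyway, I've commented my code pretty well. Here is what I came up with...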

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;
import java.util.HashMap;

public class CountKeywords {

    public static void main(String args[]) {

        String[] theKeywords = { "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char", "class", "const", "continue", "default", "do", "double", "else", "enum", "extends", "false", "final", "finally", "float", "for", "goto", "if", "implements", "import", "instanceof", "int", "interface", "long", "native", "new", "null", "package", "private", "protected", "public", "return", "short", "static", "strictfp", "super", "switch", "synchronized", "this", "throw", "throws", "transient", "true", "try", "void", "volatile", "while" };

        // put each keyword in the map with value 0 
        Map<String, Integer> theKeywordCount = new HashMap<String, Integer>();
        for (String str : theKeywords) {
            theKeywordCount.put(str, 0);
        }

        // fail early with a usage hint if no file was given
        if (args.length == 0) {
            System.err.println("Usage: java CountKeywords <file.java>");
            return;
        }
        File file = new File(args[0]);

        // attempt to open and read file; try-with-resources closes the reader
        // even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {

            String sLine;

            // read lines until reaching the end of the file
            while ((sLine = br.readLine()) != null) {

                // extract the words from the current line in the file by splitting
                // on anything that is not a letter, digit or underscore
                for (String word : sLine.split("\\W+")) {

                    // count the word only if it is one of the keywords
                    if (theKeywordCount.containsKey(word)) {
                        theKeywordCount.put(word, theKeywordCount.get(word) + 1);
                    }
                }
            }

        } catch (FileNotFoundException exception) {
            // Unable to find file.
            exception.printStackTrace();
        } catch (IOException exception) {
            // Unable to read line.
            exception.printStackTrace();
        }

        // add up how many times the keywords were encountered in total
        int occurrences = 0;
        for (Integer i : theKeywordCount.values()) {
            occurrences += i;
        }

        System.out.println("\n\nTotal occurrences in file: " + occurrences);
    }
}

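Each time I come across a word in the file, I first check whether it is in the Map. If it is not, it is not a valid keyword; if it is, I update the value associated with that keyword, i.e. I increment the associated Integer by 1, because we have seen this keyword once more.

Alternatively, you could get rid of the last for loop and keep a running count, so you would have...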

if (theKeywordCount.containsKey(sLine)) {
    occurrences++;
}

...and you print out the counter at the end.
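
Put together, the running-count variant might look something like the sketch below. It is only an illustration under a few assumptions: the source text is passed in as a String instead of being read from a file, the keyword list is shortened, and the names `RunningCountSketch` and `countKeywords` are made up for the example. Since no per-keyword value is stored, a Set is enough here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the running-count variant; all names here are illustrative.
class RunningCountSketch {

    // Shortened keyword list, for illustration only.
    static final Set<String> KEYWORDS = new HashSet<String>(
            Arrays.asList("class", "if", "int", "public", "return", "static", "void"));

    // Walks the source line by line and keeps a single running total of keyword hits.
    static int countKeywords(String source) {
        int occurrences = 0;
        for (String line : source.split("\n")) {
            // split on runs of non-word characters; empty tokens simply fail the lookup
            for (String word : line.split("\\W+")) {
                if (KEYWORDS.contains(word)) {
                    occurrences++;
                }
            }
        }
        return occurrences;
    }

    public static void main(String[] args) {
        String source = "public class A {\n    static int f() { return 1; }\n}\n";
        System.out.println(countKeywords(source)); // prints 5
    }
}
```

Like the code above, this sketch also counts keywords that appear inside comments and string literals.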

I don't know whether this is the most efficient way, but I think it's a good start.

Let me know if you have any questions. I hope this helps.
Hristo

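【Discussion】:

Hristo, before I look through all of your code: yes, the homework is already done. Also, the reason I have more than 50 keywords is that the assignment said we should also include the 3 reserved words false, null, and true. I forgot to mention that. Thanks for your post. I'm going to read through it and see how you did things. I'm always glad to see how someone with more programming experience approaches a task.
1. Got it. I'll add those to my list. 2. I don't have "more programming experience". I'm still a college student :) 3. Good luck! Let me know what you think if you have any questions. In the meantime, I'm going to bed.
br.close belongs in a finally block. Using java.util.Scanner seems far less complicated than br(fr(file)). The comments are noise (FileNotFoundException exception) { // Unable to find file. or // Unable to read line. And instead of if (a == 0) {/*empty*/} else ..., just write if (a != 0) { ... .
Thanks for the suggestions. Regarding Scanner, in my experience it is slower than using FileReader and BufferedReader.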
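
For comparison, a Scanner-based version along the lines suggested in the comments could look like the sketch below. It is an illustration only: the class name `ScannerCountSketch`, the shortened keyword list, and the choice to scan a String rather than a file are assumptions. Whether Scanner is actually slower than BufferedReader for this task is not measured here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

// Hypothetical sketch of a Scanner-based tokenizer; all names here are illustrative.
class ScannerCountSketch {

    // Shortened keyword list, for illustration only.
    static final Set<String> KEYWORDS = new HashSet<String>(
            Arrays.asList("class", "if", "int", "public", "return", "static", "void"));

    // Lets Scanner split the input into word tokens and counts the keyword hits.
    static int countKeywords(String source) {
        int occurrences = 0;
        // Scanner is Closeable, so try-with-resources takes care of closing it
        try (Scanner scanner = new Scanner(source)) {
            // treat every run of non-word characters as a delimiter
            scanner.useDelimiter("\\W+");
            while (scanner.hasNext()) {
                if (KEYWORDS.contains(scanner.next())) {
                    occurrences++;
                }
            }
        }
        return occurrences;
    }

    public static void main(String[] args) {
        System.out.println(countKeywords("int x = 0; if (x == 0) { return; }")); // prints 3
    }
}
```

To read from a file instead, `new Scanner(new File(path))` can be used; that constructor throws FileNotFoundException, which then has to be handled or declared.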

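【Solution 2】:

Try a Map<String, Integer>, where the String is the word and the Integer is the number of times the word has been seen.

One benefit of this is that you don't need to process the file twice.

【Discussion】:

It seems this would make it easier to count specific keywords individually. Given that I don't need to count each keyword separately, do you see any downside to the way I did it?
Start with a map that contains only the keywords, each with a value of 0. Call Map.get to fetch the value; if it returns a non-null value, increment it and store it again. If it is null, there is nothing to do, because the word is not a keyword.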
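
The steps in the last comment can be sketched as follows. The class name `KeywordMapSketch`, the helper `countEach`, and the sample inputs are assumptions made for the example, not code from the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the Map.get approach; all names here are illustrative.
class KeywordMapSketch {

    // Returns a map from each keyword to the number of times it occurs in words.
    static Map<String, Integer> countEach(String[] keywords, String[] words) {
        // start with a map that contains only the keywords, each with a value of 0
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String keyword : keywords) {
            counts.put(keyword, 0);
        }

        for (String word : words) {
            // a single get() doubles as the membership test
            Integer seen = counts.get(word);
            if (seen != null) {
                // keyword: increment and store it again
                counts.put(word, seen + 1);
            }
            // null: not a keyword, nothing to do
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] keywords = { "if", "int" };
        String[] words = { "int", "x", "int", "y" };
        Map<String, Integer> counts = countEach(keywords, words);
        System.out.println(counts.get("int")); // prints 2
        System.out.println(counts.get("if"));  // prints 0
    }
}
```

Compared with calling containsKey followed by get, this does one lookup per word instead of two. On Java 8 and later, `counts.computeIfPresent(word, (k, v) -> v + 1)` expresses the same update in a single call.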