How to count files and count words in a set of files using Java 8 and Streams

【Question title】: How to count files and count words in a set of files using Java 8 and Streams
【Posted】: 2017-12-11 12:43:55
【Question description】:

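I'm having a hard time. I'm trying to complete an assignment that walks a directory of files and counts the files, then reads each file and counts the words in it. This is a follow-up to a question I posted earlier, but the "answers" there did not help solve my problem (How to count words in a text file, java 8-style).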

Here is the problem outline:

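Write a program that uses streams to efficiently count the words of different lengths that appear in a set of files (files.zip). Your output should look like this (the counts are for illustration only):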

Count 11 files:
word length: 1 ==> 80
word length: 2 ==> 321
word length: 3 ==> 643

Instead, I got the following output:

primes.txt
Count: 1 files

Here is the code I wrote. I used two classes. FileReader is the main class; it reads a directory named "files":

FileReader.java

    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;

    /**
     *
     * @author 
     */
    public class FileReader {

        public static void main(String args[]) {
            List<String> fileNames = new ArrayList<>();
            try {
                DirectoryStream<Path> directoryStream = Files.newDirectoryStream(Paths.get("files"));
                int fileCounter = 0;
                WordReader wordCnt = new WordReader(); // highlighted line
                for (Path path : directoryStream) {
                    System.out.println(path.getFileName());
                    fileCounter++;
                    fileNames.add(path.getFileName().toString());
                    System.out.println("word length: " + fileCounter + " ==> "
                            + wordCnt.count(path.getFileName().toString())); // highlighted line
                }
            } catch (IOException ex) {
            }
            System.out.println("Count: " + fileNames.size() + " files");

        }
    }

And here is the WordReader class, which in theory should count the words in each file in the directory. It is written with lambda syntax:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.AbstractMap.SimpleEntry;
    import java.util.Arrays;
    import java.util.Map;
    import static java.util.stream.Collectors.counting;
    import static java.util.stream.Collectors.groupingBy;

        /**
         *
         * @author 
         */
        public class WordReader {

            /**
             *
             * @param filename
             * @return
             * @throws java.io.IOException
             */
            public Map<String, Long> count(String filename) throws IOException {
                //Stream<String> lines = Files.lines(Paths.get(filename));
                Path path = Paths.get(":");
                Map<String, Long> wordMap = Files.lines(path)
                        .parallel()
                        .flatMap(line -> Arrays.stream(line.trim().split(" ")))
                        .map(word -> word.replaceAll("[^a-zA-Z]", "").toLowerCase().trim())
                        .filter(word -> word.length() > 0)
                        .map(word -> new SimpleEntry<>(word, 1))
                        //.collect(Collectors.toMap(s -> s, s -> 1, Integer::sum));
                        .collect(groupingBy(SimpleEntry::getKey, counting()));

                wordMap.forEach((k, v) -> System.out.println(String.format(k,v)));
                return wordMap;
            }
        }

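I believe the calls to the WordReader class (the highlighted lines) are what stops the counter, but I don't know how to fix it. I have tried moving the class call out of the for loop, without success. If I comment those lines out, the file counter works fine. Does anyone know how I can make this program "walk (count the files) and chew gum (count the words in the files)" at the same time?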

【Comments】:

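You are probably getting an exception because the file ":" does not exist; check Path path = Paths.get(":");.
I did, but the same thing happens even when I enter an actual file, e.g. haiku.txt.
Print the exception you catch in main, using ex.printStackTrace();
If I type the path of the directory, I get the error: Exception in thread "main" java.io.UncheckedIOException: java.io.IOException: Is a directory
Why don't you use Path path = Paths.get(filename);?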
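
The "Is a directory" error mentioned above appears when Files.lines is handed a directory instead of a file. A minimal sketch of one way to avoid it, assuming the files sit directly inside the directory: list the entries and keep only regular files before reading any of them. The class name RegularFileLister and the helper regularFiles are hypothetical, introduced only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RegularFileLister {

    // Returns only the regular files directly inside dir, skipping subdirectories,
    // so that each returned path can be handed to Files.lines.
    static List<Path> regularFiles(Path dir) throws IOException {
        // Files.list holds an open directory handle, so close it when done.
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // "files" is the directory name used in the question; adjust as needed.
        Path dir = Paths.get(args.length > 0 ? args[0] : "files");
        List<Path> files = regularFiles(dir);
        System.out.println("Count " + files.size() + " files:");
        files.forEach(System.out::println);
    }
}
```

Because the returned paths still include the directory part, they can be passed on unchanged to whatever reads the file contents.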

Tags: java file lambda java-8

【Solution 1】:

Here are some of the mistakes you made:

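You pass only the file name to count(); because the files live inside a directory, it is better to pass the whole path.
You use a path of ":", which is not even a valid file name!
You do not log the thrown exception, so you hide the real problem.
You use lambdas while you are still struggling with much of the Java language.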
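
The first mistake can be illustrated with a small sketch. It assumes the directory "files" and the file "haiku.txt" named in the question; the class PathVsFileName and its helper fullPath are hypothetical names used only for this demonstration. It shows that getFileName() drops the directory part, so a reader given only that name looks in the working directory rather than inside "files".

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathVsFileName {

    // Rebuilds the full path from a directory and a bare file name.
    static Path fullPath(String dir, String fileName) {
        return Paths.get(dir).resolve(fileName);
    }

    public static void main(String[] args) {
        // "files" and "haiku.txt" are the names mentioned in the question.
        Path entry = fullPath("files", "haiku.txt");

        // getFileName() keeps only the last path element, so this path is
        // resolved against the working directory, not against "files".
        Path nameOnly = entry.getFileName();

        System.out.println("full path: " + entry);
        System.out.println("file name: " + nameOnly);
        System.out.println("same path? " + entry.equals(nameOnly));
    }
}
```

Passing the Path object from the directory iteration straight to count() avoids the problem, since nothing is stripped from it.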

This should work:

Main class:

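import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        List<String> fileNames = new ArrayList<>();
        // try-with-resources closes the directory stream when the loop ends.
        try (DirectoryStream<Path> directoryStream = Files.newDirectoryStream(Paths.get("files"))) {
            int fileCounter = 0;
            WordReader wordCnt = new WordReader();
            for (Path path : directoryStream) {
                System.out.println(path.getFileName());
                fileCounter++;
                fileNames.add(path.getFileName().toString());
                // Pass the full path (directory plus file name), not just the file name.
                System.out.println("word length: " + fileCounter + " ==> " + wordCnt.count(path));
            }
        } catch (IOException ex) {
            // Print the exception instead of silently swallowing it.
            ex.printStackTrace();
        }
        System.out.println("Count: " + fileNames.size() + " files");
    }
}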

WordReader class:

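import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordReader {
    public Map<String, Long> count(Path filePath) throws IOException {
        // Collectors.counting() yields Long values; the try block closes the file.
        try (Stream<String> lines = Files.lines(filePath)) {
            Map<String, Long> wordMap = lines
                    .flatMap(line -> Arrays.stream(line.trim().split(" ")))
                    .map(word -> word.replaceAll("[^a-zA-Z]", "").toLowerCase().trim())
                    .filter(word -> word.length() > 0)
                    .collect(Collectors.groupingBy(s -> s, Collectors.counting()));

            // Print each word with its count; the word is data, not a format string.
            wordMap.forEach((k, v) -> System.out.println(String.format("%s ==> %d", k, v)));
            return wordMap;
        }
    }
}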
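
Note that the code above counts occurrences of each word, while the assignment asks for counts per word length across all files. What follows is only a rough sketch of that variant, not a definitive solution: the class WordLengthCounter and the helpers countByLength and linesOf are hypothetical names, a "word" is assumed to be a whitespace-separated token with non-letters stripped, and the files are assumed to sit directly inside a directory named "files".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordLengthCounter {

    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Groups the words found in the given lines by length and counts each group.
    // Assumption: a word is a whitespace-separated token with non-letters removed.
    static Map<Integer, Long> countByLength(Stream<String> lines) {
        return lines
                .flatMap(WHITESPACE::splitAsStream)
                .map(token -> token.replaceAll("[^a-zA-Z]", ""))
                .filter(word -> !word.isEmpty())
                // TreeMap keeps the lengths in ascending order for printing.
                .collect(Collectors.groupingBy(String::length, TreeMap::new, Collectors.counting()));
    }

    // Opens one file as a stream of lines; flatMap closes each such stream
    // once its contents have been consumed.
    static Stream<String> linesOf(Path file) {
        try {
            return Files.lines(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        List<Path> files;
        // Keep only regular files so that no directory is passed to Files.lines.
        try (Stream<Path> entries = Files.list(Paths.get("files"))) {
            files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        System.out.println("Count " + files.size() + " files:");
        countByLength(files.stream().flatMap(WordLengthCounter::linesOf))
                .forEach((length, count) ->
                        System.out.println("word length: " + length + " ==> " + count));
    }
}
```

Keeping countByLength independent of the file system means it can be tried on a few in-memory lines via Stream.of(...) before pointing it at real files.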

【Comments】:

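Adding parallel here for file processing will probably only make things worse.
@Eugene parallel has been removed; I only modified the parts of the code that were causing the problem.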