(java) - Storing each word from an input file in an array of Strings: answers

【Question title】: (java) - Storing each word from an input file in an array of Strings
【Posted】: 2015-02-21 08:18:10
【Question】:

I'm having trouble writing a method to do this. I have the basic outline of the method, but I need some pointers/help to finish it.

  public static String [] readFileAndReturnWords(String filename){
     //create array
     //read one word at a time from file and store in array
     //return the array
  }

Here is what I have so far:

public static String[] readFileAndReturnWords(String filename) {
    String[] temp; // an array needs its size up front, but the number of words isn't known yet

    // connects file
    File file = new File(filename);
    Scanner inputFile = null;

    try {
        inputFile = new Scanner(file);
    }
    // when the argument is mistyped
    catch (FileNotFoundException e) {
        System.out.println("File not found!");
        System.exit(0);
    }

    // loops through the file
    if (inputFile != null) {

        try { // I draw a blank here

I know some .next and .hasNext calls are in order; I'm just not sure how to use these particular methods in the context of this problem.

【Question discussion】:

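This is where documentation becomes useful: you don't know how to use them, so you read their documentation, and then you know more: docs.oracle.com/javase/7/docs/api/java/util/Scanner.html
@JBNizet Yes, but in the context of this problem I have a hard time understanding the syntax of these particular methods and similar things. Reading the Oracle docs gives me some background, but it doesn't necessarily help me understand the syntax or how to actually apply it to a problem. Maybe my inability to apply what I read in the docs comes from my lack of programming experience.
The cool thing about programming is that you can try things and make mistakes. hasNext() returns true while there are still tokens. next() consumes the next token and returns it. You want to read every token, so you need a loop. The loop should stop when there are no more tokens, and hasNext() returns false when there are none left. That should be enough to at least try something.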
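Following the hints in the comment above, here is a minimal sketch of the hasNext()/next() loop. The class name WordReader and the helper readWords are hypothetical. Because the word count isn't known in advance, the sketch collects tokens in an ArrayList and converts it to a String[] at the end. Unlike the question's code, it lets FileNotFoundException propagate instead of calling System.exit, which is a design choice, not a requirement.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

public class WordReader {

    // Reads every whitespace-separated token from the Scanner into a String[].
    public static String[] readWords(Scanner input) {
        // An ArrayList grows as needed, so the number of words need not be known up front.
        List<String> words = new ArrayList<>();
        while (input.hasNext()) {      // true while at least one more token remains
            words.add(input.next());   // consume the next token and store it
        }
        return words.toArray(new String[0]);
    }

    public static String[] readFileAndReturnWords(String filename) throws FileNotFoundException {
        // try-with-resources closes the Scanner (and the underlying file) when done
        try (Scanner inputFile = new Scanner(new File(filename))) {
            return readWords(inputFile);
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        // With a file name argument, read that file; otherwise scan a sample string.
        String[] words = args.length > 0
                ? readFileAndReturnWords(args[0])
                : readWords(new Scanner("Quick, lazy dog."));
        System.out.println(Arrays.toString(words));
    }
}
```

Scanner splits on whitespace by default, so punctuation stays attached to the tokens ("Quick," rather than "Quick"); the answers below discuss that.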

Tags: java arrays

【Solution 1】:

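Splitting into individual words is actually a bit more complicated than it first looks: what are you splitting on?

If you split on whitespace, then full stops, commas and other punctuation end up attached to a word, so

Quick, lazy dog.

will be split into:

1. Quick,
2. lazy
3. dog.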

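That may or may not be what you want. If you split on non-word characters, then you end up splitting on apostrophes, hyphens, etc., so:

can't, won't ->
1. can
2. t
3. won
4. t

no-one suspects hyper-space ->
1. no
2. one
3. suspects
4. hyper
5. space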
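The two splitting strategies above can be compared with String.split. This is a small sketch; the class and method names are illustrative only. "\\s+" splits on runs of whitespace, and "\\W+" splits on runs of anything outside [a-zA-Z0-9_].

```java
import java.util.Arrays;

public class SplitComparison {

    // Split on runs of whitespace: punctuation stays attached to the neighbouring word.
    public static String[] splitOnWhitespace(String line) {
        return line.trim().split("\\s+");
    }

    // Split on runs of non-word characters: apostrophes and hyphens become separators too.
    // Note: a line that starts with punctuation yields a leading empty string.
    public static String[] splitOnNonWord(String line) {
        return line.split("\\W+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnWhitespace("Quick, lazy dog.")));
        // prints [Quick,, lazy, dog.]
        System.out.println(Arrays.toString(splitOnNonWord("can't, won't")));
        // prints [can, t, won, t]
        System.out.println(Arrays.toString(splitOnNonWord("no-one suspects hyper-space")));
        // prints [no, one, suspects, hyper, space]
    }
}
```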

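So each of these approaches has its own problems. I suggest using a word boundary regex matcher. It is a bit more complex and still has problems, so try the different approaches and see which one produces the output you need.

My proposed solution uses Java 8: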
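To see what a word boundary split actually produces, here is a small sketch using String.split("\\b"). It assumes Java 8 or later, where a zero-width match at the start of the input does not produce a leading empty string. The class name is illustrative. \b is a zero-width position between a word character and a non-word character, so the text between words (commas, spaces, full stops, apostrophes) comes back as separate pieces.

```java
import java.util.Arrays;

public class WordBoundarySplit {

    // Split at every word boundary (\b). Both the words and the runs of
    // characters between them are returned as separate pieces.
    public static String[] splitOnWordBoundary(String line) {
        return line.split("\\b");
    }

    public static void main(String[] args) {
        // pieces: "Quick", ", ", "lazy", " ", "dog", "."
        System.out.println(Arrays.toString(splitOnWordBoundary("Quick, lazy dog.")));
        // pieces: "can", "'", "t" (the apostrophe still splits the word)
        System.out.println(Arrays.toString(splitOnWordBoundary("can't")));
    }
}
```

This is one of the remaining problems: the separator pieces have to be filtered out if only words are wanted.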

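import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public static String[] readFileAndReturnWords(String filename) throws IOException {
    final Path path = Paths.get(filename);
    // split on every word boundary (\b); the separators between words become pieces too
    final Pattern pattern = Pattern.compile("\\b");

    try (final Stream<String> lines = Files.lines(path)) {
        return lines.flatMap(pattern::splitAsStream).toArray(String[]::new);
    }
}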

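So first you convert the String into a Path, which is the Java NIO representation of a file location. Then you create your Pattern, which determines how the lines are broken into words.

Then you simply stream all the lines in the file with Files.lines and use Pattern.splitAsStream to turn each line into a stream of words. We use flatMap because the stream needs to be "flattened": Files.lines already gives a Stream<String> with one element per line, and splitting each line gives another Stream<String>, so a plain map would produce a Stream<Stream<String>>. flatMap applies the function to each element and concatenates the resulting streams, so instead of a Stream<Stream<T>> you get a single Stream<T>.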
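The same pipeline can be tried without a file by feeding in-memory lines through Stream.of, which stands in for Files.lines here. This is a sketch, not the answer's exact code: the filter step is an addition, assuming a "word" is any piece made only of word characters (\w+), and it drops the separator pieces that the \b split produces.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class FlatMapWords {

    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\b");
    // Assumption: a "word" is a piece consisting only of [a-zA-Z0-9_] characters.
    private static final Pattern WORD = Pattern.compile("\\w+");

    public static String[] words(Stream<String> lines) {
        return lines
                // each line becomes a Stream<String> of pieces; flatMap merges them into one stream
                .flatMap(WORD_BOUNDARY::splitAsStream)
                // keep only word pieces, dropping ", ", " ", "." and similar separators
                .filter(piece -> WORD.matcher(piece).matches())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] result = words(Stream.of("Quick, lazy dog.", "No-one suspects"));
        System.out.println(Arrays.toString(result));
        // prints [Quick, lazy, dog, No, one, suspects]
    }
}
```

With map instead of flatMap, the stream's element type would be Stream<String>, and toArray(String[]::new) would fail at runtime with an ArrayStoreException.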

【Discussion】:

【Solution 2】:

Store the words in an ArrayList, because you don't know how many words the file contains.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

public class Test
{
  static ArrayList<String> words;

  public static void main(String[] args) throws FileNotFoundException
  {
    words = new ArrayList<String>();
    // try-with-resources closes the Scanner (and the file) when reading is done
    try (Scanner s = new Scanner(new File("Blah.txt")))
    {
      while (s.hasNext())
      {
        String token = s.next();
        if (isAWord(token))
        {
          // the scanner splits on whitespace, so in "here we are," the token is "are,":
          // strip periods and commas, and remove other characters such as braces
          // and parentheses the same way if needed
          token = token.replace(".", "").replace(",", "");
          words.add(token);
        }
      }
    }
  }

  private static boolean isAWord(String token)
  {
    // placeholder (assumption): treat a token as a word if it contains at least one letter
    return token.matches(".*\\p{L}.*");
  }
}

It should work.

If you really want to use an array, you can convert your ArrayList to a plain array with

String[] wordArray = words.toArray(new String[0]);

【Discussion】:

toArray() won't compile. You need to pass the type: String[] wordArray = words.toArray(new String[0]);
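The reason for the comment's fix, as a small sketch (the class and method names are illustrative): the no-argument List.toArray() returns Object[], which cannot be assigned to a String[] variable. The overload that takes an array uses that array's runtime type, and a zero-length array is enough because toArray allocates a new array of the right size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ToArrayDemo {

    public static String[] toStringArray(List<String> words) {
        // words.toArray() would return Object[], which can't be assigned to String[].
        // Passing a String[] tells toArray which array type to return; since this one
        // is too small, toArray allocates a new String[] of the list's size.
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        words.add("quick");
        words.add("lazy");
        words.add("dog");
        String[] wordArray = toStringArray(words);
        System.out.println(wordArray.length + " " + Arrays.toString(wordArray));
        // prints 3 [quick, lazy, dog]
    }
}
```

On Java 11 or later, words.toArray(String[]::new) is an equivalent alternative.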