Reading from a text file into a String array (answers)

【Question title】: Reading from text file to string array
【Posted】: 2013-01-09 00:08:32
【Question】:

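So I can search for a string in my text file; however, I want to sort the data in this ArrayList and implement an algorithm. Is it possible to read from a text file and store the values [Strings] from the file in a String[] array?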

Also, is it possible to split the Strings up? So instead of my array holding:

[Alice was beginning to get very tired of sitting by her sister on the, bank, and of having nothing to do:]

is it possible to have the array as:

["Alice", "was", "beginning", "to", "get", ...]

    public static void main(String[]args) throws IOException
    {
        Scanner scan = new Scanner(System.in);
        String stringSearch = scan.nextLine();

        BufferedReader reader = new BufferedReader(new FileReader("File1.txt"));
        List<String> words = new ArrayList<String>();

        String line;
        while ((line = reader.readLine()) != null) {                
            words.add(line);
        }

        for(String sLine : words) 
        {
            if (sLine.contains(stringSearch)) 
            {
                int index = words.indexOf(sLine);
                System.out.println("Got a match at line " + index);

            }
        }

        //Collections.sort(words);
        //for (String str: words)
        //      System.out.println(str);

        int size = words.size();
        System.out.println("There are " + size + " Lines of text in this text file.");
        reader.close();

        System.out.println(words);

    }

【Question comments】:

Tags: java arrays algorithm sorting

【Solution 1】:

To split a line into an array of words, use:

String[] words = sentence.split("[^\\w']+");

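The regex [^\w'] means "not a word character or an apostrophe".

This keeps words with embedded apostrophes, such as "can't", intact and skips over all other punctuation.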
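As a rough illustration of the split above, here is a minimal sketch applied to a sentence modeled on the one in the question. The class name SplitWords, the helper toWords and the sample sentence are made up for this example and are not part of the original answer.

```java
import java.util.Arrays;

public class SplitWords {
    // Splits on runs of characters that are neither word characters nor apostrophes.
    static String[] toWords(String sentence) {
        return sentence.split("[^\\w']+");
    }

    public static void main(String[] args) {
        String sentence = "Alice was beginning to get very tired, and can't sit still:";
        // Prints: [Alice, was, beginning, to, get, very, tired, and, can't, sit, still]
        System.out.println(Arrays.toString(toWords(sentence)));
    }
}
```

Note that split() drops trailing empty strings, so the final colon leaves no empty element. If the input starts with a delimiter, however, the first element will be an empty string.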

Edit:

A comment raised the edge case of parsing a quoted word such as 'this' as this.
Here is a solution for that. You must first remove the wrapping quotes:

String[] words = input.replaceAll("(^|[^\\w'])'([\\w']+)'([^\\w']|$)", "$1$2$3").split("[^\\w']+");

Here is some test code with edge and corner cases:

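import java.util.Arrays;

public class SplitTest {
    public static void main(String[] args) throws Exception {
        String input = "'I', ie \"me\", can't extract 'can't' or 'can't'";
        // Strip wrapping single quotes first, then split on runs of non-word, non-apostrophe characters
        String[] words = input.replaceAll("(^|[^\\w'])'([\\w']+)'([^\\w']|$)", "$1$2$3").split("[^\\w']+");
        System.out.println(Arrays.toString(words));
    }
}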

Output:

[I, ie, me, can't, extract, can't, or, can't]

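【Comments】:

What if I have to get a single-quoted string like 'here'?
@smit That edge case can't be handled with split() alone, because split specifies what lies between the words, whereas this requires checking what surrounds the words. You must remove those apostrophes first. See the edited answer.
I asked you this out of curiosity; I never meant it as an attack. I really wanted to know whether it could be done using regex alone. My apologies.
@Bohemian Well aware that Winter Bash is over, but in the quest of proving a high performing answer, is this useful or not? :)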

【Solution 2】:

Is it also possible to split the Strings up? Yes, you can split a string like this:

 String[] strSplit;
 String str = "This is test for split";
 strSplit = str.split("[\\s,;!?\"]+");

See String API

In addition, you can read the text file word by word (one whitespace-delimited token at a time).

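 import java.io.BufferedReader;
 import java.io.FileNotFoundException;
 import java.io.FileReader;
 import java.util.Scanner;

 // try-with-resources closes the Scanner; if the file is missing, the loop is never reached
 try (Scanner scan = new Scanner(new BufferedReader(new FileReader("Your File Path")))) {
     while (scan.hasNext()) {
         // next() returns one whitespace-delimited token
         System.out.println(scan.next());
     }
 } catch (FileNotFoundException e) {
     e.printStackTrace();
 }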

See Scanner API
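To tie this back to the question (getting the words into a String[] that can then be sorted), the following is a minimal sketch that collects Scanner tokens into a list, converts it to an array and sorts it. The class name WordsToArray and the method readWords are hypothetical, and a StringReader with made-up text stands in for the file so the example runs on its own. The delimiter pattern is borrowed from Solution 1 rather than from this answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

public class WordsToArray {
    // Collects every token from the source, treating runs of characters that are
    // neither word characters nor apostrophes as the delimiter.
    static String[] readWords(Readable source) {
        List<String> words = new ArrayList<>();
        try (Scanner scan = new Scanner(source)) {
            scan.useDelimiter("[^\\w']+");
            while (scan.hasNext()) {
                words.add(scan.next());
            }
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Assumption: a StringReader replaces new FileReader("File1.txt") for this demo.
        String[] words = readWords(new StringReader("Alice was beginning\nto get very tired,"));
        Arrays.sort(words);
        // Prints: [Alice, beginning, get, tired, to, very, was]
        System.out.println(Arrays.toString(words));
    }
}
```

Arrays.sort uses natural String ordering, so uppercase letters sort before lowercase ones. Pass String.CASE_INSENSITIVE_ORDER as a second argument if that is not wanted.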

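【Comments】:

What about commas and the like between words? Your split doesn't cater for punctuation, which the example sentence contains (so it's not even theoretical).
You can list all the necessary stop characters in the split like this: str.split("[\\s,.!\\?]+")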