【Question Title】: How to split lines from a BufferedReader into words
【Posted】: 2017-07-31 20:55:55
【Question】:

I need help writing code that splits each line into words, so that it can then do some spell checking on them.

  public static void main(String [] args) throws IOException {
    Stem myStem = new Stem();

    BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(new FileInputStream("C:\\Users\\lamrh\\IdeaProjects\\untitled1\\src\\bigON\\data.txt")));

    //String currentWord = String.valueOf(bufferedReader.readLine());
    Scanner scanner = new Scanner(bufferedReader.readLine());
    //byte[] data = new byte [currentWord.length()];
    String[] splitLines;
    //splitLines = splitLines.split(" ");


    String line;
    while((line = bufferedReader.readLine()) !=null  ){
        //splitLines = line.split(" ");
        String currentWord1 = formatWordGhizou ( line);
        System.out.println(""+ line+""+ ":"+ currentWord1);

    }
    bufferedReader.close();


}

The output I get is:

سْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ:سماللهالرحمنالرحيم

سْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ:سماللهالرحمنالرحيم ِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ:سماللهالرحمنالرحيم ِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ:سماللهالرحمنالرحيم ِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ:سماللهالرحمنالرحيم ِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ:سماللهالرحمنالرحيم ِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ:سماللهالرحمنالرحيم ِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ:سماللهالرحمنالرحيمِ

The output should be processed word by word, not a whole line at a time. Any help is appreciated, thanks.

【Comments】:

Tags: java split bufferedreader


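【Answer 1】:

In your while loop, concatenate each line onto a single string, lines. After the loop, split lines with a regular expression to fill the string array splitLines, then iterate over splitLines and print each element to standard output, as follows (adapted from a helpful tutorial):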

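StringBuilder lines = new StringBuilder();

while ((line = bufferedReader.readLine()) != null) {
    // append a space so the last word of one line does not merge
    // with the first word of the next line
    lines.append(line).append(' ');
}

// split on runs of whitespace; trim() avoids an empty first element
String[] splitLines = lines.toString().trim().split("\\s+");

for (String word : splitLines) {
    System.out.println(word);
}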
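
As a variation on the approach above, the split can also be done line by line inside the loop, which avoids building one large string for the whole file. The following is only a sketch under some assumptions: the class name WordSplitter and the helpers splitLine and formatWord are hypothetical, formatWord is a placeholder standing in for the asker's formatWordGhizou, and a StringReader replaces the data file so the example runs anywhere.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class WordSplitter {

    // Splits one line into its whitespace-separated words.
    // Returns an empty list for a blank line.
    static List<String> splitLine(String line) {
        List<String> words = new ArrayList<>();
        // trim() prevents an empty first element when the line starts with whitespace
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return words;
        }
        for (String word : trimmed.split("\\s+")) {
            words.add(word);
        }
        return words;
    }

    // Hypothetical placeholder for the asker's formatWordGhizou:
    // here it only upper-cases the word for illustration.
    static String formatWord(String word) {
        return word.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a StringReader stands in for the real data file.
        String sample = "one two\n  three   four \n\nfive";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // process each word of the line separately
                for (String word : splitLine(line)) {
                    System.out.println(word + ":" + formatWord(word));
                }
            }
        }
    }
}
```

In the asker's program, the call to formatWord would be replaced by formatWordGhizou, so that each word, rather than each line, is formatted and stemmed.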

【Comments】:

【Answer 2】:
    // format the word by removing any punctuation, diacritics and non-letter characters
    private static String formatWordGhizou ( String currentWord )
    {
        StringBuffer modifiedWord = new StringBuffer ( );
    
    
        // remove any diacritics (short vowels)
        if ( removeDiacritics( currentWord, modifiedWord ) )
        {
            currentWord = modifiedWord.toString ( );
        }
    
        // remove any punctuation from the word
        if ( removePunctuation( currentWord, modifiedWord ) )
        {
            currentWord = modifiedWord.toString ( ) ;
        }
    
        // there could also be characters that aren't letters which should be removed
        if ( removeNonLetter ( currentWord, modifiedWord ) )
        {
            currentWord = modifiedWord.toString ( );
        }
    
        // check for strange words
        if( !checkStrangeWords ( currentWord ) )
            // check for stopwords
            if( !checkStopwords ( currentWord ) )
                currentWord = stemWord ( currentWord );
    
        return currentWord;
    }
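
The helper methods called above (removeDiacritics, removePunctuation, removeNonLetter and the rest) are not shown in this answer, so their behavior can only be guessed. As a rough illustration of what the diacritic-removal step might look like, here is a minimal sketch that deletes Unicode nonspacing marks with a regular expression. The class name DiacriticStripper and the method stripDiacritics are hypothetical and are not the asker's implementation.

```java
import java.util.regex.Pattern;

final class DiacriticStripper {

    // \p{Mn} matches Unicode nonspacing marks, a category that includes
    // the Arabic short-vowel signs such as fatha, kasra and shadda.
    private static final Pattern NONSPACING_MARKS = Pattern.compile("\\p{Mn}+");

    // Returns the word with every nonspacing mark removed.
    static String stripDiacritics(String word) {
        return NONSPACING_MARKS.matcher(word).replaceAll("");
    }

    public static void main(String[] args) {
        // A vocalised Arabic word written with escapes:
        // alef, lam, lam, shadda, fatha, heh, kasra
        String vocalised = "\u0627\u0644\u0644\u0651\u064E\u0647\u0650";
        System.out.println(stripDiacritics(vocalised));
    }
}
```

Note that \p{Mn} is broader than the Arabic vowel signs alone: it also strips combining marks in other scripts, which may or may not be what a stemmer wants.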
    

【Comments】:
