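【Posted】: 2015-03-20 20:57:59
【Question】:
I have been working on an assignment in which I have to read words from a file, find the longest word, and check how many sub-words that longest word contains. This should work for every word in the file.
I tried this in Java. The code I wrote works for a small amount of data in the file, but my task is to process a large amount of data.
Example: words in the file: "call","me","later","hey","how","callmelater","now","iam","busy","noway","nowiambusy"
Output: callmelater : subwords->call,me,later
Here I read the words from the file into a linked list, then find the longest word, remove it from the list, and check how many sub-words the extracted word contains.
Main class Assignment: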
import java.util.Scanner;

public class Assignment {
    public static void main(String[] args) {
        // Note: the timer starts before the prompt, so it also covers
        // the time spent waiting for the user to type the file name.
        long start = System.currentTimeMillis();
        Assignment a = new Assignment();
        a.throwInstructions();
        Scanner userInput = new Scanner(System.in);
        String filename = userInput.nextLine();
        // String filename = "ab.txt";
        // String filename = "abc.txt";
        // The Logic constructor reads the file, processes it and prints the result.
        Logic testRun = new Logic(filename);
        // testRun.result();
        long end = System.currentTimeMillis();
        System.out.println("Time taken:" + (end - start) + " ms");
    }

    public void throwInstructions() {
        System.out.println("Keep input file in same directory, where the code is");
        System.out.println("Please specify the file name : ");
    }
}
Helper class Logic, used for the processing:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class Logic {
    private String filename;
    private File file;
    private List<String> words = new LinkedList<String>();
    private Map<String, String> matchedWords = new HashMap<String, String>();

    @Override
    public String toString() {
        return "Logic [words=" + words + "]";
    }

    // constructor: loads the file, processes every word, prints the result
    public Logic(String filename) {
        this.filename = filename;
        file = new File(this.filename);
        fetchFile();
        run();
        result();
    }

    // repeatedly take the longest remaining word and store its sub-words in the map
    public void run() {
        while (!words.isEmpty()) {
            String longestWord = extractLongestWord(words);
            findMatch(longestWord);
        }
    }

    // find the longest word and remove it from the list
    private String extractLongestWord(List<String> words) {
        String longWord = words.get(0);
        int maxLength = longWord.length();
        for (int i = 0; i < words.size(); i++) {
            if (maxLength < words.get(i).length()) {
                maxLength = words.get(i).length();
                longWord = words.get(i);
            }
        }
        words.remove(longWord);
        return longWord;
    }

    // collect every remaining word that occurs as a substring of longestWord
    private void findMatch(String longestWord) {
        boolean chunkFound = false;
        int chunkCount = 0;
        StringBuilder subWords = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            if (longestWord.indexOf(words.get(i)) != -1) {
                subWords.append(words.get(i)).append(',');
                chunkFound = true;
                chunkCount++;
            }
        }
        if (chunkFound) {
            // drop the trailing comma before storing the list of sub-words
            matchedWords.put(longestWord,
                    "\t" + subWords.substring(0, subWords.length() - 1)
                            + "\t:Subword Count:" + chunkCount);
        }
    }

    // fetch data from file (one word per line) and store it in the list
    public void fetchFile() {
        String word;
        // try-with-resources closes the reader (and the wrapped FileReader) automatically
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            while ((word = br.readLine()) != null) {
                words.add(word);
            }
        } catch (FileNotFoundException e) {
            // e.printStackTrace();
            System.out.println("ERROR: File -> " + file.toString()
                    + " does not exist. Please check the file name or location and try again.");
        } catch (IOException e) {
            // e.printStackTrace();
            System.out.println("ERROR: Problem reading -> " + file.toString()
                    + " File, Some problem with file format.");
        }
    }

    // display result
    public void result() {
        System.out.println("WORD:\tWORD-LENGTH:\tSUBWORDS:\tSUBWORDS-COUNT");
        for (Map.Entry<String, String> entry : matchedWords.entrySet()) {
            System.out.print(entry.getKey() + ": ");
            System.out.print("\t" + entry.getKey().length() + ": ");
            System.out.println(entry.getValue());
        }
    }
}
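This is where my program falls short: on large input it seems to get stuck in a never-ending loop. The complexity of my program is very high. To reduce the processing time I need an efficient approach, something along the lines of binary search or merge sort, that takes minimal time, such as O(log n) or O(n log n).
It would be great if someone could help me solve this, or at least suggest which direction I should go in. Also, please suggest which programming language is well suited to implementing this kind of text-processing task quickly.
Thanks in advance.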
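One possible direction, sketched under assumptions: the loops in the question call `words.get(i)` on a `LinkedList`, which walks the list on every call, and `findMatch` rescans the whole word list for each word. The sketch below instead puts all words in a `HashSet` and looks up every substring of the word being checked, so the work per word depends on the word's length rather than on the number of words in the file. `SubwordFinder` and `subwordsOf` are hypothetical names, not part of the code above, and the sketch keeps the question's meaning of "sub-word" (any other word that occurs inside the word).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the code in the question.
class SubwordFinder {
    // Returns the dictionary words that occur inside `word`, ordered by the
    // position where they first appear. Each substring is looked up in a hash
    // set instead of scanning the whole word list.
    static List<String> subwordsOf(String word, Set<String> dictionary) {
        List<String> found = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (int start = 0; start < word.length(); start++) {
            for (int end = start + 1; end <= word.length(); end++) {
                String piece = word.substring(start, end);
                // Skip the word itself and report each sub-word only once.
                if (!piece.equals(word) && dictionary.contains(piece) && seen.add(piece)) {
                    found.add(piece);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList(
                "call", "me", "later", "hey", "how", "callmelater",
                "now", "iam", "busy", "noway", "nowiambusy"));
        System.out.println(subwordsOf("callmelater", dictionary)); // prints [call, me, later]
    }
}
```

A word of length L has on the order of L² substrings, so this trades the scan over all n words for a cost that depends only on L. Whether that is a win depends on the data: it should help when the file has many words and the individual words are short.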
【Comments】:
-
Google "radix tree". A good data structure is very important.
-
C++ may actually be slower. Of course, he is unlikely to switch anyway, since this is an assignment.
-
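This can be solved with dynamic programming. Sort the words by length and take the first word. Take its first character, then check whether the remaining n-1 letters can be constructed from the word list. Then take one more character from those n-1 letters and check whether the remaining n-2 letters can be formed from the word list.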
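A minimal sketch of this idea, written as a bottom-up table rather than the recursive peeling described in the comment, and assuming the word list has been loaded into a `HashSet`. `WordBreak` and `split` are hypothetical names. Note that this checks a stricter condition than the code in the question: the word must be exactly a concatenation of other words, not merely contain them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating the dynamic-programming idea.
class WordBreak {
    // Returns one way to write `word` as a concatenation of dictionary words
    // (not counting `word` itself), or an empty list if there is none.
    static List<String> split(String word, Set<String> dictionary) {
        int n = word.length();
        // prev[end] = start index of the last piece in a valid split of the
        // prefix word[0..end), or -1 if that prefix cannot be split.
        int[] prev = new int[n + 1];
        Arrays.fill(prev, -1);
        prev[0] = 0; // the empty prefix is trivially splittable
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                if (prev[start] == -1) continue;      // prefix before `start` is not splittable
                if (start == 0 && end == n) continue; // the whole word is not its own sub-word
                if (dictionary.contains(word.substring(start, end))) {
                    prev[end] = start;
                    break;
                }
            }
        }
        List<String> pieces = new ArrayList<>();
        if (n == 0 || prev[n] == -1) return pieces;
        // Walk the table backwards to recover the pieces, then restore their order.
        for (int end = n; end > 0; end = prev[end]) {
            pieces.add(word.substring(prev[end], end));
        }
        Collections.reverse(pieces);
        return pieces;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList(
                "call", "me", "later", "hey", "how", "callmelater",
                "now", "iam", "busy", "noway", "nowiambusy"));
        System.out.println(split("callmelater", dictionary)); // prints [call, me, later]
    }
}
```

With the sample list, "noway" yields an empty result: "now" matches, but the leftover "ay" is not in the list.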
-
@Sandeep Sorting the words is overhead. There is no need to sort; keeping track of the longest word is simpler.
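Tracking the longest word without sorting could look roughly like this. It is a sketch with hypothetical names (`LongestWord`, `longest`); it iterates with a for-each loop, so it makes a single pass even over a `LinkedList`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the code in the question.
class LongestWord {
    // Single pass, no sorting. Returns the first word of maximal length,
    // or null if there are no words.
    static String longest(Iterable<String> words) {
        String best = null;
        for (String w : words) {
            if (best == null || w.length() > best.length()) {
                best = w;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "call", "me", "later", "hey", "how", "callmelater",
                "now", "iam", "busy", "noway", "nowiambusy");
        System.out.println(longest(words)); // prints callmelater
    }
}
```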
-
@KonsolLabapen The OP said he wants to find the longest word.