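[Posted]: 2017-11-27 19:17:51
[Question]:
I am implementing a class that should take in a large text file. I want to split the file into chunks, with each chunk handled by a different thread that counts the frequency of every character in that chunk. I expected that starting more threads would give better performance, but performance actually gets worse as I add threads. Here is my code: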
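import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
// option parsing comes from Apache Commons CLI
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.CommandLineParser;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

public class Main
{
    public static void main(String[] args)
        throws IOException, InterruptedException, ExecutionException, ParseException
    {
        // save the current run's start time
        long startTime = System.currentTimeMillis();
        // create options
        Options options = new Options();
        options.addOption("t", true, "number of threads to start");
        // parse options; fall back to a single thread when -t is not given
        CommandLineParser parser = new DefaultParser();
        CommandLine cmd = parser.parse(options, args);
        int numberOfThreads = Integer.parseInt(cmd.getOptionValue("t", "1"));
        // map the file; its path is the first non-option argument
        RandomAccessFile raf = new RandomAccessFile(cmd.getArgs()[0], "r");
        MappedByteBuffer mbb
            = raf.getChannel().map(FileChannel.MapMode.READ_ONLY, 0, raf.length());
        ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
        Set<Future<int[]>> set = new HashSet<Future<int[]>>();
        // round up so the file yields at most numberOfThreads chunks;
        // at least 1 byte, otherwise the loop below would never advance
        long chunkSize = Math.max(1, (raf.length() + numberOfThreads - 1) / numberOfThreads);
        byte[] buffer = new byte[(int) chunkSize];
        while(mbb.hasRemaining())
        {
            int remaining = Math.min(buffer.length, mbb.remaining());
            mbb.get(buffer, 0, remaining);
            // decode only the bytes just read; the last chunk may be shorter than the buffer
            String content = new String(buffer, 0, remaining, StandardCharsets.ISO_8859_1);
            Future<int[]> future = pool.submit(new FrequenciesCounter(content));
            set.add(future);
        }
        raf.close();
        // no more tasks will be submitted; queued tasks still run to completion
        pool.shutdown();
        // let's assume we will use extended ASCII characters only
        int alphabet = 256;
        // hold how many times each character is contained in the input file
        int[] frequencies = new int[alphabet];
        // sum the frequencies from each thread
        for(Future<int[]> future: set)
        {
            // wait for this task once, then add its counts
            int[] partial = future.get();
            for(int i = 0; i < alphabet; i++)
            {
                frequencies[i] += partial[i];
            }
        }
        System.out.println("Elapsed: " + (System.currentTimeMillis() - startTime) + " ms");
    }
}

// helper class for multithreaded frequency counting
class FrequenciesCounter implements Callable<int[]>
{
    private final int[] frequencies = new int[256];
    private final char[] content;

    public FrequenciesCounter(String input)
    {
        content = input.toCharArray();
    }

    @Override
    public int[] call()
    {
        System.out.println("Thread " + Thread.currentThread().getName() + " start");
        for(int i = 0; i < content.length; i++)
        {
            // ISO-8859-1 maps every byte to a char in 0..255, so this index is always in range
            frequencies[content[i]]++;
        }
        System.out.println("Thread " + Thread.currentThread().getName() + " finished");
        return frequencies;
    }
}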
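For comparison, here is a minimal sketch of the same chunk-per-thread idea in which the main thread does not copy each chunk into a `String` before handing it over. Each worker instead gets its own view of the shared buffer and reads the bytes itself. The class name `SliceCounter` and its `count` method are hypothetical and not part of the code above. Whether this is any faster depends on whether the run is limited by the disk or by the CPU.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration only.
final class SliceCounter {

    // Counts byte values between data.position() and data.limit() using `threads` workers.
    static long[] count(ByteBuffer data, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int start = data.position();
            int total = data.remaining();
            // round up so there are at most `threads` slices and none is empty
            int chunk = (int) Math.max(1, ((long) total + threads - 1) / threads);
            List<Future<long[]>> futures = new ArrayList<>();
            for (long offset = 0; offset < total; offset += chunk) {
                // duplicate() shares the bytes but has its own position and limit
                ByteBuffer view = data.duplicate();
                view.position(start + (int) offset);
                view.limit(start + (int) Math.min(total, offset + chunk));
                final ByteBuffer slice = view.slice();
                Callable<long[]> task = () -> {
                    long[] local = new long[256];
                    while (slice.hasRemaining()) {
                        local[slice.get() & 0xFF]++; // mask to get an unsigned index
                    }
                    return local;
                };
                futures.add(pool.submit(task));
            }
            // merge the per-slice counts
            long[] result = new long[256];
            for (Future<long[]> f : futures) {
                long[] partial = f.get();
                for (int i = 0; i < 256; i++) {
                    result[i] += partial[i];
                }
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With the code above, the `MappedByteBuffer` returned by `map(...)` could be passed in directly, for example `SliceCounter.count(mbb, numberOfThreads)`. The sketch uses `long` counters, which is a choice made here rather than something taken from the original.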
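[Comments]:
-
Your hardware can only transfer so many bytes per second from the disk. It doesn't matter how much you ask it to read.
-
Disks are not multithreaded. Your expectation is wrong.
-
So if I save each chunk as a separate file and then pass each file to a thread, will it get better?
-
@barni No. You would still have only one disk. It might even make things worse.
-
So, a bit of theory: if your code has no bottleneck of its own other than the overhead of waiting on something like the network or the hard disk, it is called I/O-bound. At that point, the only way to make it run faster is to improve the hardware attached to the machine itself, or to start scaling horizontally to take advantage of more independent machines.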
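One rough way to check whether a run like this is I/O-bound is to time the read and the counting separately on a single thread. The following is only a sketch: the class name `IoBoundCheck` and its `measure` method are hypothetical, and `Files.readAllBytes` limits it to files that fit in one array. The operating system's file cache can also make a second run read much faster than the first, so the numbers are indicative at best.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only.
final class IoBoundCheck {

    // Returns {nanoseconds spent reading, nanoseconds spent counting, bytes counted}.
    static long[] measure(Path file) {
        try {
            long t0 = System.nanoTime();
            byte[] data = Files.readAllBytes(file); // disk (or file cache) to heap
            long t1 = System.nanoTime();
            long[] freq = new long[256];
            for (byte b : data) {
                freq[b & 0xFF]++; // the CPU work that the threads would share
            }
            long t2 = System.nanoTime();
            long total = 0;
            for (long v : freq) {
                total += v;
            }
            return new long[] { t1 - t0, t2 - t1, total };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the first number is much larger than the second, most of the time goes into waiting for the read, and extra threads have little counting work to split between them.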
Tags: java multithreading future callable mappedbytebuffer