How to read a file into a string in Java? Answers

【Question title】: How to read a file into a string in Java?
【Posted】: 2010-12-12 00:34:33
【Question】:

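I have read a file into a String. The file contains various names, one name per line. Now the problem is that I want to put those names into a String array.

For that I have written the following code: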

String [] names = fileString.split("\n"); // fileString is the string representation of the file

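But I am not getting the desired result: the array obtained after splitting the string has length 1. That means "fileString" contains no "\n" character, even though the file does.

So how can I solve this problem?

【Comments on the question】:

Why do you want to keep the \n? Can't you just assume it is there?

Tags: java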

【Solution 1】:

How about using Apache Commons (Commons IO and Commons Lang)?

String[] lines = StringUtils.split(FileUtils.readFileToString(new File("...")), '\n');

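【Discussion】:

+1 - trading one line of code for a dependency on Apache Commons IO and Lang.

【Solution 2】:

The problem is not how you split the string; that part is correct.

You have to review how you read the file into the string. You need something like this: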

private String readFileAsString(String filePath) throws IOException {
    StringBuffer fileData = new StringBuffer();
    BufferedReader reader = new BufferedReader(
            new FileReader(filePath));
    char[] buf = new char[1024];
    int numRead = 0;
    while ((numRead = reader.read(buf)) != -1) {
        String readData = String.valueOf(buf, 0, numRead);
        fileData.append(readData);
    }
    reader.close();
    return fileData.toString();
}

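【Discussion】:

Although correct, I have a warning for anyone who sees this: I would not use this exact code snippet, because if an IOException is thrown, the reader is never closed. That can leave FileReaders hanging around that are never garbage collected, which in the *nix world means you will eventually run out of file handles and your JVM will simply crash.
Another problem is that FileReader implicitly picks up the default charset. The intermediate String is also unnecessary.
StringBuilder is probably a better choice than StringBuffer. From the StringBuffer javadoc: "As of release JDK 5, this class has been supplemented with an equivalent class designed for use by a single thread, StringBuilder. The StringBuilder class should generally be used in preference to this one, as it supports all of the same operations but it is faster, as it performs no synchronization."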
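
The three points raised in these comments can be addressed together. The following is a minimal sketch rather than the answerer's own code: it assumes Java 7 or later, and the class name `FileText` and the `Path`/`Charset` parameters are chosen here purely for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class FileText {
    // Reads the whole file into a String, decoding it with the given charset.
    static String readFileAsString(Path path, Charset charset) throws IOException {
        StringBuilder fileData = new StringBuilder();
        // try-with-resources closes the reader even if read() throws
        try (BufferedReader reader = Files.newBufferedReader(path, charset)) {
            char[] buf = new char[1024];
            int numRead;
            while ((numRead = reader.read(buf)) != -1) {
                // append the chars directly, without an intermediate String
                fileData.append(buf, 0, numRead);
            }
        }
        return fileData.toString();
    }
}
```

A call such as `FileText.readFileAsString(Paths.get("names.txt"), StandardCharsets.UTF_8)` (the file name is hypothetical) returns the text with its line terminators intact, so it can then be split into lines.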

【Solution 3】:

As suggested by Garrett Rowe and Stan James, you can use java.util.Scanner:

try (Scanner s = new Scanner(file).useDelimiter("\\Z")) {
  String contents = s.next();
}

or

try (Scanner s = new Scanner(file).useDelimiter("\\n")) {
  while(s.hasNext()) {
    String line = s.next();
  }
}

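This code has no external dependencies.

Warning: you should specify the charset encoding as the second argument of the Scanner constructor. In this example I am using the platform default, which is most certainly wrong.

Here is an example of how to use java.util.Scanner with proper resource and error handling: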

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class TestScanner {
  public static void main(String[] args)
    throws FileNotFoundException {
    File file = new File(args[0]);

    System.out.println(getFileContents(file));

    processFileLines(file, new LineProcessor() {
      @Override
      public void process(int lineNumber, String lineContents) {
        System.out.println(lineNumber + ": " + lineContents);
      }
    });
  }

  static String getFileContents(File file)
    throws FileNotFoundException {
    try (Scanner s = new Scanner(file).useDelimiter("\\Z")) {
      return s.next();
    }
  }

  static void processFileLines(File file, LineProcessor lineProcessor)
    throws FileNotFoundException {
    try (Scanner s = new Scanner(file).useDelimiter("\\n")) {
      for (int lineNumber = 1; s.hasNext(); ++lineNumber) {
        lineProcessor.process(lineNumber, s.next());
      }
    }
  }

  static interface LineProcessor {
    void process(int lineNumber, String lineContents);
  }
}

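【Discussion】:

+1 for the simplest native solution. By the way, don't forget to call scanner.close(); to prevent a resource leak.
@mmdemirbas, OK, I have added a complete example with resource and error handling. Thanks for the warning.
Scanner has a nasty bug when reading an encoding different from the expected one, see: stackoverflow.com/questions/8330695/…
@golimar, the bug is in my own code: I should have specified the charset as the second argument of the Scanner constructor instead of relying on the default charset.
@golimar Agreed: the error reporting in Scanner is wrong. But reading a file with the wrong encoding is a bug in my own code. How could anyone read text without knowing its character encoding?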
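
To make the charset point concrete, here is a small sketch that passes the charset name as the second constructor argument. It is only an illustration: the class name `ScannerLines` and the method `readLines` are made up for this example, and the caller is assumed to know the file's encoding.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerLines {
    // Reads all lines of the file, decoding it with the named charset.
    static List<String> readLines(File file, String charsetName)
            throws FileNotFoundException {
        List<String> lines = new ArrayList<>();
        // The second constructor argument names the charset explicitly
        // instead of relying on the platform default.
        try (Scanner s = new Scanner(file, charsetName)) {
            while (s.hasNextLine()) {
                lines.add(s.nextLine());
            }
        }
        return lines;
    }
}
```

Using `hasNextLine()`/`nextLine()` instead of a `"\\n"` delimiter also strips a trailing `\r` from files with Windows line endings.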

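【Solution 4】:

I particularly like this one, which uses the java.nio.file package and is also described here.

You can optionally pass a Charset as the second argument to the String constructor.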

 String content = new String(Files.readAllBytes(Paths.get("/path/to/file")));

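Cool, huh!

【Discussion】:

This is probably the best answer!!
In most cases I would add a specific charset to the byte->char conversion: new String(..., someCharset).
Yes, that's right. For example new String(..., StandardCharsets.UTF_8)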
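
Putting this answer and the comments together for the original question (names, one per line), a sketch might look like the following. It assumes the file is UTF-8 encoded; the class name `NamesFile` and the method `readNames` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class NamesFile {
    // Reads the file as UTF-8 and splits it into one entry per line.
    static String[] readNames(Path path) throws IOException {
        String content = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        // "\\r?\\n" matches both Unix (\n) and Windows (\r\n) line endings
        return content.split("\\r?\\n");
    }
}
```

Because `String.split` drops trailing empty strings, a final newline at the end of the file does not produce an extra empty element.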

【Solution 5】:

You can read the file into a List instead of a String, and then convert it to an array:

//Setup a BufferedReader here    
List<String> list = new ArrayList<String>();
String line = reader.readLine();
while (line != null) {
  list.add(line);
  line = reader.readLine();
}
String[] arr = list.toArray(new String[0]);

【Discussion】:

Or even just keep it as an array.
Or don't process the file at all
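
The snippet in this answer leaves the BufferedReader setup out. One possible way to fill it in is sketched below, assuming a UTF-8 file and Java 7 or later; `LineArray` and `readLines` are names invented for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class LineArray {
    // Collects every line of the file into a list, then converts it to an array.
    static String[] readLines(Path path) throws IOException {
        List<String> list = new ArrayList<>();
        // The reader is closed automatically, even if readLine() throws
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                list.add(line);
            }
        }
        return list.toArray(new String[0]);
    }
}
```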

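【Solution 6】:

There is no built-in method in Java that can read an entire file. So you have the following options:

Use a non-standard library method, such as Apache Commons; see the code example in romaintaz's answer.
Loop with some read method (e.g. FileInputStream.read, which reads bytes, or FileReader.read, which reads characters; both read into a preallocated array). Both classes use system calls, so if you read only a small amount of data at a time (say, less than 4096 bytes), you have to speed them up with buffering (BufferedInputStream or BufferedReader).
Loop with BufferedReader.readLine. This has a fundamental problem: it discards the information about whether there was a '\n' at the end of the file, so for example it cannot distinguish an empty file from a file containing just a newline.

I would use this code: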

// charsetName can be null to use the default charset.
public static String readFileAsString(String fileName, String charsetName)
    throws java.io.IOException {
  java.io.InputStream is = new java.io.FileInputStream(fileName);
  try {
    final int bufsize = 4096;
    int available = is.available();
    byte[] data = new byte[available < bufsize ? bufsize : available];
    int used = 0;
    while (true) {
      if (data.length - used < bufsize) {
        byte[] newData = new byte[data.length << 1];
        System.arraycopy(data, 0, newData, 0, used);
        data = newData;
      }
      int got = is.read(data, used, data.length - used);
      if (got <= 0) break;
      used += got;
    }
    return charsetName != null ? new String(data, 0, used, charsetName)
                               : new String(data, 0, used);
  } finally {
    is.close();
  }
}

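The code above has the following advantages:

It is correct: it reads the whole file and does not discard any bytes.
It lets you specify the charset (encoding) the file uses.
It is fast (no matter how many newlines the file contains).
It does not waste memory (no matter how many newlines the file contains).

【Discussion】:

【Solution 7】: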

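// Assumes the file has at most 10 lines; any further lines are ignored.
String[] arr = new String[10]; // 10 is the number of strings
int i = 0;
try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
    String strline;
    while (i < arr.length && (strline = br.readLine()) != null) {
        arr[i++] = strline;
    }
}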

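【Discussion】:

【Solution 8】:

The simplest solution for reading a text file line by line and putting the result into a string array, without using third-party libraries, is: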

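ArrayList<String> names = new ArrayList<String>();
// try-with-resources closes the scanner even if reading fails
try (Scanner scanner = new Scanner(new File("names.txt"))) {
    while (scanner.hasNextLine()) {
        names.add(scanner.nextLine());
    }
}
// toArray() without an argument returns Object[], and casting that to
// String[] throws ClassCastException, so pass a typed array instead
String[] namesArr = names.toArray(new String[0]);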

【Discussion】:

【Solution 9】:

I always do it this way:

String content = "";
String line;
BufferedReader reader = new BufferedReader(new FileReader(...));
while ((line = reader.readLine()) != null)
{
    content += "\n" + line;
}
// Cut off the first newline
content = content.substring(1);
// Close the reader
reader.close();

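【Discussion】:

FYI: do you normally use that code to read small files? I would have expected all that string concatenation to have a significant performance impact... I don't mean to be negative, I'm just curious.
Um, yes... Is this approach deprecated? Oh, and what does FYI mean?
FYI = for your information, one of the many common acronyms used on the web.
Why collect everything into one string rather than a list with one string per line? You usually need to process the collected data afterwards.
I guess the problem Adam is pointing out is that you are doing string concatenation with += in a loop, which means you create a new String object each time (because strings are immutable). That has a considerable negative impact on performance. Use a StringBuilder (and call append()) instead of a String for content.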
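
The StringBuilder suggestion from the last comment could look roughly like this. It is a sketch under stated assumptions, not the answerer's code: the file is assumed to be UTF-8, and `JoinedLines` and `readJoined` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class JoinedLines {
    // Reads the file line by line and joins the lines with '\n'.
    static String readJoined(Path path) throws IOException {
        StringBuilder content = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                // Put the separator between lines instead of cutting it off afterwards
                if (!first) {
                    content.append('\n');
                }
                content.append(line);
                first = false;
            }
        }
        return content.toString();
    }
}
```

Writing the separator only between lines also avoids the `substring(1)` call, which fails on an empty file because there is no leading newline to cut off.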

【Solution 10】:

A simpler (loop-free) but less correct way is to read everything into a byte array:

FileInputStream is = new FileInputStream(file);
byte[] b = new byte[(int) file.length()];  
is.read(b, 0, (int) file.length());
String contents = new String(b);

Also note that this has serious performance problems.

【Discussion】:

【Solution 11】:

If you only have an InputStream, you can use an InputStreamReader.

SmbFileInputStream in = new SmbFileInputStream("smb://host/dir/file.ext");
InputStreamReader r=new InputStreamReader(in);
char buf[] = new char[5000];
int count=r.read(buf);
String s=String.valueOf(buf, 0, count);

If needed, you can add a loop and a StringBuffer.
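
A looping variant might look like the sketch below. It works with any InputStream, uses StringBuilder in place of StringBuffer, and takes the charset as a parameter; the class name `StreamText` and the method `readAll` are assumptions made for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

class StreamText {
    // Reads everything from the stream and decodes it with the given charset.
    // The caller remains responsible for closing the stream.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        Reader r = new InputStreamReader(in, charset);
        char[] buf = new char[5000];
        int count;
        // A single read() may return fewer chars than are available,
        // so keep reading until it reports end of stream with -1
        while ((count = r.read(buf)) != -1) {
            sb.append(buf, 0, count);
        }
        return sb.toString();
    }
}
```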

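【Discussion】:

【Solution 12】:

You can also use java.nio.file.Files to read the whole file into a list of strings, which you can then convert to an array, and so on. Assuming a String variable named filePath, the following two lines will do that: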

List<String> strList = Files.readAllLines(Paths.get(filePath), Charset.defaultCharset());
String[] strarray = strList.toArray(new String[0]);

【Discussion】:

【Solution 13】:

You can try Cactoos:

import org.cactoos.io.TextOf;
import java.io.File;

String[] lines = new TextOf(new File("a.txt")).asString().split("\n");

【Discussion】:

【Solution 14】:

A fixed version of @Anoyz's answer:

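import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;

public class App {
    public static void main(String[] args) throws Exception {
        File f = new File("file.txt");
        // Assumes the file is smaller than 2 GB, so its length fits in an int
        byte[] b = new byte[(int) f.length()];

        // readFully keeps reading until the buffer is full;
        // a single read() call may return before that
        try (DataInputStream is = new DataInputStream(new FileInputStream(f))) {
            is.readFully(b);
        }
        // Decodes with the platform default charset, as in the original answer
        String contents = new String(b);
    }
}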

【Discussion】: