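【Posted】: 2020-08-11 17:25:00
【Question】:
I have files encoded in ISO-8859-1. I am trying to read one in as a single string, run some regex substitutions on it, and write it back out in the same encoding.
However, the resulting file always seems to be UTF-8 (at least according to Notepad++), and some characters end up corrupted.
Can anyone see what I am doing wrong here?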
private static void editFile(File source, File target) {
    // Source and target encoding
    Charset iso88591charset = Charset.forName("ISO-8859-1");

    // Read the file as a single string
    String fileContent = null;
    try (Scanner scanner = new Scanner(source, iso88591charset)) {
        fileContent = scanner.useDelimiter("\\Z").next();
    } catch (IOException exception) {
        LOGGER.error("Could not read input file as a single String.", exception);
        return;
    }

    // Do some regex substitutions on the fileContent string
    String newContent = regex(fileContent);

    // Write the file back out in target encoding
    try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), iso88591charset))) {
        writer.write(newContent);
    } catch (Exception exception) {
        LOGGER.error("Could not write out edited file!", exception);
    }
}
【Comments】:
-
I don't see anything obviously wrong, but I have two suggestions: use StandardCharsets.ISO_8859_1 directly, and you may not need the BufferedWriter, since OutputStreamWriter has the same write(String) method.
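The two suggestions above could look roughly like the sketch below. It is only an illustration, not the asker's code: the class name Latin1RoundTrip and the helpers readLatin1 and writeLatin1 are made up for this example, and reading through Files.readAllBytes instead of Scanner is an assumption added here so the result can be checked at the byte level rather than by an editor's encoding guess.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, named for this example only.
public class Latin1RoundTrip {

    // Reads the whole file as one String, decoding the bytes as ISO-8859-1.
    // (Assumption: Files.readAllBytes stands in for the Scanner used in the question.)
    public static String readLatin1(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.ISO_8859_1);
    }

    // Writes the String as ISO-8859-1 through a plain OutputStreamWriter,
    // using write(String) directly, without a BufferedWriter.
    public static void writeLatin1(Path target, String content) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(target.toFile()), StandardCharsets.ISO_8859_1)) {
            writer.write(content);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("latin1-demo", ".txt");
        writeLatin1(file, "caf\u00e9");

        // In ISO-8859-1, U+00E9 is the single byte 0xE9;
        // UTF-8 would encode it as the two bytes 0xC3 0xA9.
        byte[] bytes = Files.readAllBytes(file);
        System.out.println("bytes written: " + bytes.length);
        System.out.println("last byte: 0x" + Integer.toHexString(bytes[bytes.length - 1] & 0xFF));

        Files.delete(file);
    }
}
```

Checking the raw bytes this way may be more reliable than the encoding an editor displays, because an editor typically has to guess the encoding of a file that carries no marker for it.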
Tags: java encoding io file-handling