【Question Title】: Relaxed JSON parsing with Java
【Posted】: 2021-03-10 17:41:23
【Question】:

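We have a large JSON file in which some of the values in the name/value (NV) pairs contain extra " characters, which makes parsing in Java fail. I was able to reproduce the problem with the following sample JSON file.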

{
  "sample-data": [
    {
      "name": "Peter Smith",
      "comment": "A quick brown fox jumps over the lazy dog"
    },
    {
      "name": "John Doe",
      "comment": "This is so cool"
    },
    {
      "name": "Amy Brown",
      "comment": "He just exclaimed "OMG" when I approached him"
    },
    {
      "name": "Ronnie Arbuckle",
      "comment": "Peter O"Toole is a great bloke"
    }
  ]
}

These are the JSON objects in question:

{
  "name": "Amy Brown",
  "comment": "He just exclaimed "OMG" when I approached him"
}

{
  "name": "Ronnie Arbuckle",
  "comment": "Peter O"Toole is a great bloke"
}

These objects are the ones with the extra " characters.


Q: Is there a way to do "relaxed JSON parsing" in Java?
We can accept losing the data of some objects in the process, but we want to salvage as much data as possible.

【Discussion】:

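  • Can't you escape the quotes in the comments, e.g. "Peter O\"Toole is a great bloke"? Then the JSON would be well-formed and parseable.
  • @Heiko Jakubzik The JSON file is handed to us. We don't create it ourselves.
  • Tell whoever hands you the JSON file to fix their broken code.
  • Is the file formatted exactly as shown? Because then you could escape every double quote on each line except the first, second, third and last...
  • @Siguza This is just a sample file I created to illustrate the problem.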
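The suggestion in the comments (escape every double quote on a line except the first, second, third and last) can be sketched as a per-line transformation. This is a minimal sketch, assuming, as in the sample file, that every line holds at most one "key": "value" pair; the class QuoteCountFixer and its methods fixLine and isEscaped are hypothetical names chosen for illustration, not part of any library.

```java
public class QuoteCountFixer {

    /**
     * Escapes every unescaped double quote on a line except the 1st, 2nd, 3rd
     * and last one. Assumes each line holds at most one "key": "value" pair,
     * as in the sample file; lines with fewer than five quotes come back unchanged.
     */
    static String fixLine(String line) {
        int total = 0; // number of double quotes on the line
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == '"') {
                total++;
            }
        }
        StringBuilder out = new StringBuilder(line.length() + total);
        int seen = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                seen++;
                // quotes 1-3 delimit the key and open the value; the last one closes the value
                boolean structural = seen <= 3 || seen == total;
                if (!structural && !isEscaped(line, i)) {
                    out.append('\\'); // stray quote inside the value
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    /** True if the character at index i is preceded by an odd number of backslashes. */
    static boolean isEscaped(String line, int i) {
        int backslashes = 0;
        for (int j = i - 1; j >= 0 && line.charAt(j) == '\\'; j--) {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }

    public static void main(String[] args) {
        // prints:   "comment": "Peter O\"Toole is a great bloke"
        System.out.println(fixLine("  \"comment\": \"Peter O\"Toole is a great bloke\""));
        // prints:   "comment": "He just exclaimed \"OMG\" when I approached him",
        System.out.println(fixLine("  \"comment\": \"He just exclaimed \"OMG\" when I approached him\","));
    }
}
```

Because it only looks at one line at a time, the same function could be applied while reading a file line by line. It breaks as soon as a line carries more than one property or an array of strings.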

Tags: java json


【Answer 1】:

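If you need to slightly modify the content of the original file, I suggest writing a custom wrapper that implements java.io.Reader/java.io.InputStream and passing that reader/stream to your parsing library. The wrapper modifies the content on the fly. For example: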

import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class JsonFixer {
    private static final String CORRUPTED_JSON = "{\n" +
            "  \"name\": \"Amy Brown\",\n" +
            "  \"value\": 123,\n" +
            "  \"comment\": \"He just exclaimed \"OMG\" when I approached him\",\n" +
            "  \"comment\": \"He just exclaimed \\\"OMG\" when I approached him\",\n" +
            "  \"comment\": \"He just exclaimed \\\"OMG\\\" when I approached him\"\n" +
            "}";

    public static class FixingReader extends Reader {
        private final StringBuilder fixedLine = new StringBuilder();

        private final BufferedReader lineReader;

        private char[] currentLine;
        private int currentLineStart;
        private int currentLineLength;

        public FixingReader(final Reader reader) {
            if (reader instanceof BufferedReader) {
                lineReader = (BufferedReader) reader;
            } else {
                lineReader = new BufferedReader(reader);
            }
        }

        @Override
        public int read(final char[] cbuf, final int off, final int len) throws IOException {
            if (currentLineLength > 0) { // return the rest of the line buffered by a previous call
                final int left = currentLineLength - currentLineStart;
                final int read = Math.min(len, left);
                System.arraycopy(currentLine, currentLineStart, cbuf, off, read);
                currentLineStart += read;
                if (currentLineStart == currentLineLength) {
                    currentLineStart = 0;
                    currentLineLength = 0;
                }
                return read;
            }

            final String line = lineReader.readLine();
            if (line == null) { // EOF
                currentLineStart = 0;
                currentLineLength = 0;
                return -1;
            }

            int lineLength = line.length() + 1; // +1 to restore the '\n' that readLine() strips
            currentLine = currentLine == null || currentLine.length < lineLength ?
                    new char[lineLength] :
                    currentLine; // reuse if we have enough space
            line.getChars(0, line.length(), currentLine, 0);
            currentLine[lineLength - 1] = '\n';
            fixedLine.setLength(0);

            // find the opening quotation mark of the value (the third '"' on the line)
            int openQuoteIdx = -1;
            int qtCnt = 0;
            for (int i = 0; i < lineLength; i++) {
                final char c = currentLine[i];
                fixedLine.append(c); // write start of the line
                if (c != '"') {
                    continue;
                }
                qtCnt++;
                if (qtCnt == 3) {
                    openQuoteIdx = i;
                    break;
                }
            }
            // find the closing quotation mark of the value (the last '"' on the line)
            int closeQuoteIdx = -1;
            for (int i = lineLength - 1; i > 0; i--) {
                if (currentLine[i] != '"') {
                    continue;
                }
                closeQuoteIdx = i;
                break;
            }
            if (openQuoteIdx > -1) { // if the line has quotation marks for the value
                // copy the rest of the string replacing the quotation mark
                boolean wasQuoted = false;
                for (int i = openQuoteIdx + 1; i < lineLength; i++) {
                    final char c = currentLine[i];
                    if (i >= closeQuoteIdx) {
                        fixedLine.append(c); // write end of the line
                        continue;
                    }
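                    // track escape state: wasQuoted is true right after an unescaped backslash
                    switch (c) {
                        case '\\':
                            wasQuoted = !wasQuoted; // a backslash pair is an escaped backslash, so toggle rather than set
                            break;
                        case '"':
                            if (!wasQuoted) {
                                fixedLine.append('\\'); // escape a stray inner quotation mark
                            }
                            // fall through: the quote consumes any pending escape
                        default:
                            wasQuoted = false;
                    }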
                    fixedLine.append(c);
                }
                if (fixedLine.length() > lineLength) {
                    currentLine = new char[fixedLine.length()];
                    fixedLine.getChars(0, fixedLine.length(), currentLine, 0);
                    lineLength = currentLine.length;
                }
            }

            currentLineStart = 0;
            currentLineLength = lineLength;

            // copy as much of the fixed line as fits into cbuf
            final int read = Math.min(len, currentLineLength);
            System.arraycopy(currentLine, currentLineStart, cbuf, off, read);
            currentLineStart += read;
            if (currentLineStart == currentLineLength) {
                currentLineLength = 0;
            }
            return read;
        }

        @Override
        public void close() throws IOException {
            lineReader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        try (BufferedReader fixedJson = new BufferedReader(new FixingReader(new StringReader(CORRUPTED_JSON)))) {
            fixedJson.lines().forEach(System.out::println);
        }
    }
}

This prints the following output:

{
  "name": "Amy Brown",
  "value": 123,
  "comment": "He just exclaimed \"OMG\" when I approached him",
  "comment": "He just exclaimed \"OMG\" when I approached him",
  "comment": "He just exclaimed \"OMG\" when I approached him"
}

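This lightweight approach even lets you transform large files, because it only has to hold one line in memory at a time. This particular implementation only works when each line contains at most one object property and its value. Otherwise, you will have to adapt the parsing accordingly.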
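When a line can hold more than one property, one alternative heuristic (a different technique from the line-based reader above) is to decide, for each unescaped quote inside a string, whether it closes the string by looking at the next non-whitespace character: only ',', ':', '}' or ']' (or the end of input) is treated as structural. LookaheadQuoteFixer is a hypothetical name for this minimal sketch, which works on an in-memory string. It will still misfire when a stray inner quote happens to be followed by one of those characters, e.g. "He said "hi", then left".

```java
public class LookaheadQuoteFixer {

    /**
     * Heuristic repair (an assumption, not part of the JSON standard): inside a
     * string, an unescaped '"' is treated as the closing quote only if the next
     * non-whitespace character is ',', ':', '}' or ']' or the input ends there.
     * Every other unescaped '"' inside a string gets a backslash in front of it.
     */
    static String fix(String json) {
        StringBuilder out = new StringBuilder(json.length() + 16);
        boolean inString = false;
        boolean escaped = false; // previous char inside a string was an unescaped backslash
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (!inString) {
                if (c == '"') {
                    inString = true; // opening quote of a key or value
                }
                out.append(c);
                continue;
            }
            if (escaped) {
                escaped = false; // this char completes an escape sequence
            } else if (c == '\\') {
                escaped = true;
            } else if (c == '"') {
                if (closesString(json, i + 1)) {
                    inString = false;
                } else {
                    out.append('\\'); // stray quote inside the value
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    /** True if the next non-whitespace character at or after 'from' is structural. */
    static boolean closesString(String s, int from) {
        for (int j = from; j < s.length(); j++) {
            char n = s.charAt(j);
            if (!Character.isWhitespace(n)) {
                return n == ',' || n == ':' || n == '}' || n == ']';
            }
        }
        return true; // end of input
    }

    public static void main(String[] args) {
        String broken = "{\"name\": \"Amy \"A\" Brown\", \"comment\": \"He said \"hi\" there\"}";
        // prints {"name": "Amy \"A\" Brown", "comment": "He said \"hi\" there"}
        System.out.println(fix(broken));
    }
}
```

Since the decision only needs a short lookahead, the same state machine could in principle be moved into a streaming Reader like the one above, at the cost of buffering whitespace until the next significant character is seen.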
