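【Posted】: 2014-12-24 02:58:09
【Question】:
I have a question about converting JSON to CSV, specifically about a memory problem (at least I think it is one). I wrote some functions that should handle this, and they work fine for small JSON files. For large JSON files, the JFrame freezes and nothing happens for several minutes (I killed the process with Task Manager after about 5 minutes). The source JSON file has about 30,000 lines.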
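What I am doing:
- Read the (large) JSON file.
- Correct it (some values are not typical JSON, e.g. "actor" : ObjectId("12345") should be corrected to "actor" : "12345").
- Split the large JSON file into smaller files.
- Process the small JSON files.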
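The correction step in the list above could be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the class name `MongoShellSyntax`, the method `unwrap`, and the pattern itself are assumptions (the question does not show the contents of its `regEx` field), and it assumes the wrapped value appears unquoted in the form `ObjectId("...")` or `ISODate("...")`.

```java
import java.util.regex.Pattern;

// Hypothetical helper: names and pattern are illustrative assumptions,
// not taken from the question's ReadFileAndSave class.
final class MongoShellSyntax {
    // Matches ObjectId("...") or ISODate("...") and captures the quoted value,
    // including its surrounding double quotes.
    private static final Pattern WRAPPER =
            Pattern.compile("(?:ObjectId|ISODate)\\(\\s*(\"[^\"]*\")\\s*\\)");

    // Replaces every wrapper with just the quoted value it contains.
    static String unwrap(String text) {
        return WRAPPER.matcher(text).replaceAll("$1");
    }
}
```

Compiling the pattern once as a constant avoids recompiling it for every token, and `replaceAll` already leaves text without a match unchanged, so no separate `find()` check is needed.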
What I have so far:
public void mongoExportAndSplitFilter() {
    ReadFileAndSave reader = new ReadFileAndSave();
    String jsonFilePath = this.converterView.sourceTextField.getText();
    //String targetFilePath = this.converterView.targetTextField.getText();
    File jsonFile = new File(jsonFilePath);
    Scanner scanner = new Scanner(reader.readFileAndCorrectOutput(jsonFile));
    int j = 0;
    StringBuffer sb = new StringBuffer();
    reader.readPartOfFileAndSave("src/main/resources", scanner, j, sb);
    //System.out.println("STEP 1: INPUT FILE (" + jsonFilePath + ") HAS BEEN CORRECTED!");
    //System.out.println("STEP 2: INPUT FILE (" + jsonFilePath + ") HAS BEEN SPLITTED WHILE PARSING!");
    this.filterView.setVisible(false);
    this.filterView.dispose();
    this.filterFlag = 1;
}
/**
 * Utility function to correct the MongoExport-JSON-Output.
 *
 * @param file The file which should be corrected.
 * @return Returns the correct JSON-String.
 */
public String readFileAndCorrectOutput(File file) {
    String jsonStringCorrected = "";
    StringBuffer sb = new StringBuffer();
    try {
        Scanner scanner = new Scanner(file);
        while (scanner.hasNext()) {
            String next = scanner.next();
            if (next.contains("ObjectId") || next.contains("ISODate")) {
                Matcher m = Pattern.compile(this.regEx)
                        .matcher(next);
                if (m.find()) {
                    next = next.replaceAll(this.regEx, this.innerString);
                }
            }
            //jsonStringCorrected += next;
            sb.append(next);
        }
        scanner.close();
        jsonStringCorrected = sb.toString();
        JSONObject jsonObject = new JSONObject(jsonStringCorrected);
        jsonStringCorrected = jsonObject.toString(2);
    } catch (FileNotFoundException ex) {
        Logger.getLogger(ReadFileAndSave.class.getName()).log(Level.SEVERE, null, ex);
    }
    return jsonStringCorrected;
}
/**
 * Utility function to read a JSON file part by part and save each part to a separate JSON file.
 *
 * @param filepath The directory the parts are saved to.
 * @param scanner The scanner which contains the file and which returns the tokens from the file.
 * @param j The counter of the file. As the file should change whenever the counter changes.
 * @param sb The buffer that accumulates the tokens read so far.
 * @return Always the empty string; the parts are written to disk via saveFile.
 */
public String readPartOfFileAndSave(String filepath, Scanner scanner, int j, StringBuffer sb) {
    String jsonString = "";
    int i = 0;
    ++j;
    while (scanner.hasNext()) {
        String token = scanner.next();
        //jsonString += token;
        sb.append(token);
        if (token.contains("{")) {
            i++;
        }
        if (token.contains("}")) {
            i--;
        }
        if (i == 0) {
            jsonString = sb.toString();
            JSONObject jsonObject = new JSONObject(jsonString);
            jsonString = jsonObject.toString(2);
            saveFile(filepath, "actor", j, jsonString);
            jsonString = readPartOfFileAndSave(filepath, scanner, j);
        }
    }
    return "";
}
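For comparison, the brace-counting idea used in readPartOfFileAndSave can also be written as a plain loop without recursion. The sketch below is only an illustration under stated assumptions: the class `BraceSplitter` and its method are hypothetical names, it works on a `String` that is already in memory, and it counts braces character by character while skipping braces inside string literals, which differs from the token-based check in the code above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the question's code.
final class BraceSplitter {
    // Returns each top-level {...} object found in the input, in order.
    static List<String> splitTopLevelObjects(String input) {
        List<String> objects = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // current nesting depth of braces
        boolean inString = false; // true while inside a JSON string literal
        boolean escaped = false;  // true right after a backslash in a string
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inString) {
                current.append(c);
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"' && depth > 0) {
                inString = true;
                current.append(c);
            } else if (c == '{') {
                depth++;
                current.append(c);
            } else if (c == '}' && depth > 0) {
                depth--;
                current.append(c);
                if (depth == 0) {
                    // One complete object: emit it and reset the buffer.
                    objects.add(current.toString());
                    current.setLength(0);
                }
            } else if (depth > 0) {
                current.append(c);
            }
            // Characters outside any object (e.g. whitespace) are skipped.
        }
        return objects;
    }
}
```

Resetting the buffer after each object keeps only one object in the buffer at a time, and the loop form avoids one stack frame per object.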
Does anyone know how to solve this problem?
Edit
Here is a snippet of the file (the first 3 lines):
{ "verb" : "access", "target" : { "id" : "5485a7050ac61b1339a4da0e", "inquiryPhase" : "Orientation", "displayName" : "Orientation", "objectType" : "phase" }, "generator" : { "id" : "5485a7050ac61b1339a4da09", "displayName" : "LochemC", "objectType" : "ils", "url" : "http://graasp.eu/spaces/5485a7050ac61b1339a4da09" }, "provider" : { "id" : "5485a7050ac61b1339a4da09", "inquiryPhase" : "ils", "displayName" : "LochemC", "objectType" : "ils", "url" : "http://graasp.eu/spaces/5485a7050ac61b1339a4da09" }, "object" : { "id" : "5485a7050ac61b1339a4da09", "displayName" : "LochemC", "objectType" : "ils" }, "actor" : { "id" : "Bas Kollöffel (UT)@5485a7050ac61b1339a4da09", "displayName" : "Bas Kollöffel (UT)", "objectType" : "person" }, "published" : "2014-12-08T13:40:45.409Z", "publishedClient" : "2014-12-08T13:40:45.409Z", "publishedServer" : { "$date" : 1418046045490 }, "_id" : { "$oid" : "5485aa5dc372cdbb21daea33" } }
{ "verb" : "access", "target" : { "id" : "5485a7050ac61b1339a4da13", "inquiryPhase" : "Conceptualisation", "displayName" : "Conceptualisation", "objectType" : "phase" }, "generator" : { "id" : "5485a7050ac61b1339a4da09", "displayName" : "LochemC", "objectType" : "ils", "url" : "http://graasp.eu/spaces/5485a7050ac61b1339a4da09" }, "provider" : { "id" : "5485a7050ac61b1339a4da09", "inquiryPhase" : "ils", "displayName" : "LochemC", "objectType" : "ils", "url" : "http://graasp.eu/spaces/5485a7050ac61b1339a4da09" }, "object" : { "id" : "5485a7050ac61b1339a4da13", "inquiryPhase" : "Conceptualisation", "displayName" : "Conceptualisation", "objectType" : "phase" }, "actor" : { "id" : "Bas Kollöffel (UT)@5485a7050ac61b1339a4da09", "displayName" : "Bas Kollöffel (UT)", "objectType" : "person" }, "published" : "2014-12-08T13:40:46.867Z", "publishedClient" : "2014-12-08T13:40:46.867Z", "publishedServer" : { "$date" : 1418046046952 }, "_id" : { "$oid" : "5485aa5ec372cdbb21daea34" } }
{ "verb" : "access", "target" : { "id" : "5485a7050ac61b1339a4da1e", "inquiryPhase" : "Investigation", "displayName" : "Investigation", "objectType" : "phase" }, "generator" : { "id" : "5485a7050ac61b1339a4da09", "displayName" : "LochemC", "objectType" : "ils", "url" : "http://graasp.eu/spaces/5485a7050ac61b1339a4da09" }, "provider" : { "id" : "5485a7050ac61b1339a4da09", "inquiryPhase" : "ils", "displayName" : "LochemC", "objectType" : "ils", "url" : "http://graasp.eu/spaces/5485a7050ac61b1339a4da09" }, "object" : { "id" : "5485a7050ac61b1339a4da1e", "inquiryPhase" : "Investigation", "displayName" : "Investigation", "objectType" : "phase" }, "actor" : { "id" : "Bas Kollöffel (UT)@5485a7050ac61b1339a4da09", "displayName" : "Bas Kollöffel (UT)", "objectType" : "person" }, "published" : "2014-12-08T13:40:48.582Z", "publishedClient" : "2014-12-08T13:40:48.582Z", "publishedServer" : { "$date" : 1418046048662 }, "_id" : { "$oid" : "5485aa60c372cdbb21daea35" } }
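Since the snippet shows one complete JSON object per line, one possible approach is to handle the file line by line instead of loading it as a whole. The following is a minimal sketch under that assumption (every line is either blank or exactly one document); `LineDocuments` and `forEachDocument` are hypothetical names, not part of the question's code.

```java
import java.io.BufferedReader;
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical helper, assuming one JSON document per line.
final class LineDocuments {
    // Passes each non-blank line to the handler and returns how many were handled.
    static int forEachDocument(BufferedReader reader, Consumer<String> handler) {
        int count = 0;
        // lines() reads lazily, so only the current line needs to be held here.
        Iterator<String> lines = reader.lines().iterator();
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.trim().isEmpty()) {
                continue; // skip blank lines between documents
            }
            handler.accept(line);
            count++;
        }
        return count;
    }
}
```

For a file, the reader could come from `Files.newBufferedReader(path, StandardCharsets.UTF_8)` inside a try-with-resources block, and the handler could parse each line with `new JSONObject(line)` before saving or converting it.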
【Comments】:
- It looks like a SAX/StAX-style JSON parser would be useful for huge JSON.
- First problem: you're using string concatenation in a loop.
- @Jon Skeet: Why is that actually a problem?
- @X-Fate: See yoda.arachsys.com/java/strings.html
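The point about string concatenation in a loop can be illustrated with a small sketch. Both methods below produce the same result; `TokenJoiner` and its method names are hypothetical and only serve the comparison.

```java
// Hypothetical helper contrasting the two ways of accumulating tokens.
final class TokenJoiner {
    // Each += builds a new String and copies everything accumulated so far,
    // so the total work grows quadratically with the amount of text.
    static String joinWithConcat(Iterable<String> tokens) {
        String result = "";
        for (String token : tokens) {
            result += token;
        }
        return result;
    }

    // StringBuilder appends into a growable buffer, so each token is
    // copied a small, amortized-constant number of times.
    static String joinWithBuilder(Iterable<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            sb.append(token);
        }
        return sb.toString();
    }
}
```

The commented-out `+=` lines in the question's code suggest it was already switched to `StringBuffer`; `StringBuilder` behaves the same way without the synchronization, which is usually sufficient for single-threaded use.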