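[Posted]: 2015-08-29 00:57:11
[Question]:
I have an application whose input has grown from 50K location records to 1.1 million location records. This has caused serious problems, because the entire file was previously deserialized into a single object. For a production-like file with 1.1 million records, that object is about 1 GB. Because of large-object GC issues, I want to keep the deserialized objects under the 85K mark (the large object heap threshold).
I am trying to parse out and deserialize one location object at a time, so that I can control how many objects get deserialized and, in turn, the size of the resulting objects. I am using the Json.NET library to do this.
Below is a sample of the JSON file that my application receives as a stream.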
{
  "Locations": [{
    "LocationId": "",
    "ParentLocationId": "",
    "DisplayFlag": "Y",
    "DisplayOptions": "",
    "DisplayName": "",
    "Address": "",
    "SecondaryAddress": "",
    "City": "",
    "State": "",
    "PostalCode": "",
    "Country": "",
    "Latitude": 40.59485,
    "Longitude": -73.96174,
    "LatLonQuality": 99,
    "BusinessLogoUrl": "",
    "BusinessUrl": "",
    "DisplayText": "",
    "PhoneNumber": "",
    "VenueGroup": 7,
    "VenueType": 0,
    "SubVenue": 0,
    "IndoorFlag": "",
    "OperatorDefined": "",
    "AccessPoints": [{
      "AccessPointId": "",
      "MACAddress": "",
      "DisplayFlag": "",
      "DisplayOptions": "",
      "Latitude": 40.59485,
      "Longitude": -73.96174,
      "Status": "Up",
      "OperatorDefined": "",
      "RoamingGroups": [{
        "GroupName": ""
      },
      {
        "GroupName": ""
      }],
      "Radios": [{
        "RadioId": "",
        "RadioFrequency": "",
        "RadioProtocols": [{
          "Protocol": ""
        }],
        "WifiConnections": [{
          "BSSID": "",
          "ServiceSets": [{
            "SSID": "",
            "SSID_Broadcasted": ""
          }]
        }]
      }]
    }]
  },
  {
    "LocationId": "",
    "ParentLocationId": "",
    "DisplayFlag": "Y",
    "DisplayOptions": "",
    "DisplayName": "",
    "Address": "",
    "SecondaryAddress": "",
    "City": "",
    "State": "",
    "PostalCode": "",
    "Country": "",
    "Latitude": 40.59485,
    "Longitude": -73.96174,
    "LatLonQuality": 99,
    "BusinessLogoUrl": "",
    "BusinessUrl": "",
    "DisplayText": "",
    "PhoneNumber": "",
    "VenueGroup": 7,
    "VenueType": 0,
    "SubVenue": 0,
    "IndoorFlag": "",
    "OperatorDefined": "",
    "AccessPoints": [{
      "AccessPointId": "",
      "MACAddress": "",
      "DisplayFlag": "",
      "DisplayOptions": "",
      "Latitude": 40.59485,
      "Longitude": -73.96174,
      "Status": "Up",
      "OperatorDefined": "",
      "RoamingGroups": [{
        "GroupName": ""
      },
      {
        "GroupName": ""
      }],
      "Radios": [{
        "RadioId": "",
        "RadioFrequency": "",
        "RadioProtocols": [{
          "Protocol": ""
        }],
        "WifiConnections": [{
          "BSSID": "",
          "ServiceSets": [{
            "SSID": "",
            "SSID_Broadcasted": ""
          }]
        }]
      }]
    }]
  }]
}
I need to be able to extract individual Location objects, so that I would be looking at the following:
{
  "LocationId": "",
  "ParentLocationId": "",
  "DisplayFlag": "Y",
  "DisplayOptions": "",
  "DisplayName": "",
  "Address": "",
  "SecondaryAddress": "",
  "City": "",
  "State": "",
  "PostalCode": "",
  "Country": "",
  "Latitude": 40.59485,
  "Longitude": -73.96174,
  "LatLonQuality": 99,
  "BusinessLogoUrl": "",
  "BusinessUrl": "",
  "DisplayText": "",
  "PhoneNumber": "",
  "VenueGroup": 7,
  "VenueType": 0,
  "SubVenue": 0,
  "IndoorFlag": "",
  "OperatorDefined": "",
  "AccessPoints": [{
    "AccessPointId": "",
    "MACAddress": "",
    "DisplayFlag": "",
    "DisplayOptions": "",
    "Latitude": 40.59485,
    "Longitude": -73.96174,
    "Status": "Up",
    "OperatorDefined": "",
    "RoamingGroups": [{
      "GroupName": ""
    },
    {
      "GroupName": ""
    }],
    "Radios": [{
      "RadioId": "",
      "RadioFrequency": "",
      "RadioProtocols": [{
        "Protocol": ""
      }],
      "WifiConnections": [{
        "BSSID": "",
        "ServiceSets": [{
          "SSID": "",
          "SSID_Broadcasted": ""
        }]
      }]
    }]
  }]
}
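I am trying to use the Json.NET JsonTextReader to accomplish this, but I cannot get the reader to hold an entire location in its buffer. Because of the size of the records in the stream, the reader initially gets only as far as "RadioProtocols", which is midway through the object, and by the time the stream reaches the end of the object, the reader has already discarded the start of the object.
The code I am using to try to get this working is: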
var ser = new JsonSerializer();
using (var reader = new JsonTextReader(new StreamReader(stream)))
{
    reader.SupportMultipleContent = true;
    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartObject && reader.Depth == 2)
        {
            do
            {
                reader.Read();
            } while (reader.TokenType != JsonToken.EndObject && reader.Depth == 2);
            var singleLocation = ser.Deserialize<Locations>(reader);
        }
    }
}
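Any information on this, or on an alternative approach, would be greatly appreciated. As a side note, the way our customer sends us the information cannot be changed at this time.
[Comments]: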
-
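It sounds like you will have to roll your own serializer, since the smallest reasonable unit of JSON that Json.NET will deserialize would leave you with an OutOfMemoryException. That said, I think this is entirely the wrong approach. I would address the bigger problem, which is apparently your clumsy data source or inadequate hardware.
-
Unfortunately we cannot change the approach at this time. We have basically been told to patch only, or more precisely, "just make it work without changing too much".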
-
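I tried running your code, but I found a problem. Assuming the Locations type corresponds to one entry in the Locations array, the code will throw an exception because the reader is incorrectly positioned on the "LocationId" property when Deserialize is called. Is the idea to enumerate each entry in the Locations array, loading each one individually?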
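The per-entry enumeration raised in the last comment could be sketched roughly as follows. This is only an illustration under stated assumptions, not a confirmed fix for the question: the Location class and the LocationStream.Read helper are hypothetical names, and the model is trimmed to a few of the fields from the sample. The idea is to call Deserialize while the reader is still positioned on the StartObject token of an array entry, rather than reading past it first.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical, trimmed-down model of one entry in the "Locations" array.
// Properties that are not declared here are skipped by Json.NET's default settings.
public class Location
{
    public string LocationId { get; set; }
    public string DisplayName { get; set; }
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}

public static class LocationStream
{
    // Lazily yields one Location at a time, so only a single entry
    // needs to be materialized in memory at any moment.
    public static IEnumerable<Location> Read(Stream stream)
    {
        var serializer = new JsonSerializer();
        using (var reader = new JsonTextReader(new StreamReader(stream)))
        {
            while (reader.Read())
            {
                // Root object is depth 0, the "Locations" array is depth 1,
                // and each array entry starts with a StartObject at depth 2.
                if (reader.TokenType == JsonToken.StartObject && reader.Depth == 2)
                {
                    // Deserialize consumes the whole entry, nested arrays included,
                    // and leaves the reader on the entry's EndObject token.
                    yield return serializer.Deserialize<Location>(reader);
                }
            }
        }
    }
}
```

Because the method is an iterator, the stream stays open until enumeration finishes, so the caller would consume it with something like foreach (var location in LocationStream.Read(stream)) and process or batch each entry before moving to the next.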
Tags: json parsing stream json.net large-object-heap