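[Posted]: 2020-10-08 10:00:24
[Question]:
We built a REST API that runs Gremlin queries against JanusGraph and returns the results as JSON. The API works fine for small result sets. For large result sets, however, we get the following error when we hit the API asynchronously (max heap size -Xmx4g):
java.lang.OutOfMemoryError: GC overhead limit exceeded
I am using curl with & to hit the API asynchronously: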
curl --location --request GET 'http://HOST:PORT/graph/search?gremlin=query' &
curl --location --request GET 'http://HOST:PORT/graph/search?gremlin=query' &
curl --location --request GET 'http://HOST:PORT/graph/search?gremlin=query' &
curl --location --request GET 'http://HOST:PORT/graph/search?gremlin=query' &
Code that connects to JanusGraph:
cluster = Cluster.open(config);
connect = cluster.connect();
submit = connect.submit(gremlin);
Iterator<Result> resultIterator = submit.iterator();
int count=0;
while (resultIterator.hasNext()){
//add to list, commented to check OOM error
}
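A minimal sketch of how the loop above could hand results onward in bounded chunks instead of collecting them into one list. It uses only the JDK; `BatchedConsumer` and `forEachBatch` are hypothetical names, not part of the Gremlin driver. The iterator from `submit.iterator()` would be passed as `source`. This only bounds what the application itself keeps referenced; it does not control how much the driver buffers from the server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class BatchedConsumer {

    /**
     * Drains the iterator in chunks of at most batchSize elements and passes
     * each chunk to the sink. Only the chunk being filled is kept here.
     * Returns the total number of elements read.
     */
    static <T> long forEachBatch(Iterator<T> source, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long total = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            total++;
            if (batch.size() == batchSize) {
                sink.accept(batch);
                // Start a fresh list so the previous chunk can be collected
                // once the sink no longer references it.
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // final, possibly smaller chunk
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in for submit.iterator(); plain integers keep the sketch self-contained.
        Iterator<Integer> fake = Arrays.asList(1, 2, 3, 4, 5).iterator();
        List<Integer> chunkSizes = new ArrayList<>();
        long total = forEachBatch(fake, 2, chunk -> chunkSizes.add(chunk.size()));
        System.out.println(total + " results in chunks " + chunkSizes);
    }
}
```

The sink could serialize each chunk and write it to the HTTP response, so that no more than one chunk is held by the application at a time.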
Configuration:
config.setProperty("connectionPool.maxContentLength", "50000000");
config.setProperty("connectionPool.maxInProcessPerConnection", "30");
config.setProperty("connectionPool.maxSize", "30");
config.setProperty("connectionPool.minSize", "1");
config.setProperty("connectionPool.resultIterationBatchSize", "200");
Gremlin driver:
org.apache.tinkerpop.gremlin-driver:3.4.6
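How can I process a large result set the way a cursor would, so that not all of the data is loaded into memory?
Is there a configuration setting I am missing? Any help is much appreciated.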
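One cursor-like option on the REST side is to write each result to the response as soon as it is read, rather than building the whole JSON document first. The sketch below assumes each item has already been serialized to a JSON string (how a driver `Result` is serialized is left out), and `StreamingJsonArray` and `writeJsonArray` are hypothetical names. It illustrates the idea and is not a confirmed fix for the error above.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;

public class StreamingJsonArray {

    /**
     * Writes the items as one JSON array, emitting each element as soon as it
     * is pulled from the iterator. Nothing is accumulated in this method.
     * Assumption: every item is already a valid, serialized JSON value.
     * Returns the number of elements written.
     */
    static long writeJsonArray(Iterator<String> jsonItems, Writer out) {
        long written = 0;
        try {
            out.write('[');
            while (jsonItems.hasNext()) {
                if (written > 0) {
                    out.write(','); // separator between elements
                }
                out.write(jsonItems.next());
                written++;
            }
            out.write(']');
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }

    public static void main(String[] args) {
        // A StringWriter stands in for the HTTP response writer.
        StringWriter body = new StringWriter();
        long n = writeJsonArray(
                Arrays.asList("{\"level\":0}", "{\"level\":1}").iterator(), body);
        System.out.println(n + " items: " + body);
    }
}
```

In a real endpoint the `Writer` would be the one backed by the response stream, so the JSON leaves the process while the iterator is still being read.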
Gremlin query:
g.withSack(0).V().hasLabel(%27material%27).has(%27dim_batchid%27,within(5028245,5080395,5366265,5159380,4872924,5093856,5216023,5068771,5093820,5154387,4703406,4872835,5214752,4893085,4866319,4556751,5342365,5075448,5074467,4835525,4987972,5347712,4986643,5204689,4755232,5076490,5028246,4922387,4659627,4597456,4743346,5080956,5370167,5260125,5134845,4613324,4720631,4937766,5356972,5148510,5210986,4930135,4984021,4720172,5028031,4836893,5068621,5333830,5020806,5081693,4988567,4869467,4709219,4958246,5021639,4607913,4923487,4614485,5066054,4869093,5339365,5204715,4980349,5215913,5342616,4959705,4959549,4929369,5022805,4920163,5204563,5027627,5208788,4712451,4862298,5019103,4982159,4727160,5395618,4924536,5390450,4943986,5071744,5208844,4898192,5347546,5204875,4710474,4794222,4962808,5269053,4836267,4602886,5359126,5393203,4780380,5148475,5092749,5351705,5339311,4601782,4869039,5366475,4959070,4963475,5346888,4923494,5279816,5297980,5154181,5030501,5142954,5392329,4839306,4890656,5134911,4893104,4989444,5069672,4961009,5027559,5029007,5285813,4820025,5287707,4959634,5148474,5362926,5362211,4557278,5353486,4933573,4785560,4890658,4930937,4553089,5030503,5341503,4783801,5068529,4821152,5208845,4766406,5043752,4770709,4733416,5204713,4815450,4981053,4963427,4980830,5340154,4771353,5204561,4920161,4794149,5275867,5021788,5364102,5205411,5356459,4794233,4923438,4610509,5392350,4746342,5022804,4936411,5361555,4890888,4980829,4959869,4869092,4891157,4815449,5267434,4836975,4684010,5281322,5071746,4711290,5289333,5021638,5299283,5210803,5348731,5068491,4776862,5196532,4766677,4930133,5210984,4608878,5261295,4826630,4786051,4779996,4930134,5020804,4766678,4869064,5286802,4545299,4693065,4930844,4816538,4888415,4711706,4923002,4780402,5044968,5148437,4753993,5074466,4890805,5074558,5076491,4547035,5092021,5262308,5205445,5213382,5159381,5263280,5351407,4890706,4659738,5344469,5075928,4613336,5065866,4863764,5217111,4792255,5210914,5204691,4890806,5148438,4986897,4817686,4712337,
5196528,5280266,4929327,5134843,5393007,5019151,4923482,4763007,4929395)).emit().repeat(sack(sum).by(constant(1)).inE().outV()).project(%27level%27,%27properties%27).by(sack()).by(tree().by(valueMap().by(fold().unfold())).by(valueMap().by(fold())))
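Profiling makes it clear that the Gremlin driver is causing the problem, but I am not sure how to fix it and free the memory.
In addition, the threads stay frozen for more than 5 minutes.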
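Since the error only appears when several large queries run at once, one possible mitigation is to cap how many heavy queries the REST API executes concurrently and to reject the rest quickly instead of letting threads pile up. The sketch below is a JDK-only illustration of that idea; `HeavyQueryGate`, its limits, and the timeout are hypothetical values, and it does not reduce the memory needed by any single query.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class HeavyQueryGate {
    private final Semaphore permits;
    private final long waitMillis;

    HeavyQueryGate(int maxConcurrent, long waitMillis) {
        this.permits = new Semaphore(maxConcurrent);
        this.waitMillis = waitMillis;
    }

    /**
     * Runs the query if a slot becomes free within waitMillis.
     * Otherwise throws, so callers fail fast instead of queueing indefinitely.
     */
    <T> T run(Supplier<T> query) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for a query slot", e);
        }
        if (!acquired) {
            throw new IllegalStateException("too many heavy queries in flight");
        }
        try {
            return query.get();
        } finally {
            permits.release(); // always give the slot back
        }
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        HeavyQueryGate gate = new HeavyQueryGate(2, 100);
        // The supplier stands in for submitting the Gremlin query and consuming its results.
        String result = gate.run(() -> "ok");
        System.out.println(result + ", free slots: " + gate.available());
    }
}
```

A request that cannot get a slot could be answered with an HTTP 429 or 503 so the client can retry later.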
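[Comments]:
-
Is the OOM generated by Gremlin Server or by your REST API? Also, it looks like you are submitting a Gremlin script; you might need to share that query.
-
The REST API is what runs out of memory when the API is hit asynchronously. The problem is here: while (resultIterator.hasNext()). The query returns a large dataset, which makes the loop wait until all of the results have arrived.
-
@stephenmallette I have added the query to the question.
-
The comment in that loop says "//add to list, commented to check OOM error". Are you actually building a List object with each result?
-
I was building one, but I commented that code out to check whether adding to the list was causing the OOM, and it was not.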
Tags: java out-of-memory gremlin janusgraph tinkerpop3