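[Posted]: 2015-01-28 21:57:30
[Question]:
I use SolrJ + Solr in my project. The problem is that I'm hitting an unexplained bottleneck on the Solr/Jetty side.
Using jvisualvm, I connected to the JVM instance running Solr and found that 77% of the time is spent in the method "org.eclipse.jetty.io.ByteArrayBuffer.readFrom()". The stack trace of one of the threads looks like this: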
"qtp64700533-36718" - Thread t@36718
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:152)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at org.eclipse.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:391)
        at org.eclipse.jetty.io.bio.StreamEndPoint.fill(StreamEndPoint.java:141)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.fill(SocketConnector.java:227)
        at org.eclipse.jetty.http.HttpParser.fill(HttpParser.java:1040)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:280)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:745)
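So time spent on I/O would look acceptable, except that:
- the application issuing the queries runs on the same machine (so I/O time should not be large, and the thread state "RUNNABLE" in the stack trace above looks suspicious)
- query response time can be as long as 5-10 seconds
- the load average on the machine (CentOS) is about 10
Any help or suggestions are appreciated, thanks!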
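On the "suspicious" RUNNABLE state: the JVM reports a thread that is blocked inside a native socket read as RUNNABLE, so a sampling profiler can attribute plain waiting time to a frame such as `socketRead0`. The following is a minimal sketch of that behaviour using only the JDK (it needs Java 8+ for the lambda). The class name `BlockedReadDemo` and its helper are made up for illustration, and the sketch says nothing about where Solr itself spends its time.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class BlockedReadDemo {

    /** Starts a thread that blocks in a socket read and returns its reported state. */
    static Thread.State stateWhileBlockedInRead() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {

            Thread reader = new Thread(() -> {
                try {
                    accepted.getInputStream().read(); // blocks: the client never sends a byte
                } catch (IOException expectedOnClose) {
                    // thrown once the sockets are closed at the end of the demo
                }
            }, "blocked-reader");
            reader.setDaemon(true);
            reader.start();

            Thread.sleep(500); // give the reader time to enter the native read
            return reader.getState();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException("demo setup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("state while blocked in read(): " + stateWhileBlockedInRead());
    }
}
```

The reader thread does no work at all, yet its state is expected to be RUNNABLE rather than BLOCKED or WAITING, which is the same picture as the Jetty thread in the trace above.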
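Update:
Indeed, guys, I forgot to provide additional information. Here it is:
Hardware: i3770, 32 GB RAM; according to iotop, reads run at 50-600 KB/s and writes at 200-1000 KB/s (almost all of it from the Solr process)
OS: CentOS 6.6
Java: OpenJDK 64-Bit Server VM (1.7.0_71 24.65-b04)
Solr: 4.9.0 (started with -Xmx=24000, but I think I should split the Solr cores across separate Solr JVM instances to minimize GC time)
SolrJ: 4.10.3; documents are added/updated/deleted from Java code with commitWithin=10000 milliseconds.
About the architecture: I store data (ads + objects) for 5 countries in Solr: UA, RU, PL, BY, KZ. So there are 2 cores per country, e.g. for Ukraine: ua_ads and ua_objects (10 cores in total). The schemas are almost identical between countries; see the Ukrainian ones below.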
"ua_ads" schema (I should rename it from "example" :))
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
<uniqueKey>adId</uniqueKey>
<field name="adId" type="long" indexed="true" stored="true" required="true"/>
<field name="objectId" type="long" indexed="true" stored="true" required="false"/>
<field name="url" type="string" indexed="false" stored="true" required="true"/>
<field name="regionId" type="int" indexed="false" stored="true" required="true"/>
<field name="sourceId" type="int" indexed="false" stored="true" required="true"/>
<field name="type" type="int" indexed="false" stored="true" required="true"/>
<field name="title" type="text_ru" indexed="false" stored="true" required="true"/>
<field name="address" type="text_ru" indexed="false" stored="true" required="true"/>
<field name="text" type="text_ru" indexed="false" stored="true" required="true"/>
<field name="dateFound" type="tdate" indexed="true" stored="true" required="true"/>
<!-- should be a string field (not int) to avoid cutting zero at beginning of phone number -->
<field name="phoneNumbers" type="string" indexed="true" stored="true" required="true" multiValued="true"/>
<field name="priceLocal" type="long" indexed="false" stored="true" required="false"/>
<field name="priceUsd" type="long" indexed="false" stored="true" required="false"/>
<field name="currency" type="int" indexed="false" stored="true" required="false"/>
<field name="roomsCount" type="int" indexed="false" stored="true" required="false"/>
<field name="area" type="int" indexed="false" stored="true" required="false"/>
<field name="imagesCount" type="int" indexed="true" stored="true" required="true"/>
</schema>
"ua_objects" schema
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="binary" class="solr.BinaryField"/>
<fieldType name="addr_ru" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<!-- no stemming for addresses; dots must be followed by a space: "г. Киев" -->
<!-- char filters always run first (preprocessing) -->
<charFilter class="solr.MappingCharFilterFactory" mapping="lang/chars_replacement.txt" />
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!-- replacing all except letters, removing "-" in home address (9-А) -->
<filter class="solr.PatternReplaceFilterFactory" pattern="[^0-9abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюяіїє\-]" replacement="" replace="all"/>
<!-- replacing all except letters, removing "-" in home address ("9-а" => "9а") -->
<filter class="solr.PatternReplaceFilterFactory" pattern="(\d{1,3})[\- ]([абвгдеёжзийклмнопрстуфхцчшщ])" replacement="$1$2" replace="all"/>
<filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="lang/cities_ukr2rus.txt"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="ї" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="і" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="й" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="ё" replacement="е" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="є" replacement="е" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="э" replacement="е" replace="all"/>
<!-- 1-length is for case with home letters: "Хрещатик, 3" -->
<filter class="solr.LengthFilterFactory" min="1" max="64"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt,lang/stopwords_addr.txt" format="snowball"/>
</analyzer>
</fieldType>
<fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<!-- dots must be followed by a space: "г. Киев" -->
<!-- char filters always run first (preprocessing) -->
<charFilter class="solr.MappingCharFilterFactory" mapping="lang/chars_replacement.txt" />
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="[^0-9abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюяіїє\-]" replacement="" replace="all"/>
<!-- replacing all except letters, removing "-" in home address ("9-а" => "9а") -->
<filter class="solr.PatternReplaceFilterFactory" pattern="(\d{1,3})[\- ]([абвгдеёжзийклмнопрстуфхцчшщ])" replacement="$1$2" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="ї" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="і" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="й" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="ё" replacement="е" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="є" replacement="е" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="э" replacement="е" replace="all"/>
<filter class="solr.LengthFilterFactory" min="1" max="64"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball"/>
<filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="lang/synonyms.txt"/>
<filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
</analyzer>
</fieldType>
<field name="_version_" type="long" indexed="true" stored="true"/>
<uniqueKey>objectId</uniqueKey>
<field name="objectId" type="long" indexed="true" stored="true" required="true"/>
<field name="url" type="string" indexed="false" stored="true" required="true"/>
<field name="regionId" type="int" indexed="true" stored="true" required="true"/>
<field name="sourceId" type="int" indexed="false" stored="true" required="true"/>
<field name="type" type="int" indexed="true" stored="true" required="true"/>
<field name="address" type="addr_ru" indexed="true" stored="true" required="true"/>
<field name="title" type="text_ru" indexed="true" stored="true" required="true"/>
<field name="text" type="text_ru" indexed="true" stored="true" required="true"/>
<field name="dateFound" type="tdate" indexed="true" stored="true" required="true"/>
<!-- should be a string field (not int) to avoid cutting zero at beginning of phone number -->
<field name="phoneNumbers" type="string" indexed="true" stored="true" required="true" multiValued="true"/>
<field name="ownerDetected" type="boolean" indexed="true" stored="true" required="true"/>
<field name="priceUsd" type="long" indexed="true" stored="true" required="false"/>
<field name="priceLocal" type="long" indexed="false" stored="true" required="false"/>
<field name="currency" type="int" indexed="false" stored="true" required="false"/>
<field name="roomsCount" type="int" indexed="true" stored="true" required="false"/>
<field name="area" type="int" indexed="true" stored="true" required="false"/>
<field name="dateUpdated" type="tdate" indexed="true" stored="true" required="true"/>
<field name="dateClosed" type="tdate" indexed="true" stored="true" required="false"/>
<field name="m2priceRel" type="float" indexed="true" stored="true" required="false"/>
<field name="ceddData" type="binary" indexed="false" stored="true" required="false" multiValued="true"/>
<field name="imagesCount" type="int" indexed="true" stored="true" required="true"/>
<field name="uniqAdTexts" type="string" indexed="false" stored="true" required="true" multiValued="true"/>
</schema>
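To make the regex steps of the `addr_ru` analyzer above easier to follow, here is a rough plain-Java approximation of what they do to a single whitespace-separated token. It is only a sketch: `AddrTokenSketch` and `normalizeToken` are made-up names, the six single-letter replacements are folded into two character classes, and the `MappingCharFilterFactory`, synonym, length and stop-word steps are skipped because their files are not shown in the post.

```java
import java.util.Locale;

final class AddrTokenSketch {
    // Same character whitelist as the first PatternReplaceFilterFactory in the schema.
    private static final String NOT_ALLOWED =
        "[^0-9abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюяіїє\\-]";
    // Same pattern as the second filter: house number, "-" or space, then a letter.
    private static final String HOUSE_LETTER =
        "(\\d{1,3})[\\- ]([абвгдеёжзийклмнопрстуфхцчшщ])";

    /** Applies the lower-casing and pattern-replace steps to one token. */
    static String normalizeToken(String token) {
        String t = token.toLowerCase(Locale.ROOT);  // LowerCaseFilterFactory
        t = t.replaceAll(NOT_ALLOWED, "");          // drop punctuation and other symbols
        t = t.replaceAll(HOUSE_LETTER, "$1$2");     // "9-а" => "9а"
        t = t.replaceAll("[їій]", "и");             // ї, і, й => и
        t = t.replaceAll("[ёєэ]", "е");             // ё, є, э => е
        return t;
    }

    public static void main(String[] args) {
        System.out.println(normalizeToken("9-А"));
        System.out.println(normalizeToken("Київ,"));
    }
}
```

Because the tokenizer splits on whitespace before these filters run, the space alternative in the house-letter pattern can only matter if a token still contains a space, so in practice it is the "-" case that fires.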
The biggest indexes:
ru_ads: 2.99 GB
ru_objects: 3.25 GB
ua_ads: 5.45 GB
ua_objects: 2.36 GB
The indexes of the other cores are fairly small.
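For context, the figures above can be combined into a rough memory budget. This sketch assumes that `-Xmx=24000` means 24000 MB and that "32 GB" means 32 GiB; neither unit is stated in the post. It also ignores the smaller cores and any other processes, and the class name `MemoryBudget` is made up. It only does the arithmetic and does not establish that memory is the cause of the slow queries.

```java
final class MemoryBudget {
    // Sizes of the four largest indexes as listed in the post, in GB.
    static final double[] LARGEST_INDEXES_GB = {2.99, 3.25, 5.45, 2.36};
    static final long RAM_MB = 32L * 1024;  // assumption: "32 GB" = 32 GiB
    static final long HEAP_MB = 24_000;     // assumption: the -Xmx value is in MB

    /** Sum of the listed index sizes. */
    static double totalIndexGb() {
        double sum = 0;
        for (double size : LARGEST_INDEXES_GB) {
            sum += size;
        }
        return sum;
    }

    /** RAM that is not reserved for the Java heap, e.g. usable as OS page cache. */
    static double ramOutsideHeapGb() {
        return (RAM_MB - HEAP_MB) / 1024.0;
    }

    public static void main(String[] args) {
        System.out.printf("largest indexes: %.2f GB%n", totalIndexGb());
        System.out.printf("RAM outside the heap: %.2f GB%n", ramOutsideHeapGb());
    }
}
```

Under those assumptions the four largest indexes add up to about 14 GB, while roughly 8.5 GB of RAM remains outside the heap if the heap grows to its maximum.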
The queries that take too long ("too long" as seen from the client) look like this (taken from the Solr log; "????" are just non-English letters):
400723188 [qtp64700533-40547] INFO org.apache.solr.core.SolrCore ? [ua-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(??????\+????????\+???????\+????????)+AND+type:3+AND+regionId:2+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[2+TO+2])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[40+TO+60])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[23500+TO+70500])+AND+dateUpdated:[2014-12-09T10:23:07Z+TO+2015-01-28T10:23:07Z]+AND+-objectId:(27824841)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=18 status=0 QTime=287
401989528 [qtp64700533-40830] INFO org.apache.solr.core.SolrCore ? [ru-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(?????????????\+??????)+AND+type:4+AND+regionId:162+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[40+TO+58])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[9+TO+27])+AND+dateUpdated:[2014-12-09T10:44:08Z+TO+2015-01-28T10:44:08Z]+AND+-objectId:(26415616)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=820 status=0 QTime=5755
400832723 [qtp64700533-40322] INFO org.apache.solr.core.SolrCore ? [ru-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(????????\+???????)+AND+type:4+AND+regionId:102+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[31+TO+45])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[115+TO+343])+AND+dateUpdated:[2014-12-09T10:24:57Z+TO+2015-01-28T10:24:57Z]+AND+-objectId:(26415342)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=9 status=0 QTime=372
402069370 [qtp64700533-40832] INFO org.apache.solr.core.SolrCore ? [ru-objects] webapp=/solr path=/select params={mm=1&fl=*&start=0&q=(????????\+?????????\+??\+????????)+AND+type:3+AND+regionId:135+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[28+TO+40])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[9529+TO+28585])+AND+dateUpdated:[2014-10-30T10:45:33Z+TO+2015-01-28T10:45:33Z]+AND+-objectId:(26415855)&qf=address^20+title^2+text&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=14075 status=0 QTime=544
401805198 [qtp64700533-40233] INFO org.apache.solr.core.SolrCore ? [ua-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(??????\+??\+??????\+?????\+??????????)+AND+type:3+AND+regionId:16+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[3+TO+3])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[93+TO+95])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[284050+TO+313950])+AND+dateUpdated:[2015-01-08T10:41:09Z+TO+2015-01-28T10:41:09Z]+AND+-objectId:(27826334)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=6 status=0 QTime=462
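The repeated `((*:* AND -field:[* TO *]) OR field:[a TO b])` clauses in these queries read as "the field is missing, or its value lies in the range" (the `+` signs in the log are URL-encoded spaces). A small sketch of how such a clause can be assembled; `RangeOrMissing` and `clause` are illustrative names and are not taken from the application:

```java
final class RangeOrMissing {

    /** Builds a clause matching documents where the field is absent or within [from, to]. */
    static String clause(String field, long from, long to) {
        // "*:* AND -field:[* TO *]" selects all documents that have no value for the field.
        return "((*:* AND -" + field + ":[* TO *]) OR "
                + field + ":[" + from + " TO " + to + "])";
    }

    public static void main(String[] args) {
        // Values taken from the first logged query.
        String filters = String.join(" AND ",
                clause("roomsCount", 2, 2),
                clause("area", 40, 60),
                clause("priceUsd", 23500, 70500));
        System.out.println(filters);
    }
}
```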
Here is a fresh profiling screenshot from jvisualvm
Part of the "top" command output, delay=10sec
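[Comments]:
-
Please share more information about your installation, how you run Solr, and so on. There could be many things wrong. How big is your index?
-
Also, what does your schema look like and which queries do you run?
-
Mysterion, cheffe - I have added the additional information.
-
Am I right that you try to fetch the complete set of hits every time? I can see the parameter rows=2147483647. Then it is no wonder that a query can take a while: Solr has to render the output for all 820 results in your second query.
-
cheffe, that's right, I set rows=2147483647 (max int) to get every result that matches the query. This is definitely not best practice; however, I expected to see the "hot" method somewhere among the "CPU" methods rather than in "ByteArrayBuffer.readFrom()", which looks like an "I/O" method (I have added the "top" command output).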
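As an alternative to `rows=2147483647`, results can be fetched in fixed-size pages with the standard `start` and `rows` parameters. The sketch below only computes the (start, rows) pairs for a known hit count; `PageOffsets`, `pages` and the page size of 500 are assumptions for illustration, and actually sending the requests (for example through SolrJ) is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class PageOffsets {

    /** Returns {start, rows} pairs that cover totalHits results in pages of at most pageSize. */
    static List<int[]> pages(int totalHits, int pageSize) {
        if (totalHits < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalHits must be >= 0 and pageSize > 0");
        }
        List<int[]> result = new ArrayList<>();
        // A long counter avoids int overflow when totalHits is close to Integer.MAX_VALUE.
        for (long start = 0; start < totalHits; start += pageSize) {
            int rows = (int) Math.min(pageSize, totalHits - start);
            result.add(new int[] {(int) start, rows});
        }
        return result;
    }

    public static void main(String[] args) {
        // hits=820 is the hit count of the slow query in the log above.
        for (int[] page : pages(820, 500)) {
            System.out.println("start=" + page[0] + "&rows=" + page[1]);
        }
    }
}
```

With a page size of 500, the 820-hit query would be split into two requests and the 14075-hit query into 29.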