Can a raw Lucene index be loaded by Solr?

【Question title】: Can a raw Lucene index be loaded by Solr?
【Posted】: 2011-02-12 13:04:30
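【Question description】:

Some colleagues of mine have a large Java web application that uses a search system built with Lucene Java. What I'd like to do is have a nice HTTP-based API to access those existing search indexes. I've used Nutch before and really liked how simple the OpenSearch implementation made it to get results as RSS.

I tried setting Solr's dataDir in solrconfig.xml, hoping it would happily pick up the existing index files, but it seems to just ignore them.

My main question is:

Can Solr be used to access Lucene indexes created elsewhere? Or might there be a better solution?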

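【Question comments】:

Possible duplicate: stackoverflow.com/questions/2195404/…
Thanks for the pointer. Unfortunately, nobody there has confirmed or ruled out this approach yet...
A follow-up question: is it possible to load a Lucene index that uses a non-default codec, such as SimpleTextCodec, into Solr?

Tags: api search lucene solr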

【Solution 1】:

I am trying the same steps with HDFS as the home directory and lockType set to hdfs, but with no luck. I see the error below:

labs_shard1_replica1: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Index dir 'hdfs://127.0.0.1/user/solr/labs/core_node1/data/index/' of core 'labs_shard1_replica1' is already locked. The most likely cause is another Solr server (or another solr core in this server) also configured to use this directory; other possible causes may be specific to lockType: hdfs

The Solr directory configuration:

<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}">

But it does not work with HDFS, configured as shown below:

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <str name="solr.hdfs.home">hdfs://127.0.0.1/user/solr</str>
    <bool name="solr.hdfs.blockcache.enabled">true</bool>
    <int name="solr.hdfs.blockcache.slab.count">1</int>
    <bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
    <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
    <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
    <bool name="solr.hdfs.blockcache.write.enabled">false</bool>
    <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
    <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
    <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory>

lockType: hdfs

【Comments】:

If you have a question that these answers do not address, please ask a new question.

【Solution 2】:

It comes down to three steps:

Change schema.xml (or managed-schema)
Change <dataDir> in solrconfig.xml
Restart Solr
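
Before pointing dataDir at an existing index, it can help to confirm that the directory has the layout Solr expects. The following is a minimal sketch of such a pre-flight check, not part of Solr or Lucene. It assumes the index files live in an index subdirectory of dataDir and treats a segments_N commit file as the sign of a Lucene index. The class name, method names, and default path are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper (not a Solr or Lucene API): checks that a Solr dataDir
// contains an "index" subdirectory holding a Lucene commit file (segments_N).
final class IndexDirCheck {

    // True if any file name looks like a Lucene commit point, e.g. "segments_2".
    static boolean hasCommitFile(Collection<String> fileNames) {
        for (String name : fileNames) {
            if (name.startsWith("segments_")) {
                return true;
            }
        }
        return false;
    }

    // Lists <dataDir>/index and applies the check above; false if that directory is missing.
    static boolean looksLikeSolrDataDir(Path dataDir) {
        Path indexDir = dataDir.resolve("index");
        if (!Files.isDirectory(indexDir)) {
            return false;
        }
        List<String> names = new ArrayList<>();
        try (Stream<Path> files = Files.list(indexDir)) {
            files.forEach(p -> names.add(p.getFileName().toString()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hasCommitFile(names);
    }

    public static void main(String[] args) {
        // Assumed default: the parent of the /tmp/myindex/index path used in this answer's code.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "/tmp/myindex");
        System.out.println(dataDir + " usable as dataDir: " + looksLikeSolrDataDir(dataDir));
    }
}
```

A passing check only says the files are where Solr will look. Whether Solr can actually search them still depends on the schema matching the indexed fields.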

I keep my study notes here, for Solr newbies like me :)
To generate a Lucene index yourself, you can use my code here.

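import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;

public class LuceneIndex {
    private static Directory directory;

    public static void main(String[] args) throws IOException {
        long startTime = System.currentTimeMillis();

        // Open the index directory; Solr's dataDir would then be /tmp/myindex
        Path path = Paths.get("/tmp/myindex/index");
        directory = new SimpleFSDirectory(path);
        IndexWriter writer = getWriter();

        // Index documentCount documents, each with the fields "id" and "manu"
        int documentCount = 10000000;
        List<String> fieldNames = Arrays.asList("id", "manu");

        // Stored, tokenized field without norms; the Solr schema has to describe the same fields
        FieldType myFieldType = new FieldType();
        myFieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
        myFieldType.setOmitNorms(true);
        myFieldType.setStored(true);
        myFieldType.setTokenized(true);
        myFieldType.freeze();

        for (int i = 0; i < documentCount; i++) {
            Document doc = new Document();
            for (int j = 0; j < fieldNames.size(); j++) {
                // Field value is the field name followed by the document number, e.g. "id42"
                doc.add(new Field(fieldNames.get(j), fieldNames.get(j) + Integer.toString(i), myFieldType));
            }
            writer.addDocument(doc);
        }
        // Close the writer (commits the index), then release the directory
        writer.close();
        directory.close();
        System.out.println("Finished Indexing");
        long estimatedTime = System.currentTimeMillis() - startTime;
        System.out.println(estimatedTime);
    }

    // Creates a writer that splits field values on whitespace only
    private static IndexWriter getWriter() throws IOException {
        return new IndexWriter(directory, new IndexWriterConfig(new WhitespaceAnalyzer()));
    }
}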

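【Comments】:

【Solution 3】:

Success! With Pascal's suggested changes to schema.xml, I got it working in no time. Thanks!

Here are my complete steps for anyone interested:

Downloaded Solr and copied dist/apache-solr-1.4.0.war to tomcat/webapps
Copied example/solr/conf to /usr/local/solr/
Copied the pre-existing Lucene index files to /usr/local/solr/data/index
Set solr.home to /usr/local/solr
In solrconfig.xml, changed dataDir to /usr/local/solr/data (Solr looks for the index directory inside it)
Loaded my Lucene index into Luke for browsing (great tool)
In the example schema.xml, removed all fields and field types except "string"
In the example schema.xml, added 14 field definitions corresponding to the 14 fields shown in Luke. Example: <field name="docId" type="string" indexed="true" stored="true"/>
In the example schema.xml, changed uniqueKey to the field in my index that seemed to be a document ID
In the example schema.xml, changed defaultSearchField to the field in my index that seemed to contain terms
Started tomcat, finally saw no exceptions, and successfully ran some queries in localhost:8080/solr/admin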
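
Typing out one definition per field gets tedious for larger indexes. As a rough sketch, the schema.xml lines can be generated from the field names read off Luke. The helper below is hypothetical and simply reproduces the shape of the docId example above. It assumes every field is a stored, indexed "string", as in this answer; apart from docId, the names in main are placeholders.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: prints schema.xml <field> definitions for field names read off Luke.
final class SchemaFieldGenerator {

    // Builds one definition in the same shape as the docId example in the steps above.
    static String fieldDefinition(String name) {
        // Reject names that would need XML escaping rather than emit a broken attribute.
        if (!name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Unexpected field name: " + name);
        }
        return "<field name=\"" + name + "\" type=\"string\" indexed=\"true\" stored=\"true\"/>";
    }

    // One definition per line, in the order given.
    static String fieldDefinitions(List<String> names) {
        StringBuilder sb = new StringBuilder();
        for (String name : names) {
            sb.append(fieldDefinition(name)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder names except docId; substitute the fields Luke shows for your index.
        System.out.print(fieldDefinitions(Arrays.asList("docId", "title", "body")));
    }
}
```

If some fields were indexed without being stored, or with a custom analyzer, the generated attributes and type would need adjusting by hand.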

This just proves that it can work. Obviously there is a lot more configuration to do.
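
Once Solr serves the index, the HTTP access the question asks for is an ordinary GET request. The sketch below only builds such a URL. It assumes the default /select request handler from the example solrconfig.xml is still registered and that Solr runs at localhost:8080/solr as in the steps above. The class name and the docId:42 query are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a Solr search URL for the setup described in this answer.
final class SolrQueryUrl {

    // Assumes the default /select handler; q, rows and wt are standard Solr query parameters.
    static String selectUrl(String solrBaseUrl, String query, int rows) {
        return solrBaseUrl + "/select?q=" + encode(query) + "&rows=" + rows + "&wt=json";
    }

    // Percent-encodes the query so characters such as ':' and spaces are safe in a URL.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // docId is the field from the schema example; the value 42 is invented.
        System.out.println(selectUrl("http://localhost:8080/solr", "docId:42", 10));
    }
}
```

The printed URL can be pasted into a browser or fetched with any HTTP client to see the matching documents.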

【Comments】:

This worked great for me with Solr 5.2.0. Additionally, I had to specify in solrconfig.xml that I was not using the managed schema: <schemaFactory class="ClassicIndexSchemaFactory"/>. Not bad for a 5 year old answer!

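【Solution 4】:

I have never tried this, but you would have to adjust schema.xml to include all the fields of the documents in your Lucene index, because Solr will not let you search on a field that is not defined in schema.xml.

The adjustments to schema.xml should also include defining query-time analyzers so that searches on your fields work properly, especially for fields that were indexed with custom analyzers.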
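
The field requirement described above can be checked mechanically. As a hedged sketch, the helper below parses a schema.xml string with the JDK's DOM parser and reports which index fields (for example, the names Luke displays) have no matching <field> definition. The class and method names are hypothetical, and dynamicField patterns are deliberately ignored, so a field covered only by a dynamic field would still be reported as missing.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: finds index fields that schema.xml does not define.
final class SchemaCoverage {

    // Names of all <field> elements; <dynamicField> and <fieldType> are not matched.
    static Set<String> schemaFieldNames(String schemaXml) {
        try {
            NodeList fields = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("field");
            Set<String> names = new TreeSet<>();
            for (int i = 0; i < fields.getLength(); i++) {
                names.add(((Element) fields.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse schema XML", e);
        }
    }

    // Index fields with no <field> definition, sorted by name.
    static Set<String> missingFromSchema(Collection<String> indexFields, String schemaXml) {
        Set<String> missing = new TreeSet<>(indexFields);
        missing.removeAll(schemaFieldNames(schemaXml));
        return missing;
    }

    public static void main(String[] args) {
        // Tiny invented schema; in practice, read the real schema.xml into a string.
        String schema = "<schema name=\"demo\"><fields>"
                + "<field name=\"docId\" type=\"string\" indexed=\"true\" stored=\"true\"/>"
                + "</fields></schema>";
        System.out.println("Missing: " + missingFromSchema(Arrays.asList("docId", "title"), schema));
    }
}
```

An empty result means every listed field has a definition. It says nothing about whether the declared types and analyzers match how the fields were indexed.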

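In solrconfig.xml, you may need to change settings in the indexDefaults and mainIndex sections.

But I would be happy to read answers from people who have actually done this.

【Comments】:

I'm looking at the index with Luke, and it is not terribly complicated. There are 14 fields, all typed as strings. I'll give your suggested configuration a try and report back. Thanks!
Luke is your friend :)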