Hibernate Search manual indexing: answers

【Question title】: Hibernate Search manual indexing
【Posted】: 2018-05-24 00:46:53
【Question】:

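I am new to Hibernate Search and am trying to integrate it to search addresses. I am using Hibernate Search 5.5.6.Final. My address table has more than 15 million records. I used manual indexing to build the Lucene index for the existing address table. Indexing completed, but when I browse the index with Luke it contains fewer than 70,000 documents. Does this look right? Shouldn't the document count be much closer to the number of records? Is there a way to make sure the indexer goes through all the records? Please help...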

Here is my entity:

@Entity
@Table (name = "ADDRESSES_LOOKUP")
@AnalyzerDef(name = "customanalyzer",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
                @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {
                        @Parameter(name = "language", value = "English")
                })
        })
@Indexed
public class Address {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column (name = "ADDRESS_ID")
    private String id;

    @Column (name = "BUILDING_NAME")
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.YES)
    @Analyzer(definition = "customanalyzer")
    private String buildingName;

    @Column (name = "FLAT_NUMBER")
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.YES)
    private String flatNumber;

    @Column (name = "FLAT_TYPE")
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.YES)
    private String flatType;

    @Column (name = "LEVEL_NUMBER")
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.YES)
    private String levelNumber;

    @Column (name = "LEVEL_TYPE")
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.YES)
    private String levelType;

    @Column (name = "NUMBER_FIRST")
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.YES)
    private String numberFirst;

    @Column (name = "NUMBER_LAST")
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.YES)
    private String numberLast;

    @Column (name = "STREET_NAME")
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.YES)
    private String streetName;

    @Column (name = "STREET_TYPE_CODE")
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.YES)
    private String streetType;

    @Column (name = "LOCALITY_NAME")
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.YES)
    private String locality;

    @Column (name = "STATE_ABBREVIATION")
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.YES)
    private String state;

    @Column (name = "POSTCODE")
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.YES)
    private String postcode;

    @Column (name = "ADDRESS")
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.YES)
    @Analyzer(definition = "customanalyzer")
    private String address;

    // ... (rest of the class omitted)
}

Here is the indexing code:

public void initializeHibernateSearch() {
    logger.info("Start initialising hibernate search index.");
    try {
        FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
        fullTextEntityManager
                .createIndexer()
                .typesToIndexInParallel( 3 )
                .batchSizeToLoadObjects( 50 )
                .cacheMode( CacheMode.IGNORE )
                .threadsToLoadObjects( 30 )
                .idFetchSize( 150 )
                .transactionTimeout( 1800 )
                .startAndWait();

    } catch (InterruptedException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    logger.info("HIBERNATE SEARCH INDEX INITIALISED.");
}

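【Comments】:

The code looks correct. No exceptions?
No exceptions. I'm now trying to run Lucene indexing independently of Hibernate Search to see how that goes.

Tags: java hibernate lucene hibernate-search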

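【Answer 1】:

A good starting point is to use a progress monitor (SimpleIndexingProgressMonitor, or a custom one you define) and step through some of its methods in a debugger. For example, addToTotalCount should tell you how many Address entities the indexer intends to index. There is also a printStatusMessage method that gives you some idea of the progress.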

SimpleIndexingProgressMonitor progressMonitor = new SimpleIndexingProgressMonitor();
fullTextSession
                .createIndexer(Address.class)
                .typesToIndexInParallel(1)
                .batchSizeToLoadObjects(50)
                .cacheMode(CacheMode.IGNORE)
                .threadsToLoadObjects(30)
                .idFetchSize(150)
                .progressMonitor(progressMonitor)
                .startAndWait();
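
If stepping through the built-in monitor is awkward, a custom monitor can simply keep two counters and compare them at the end. The following is a minimal, library-free sketch of that counting logic only. The class name CountingMonitor and the helper methods expectedCount, addedCount, missing and summary are hypothetical. The names addToTotalCount and documentsAdded mirror callbacks of Hibernate Search 5's MassIndexerProgressMonitor, but this class does not implement that interface, so treat it as an illustration rather than a drop-in monitor.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in: mirrors two callbacks of MassIndexerProgressMonitor
// without depending on Hibernate Search.
class CountingMonitor {
    // Atomic counters, because a mass indexer reports from several threads.
    private final AtomicLong expected = new AtomicLong(); // entities planned for indexing
    private final AtomicLong added = new AtomicLong();    // documents reported as added

    // Mirrors addToTotalCount: accumulates the number of entities to index.
    void addToTotalCount(long count) {
        expected.addAndGet(count);
    }

    // Mirrors documentsAdded: accumulates the number of documents written.
    void documentsAdded(long increment) {
        added.addAndGet(increment);
    }

    long expectedCount() {
        return expected.get();
    }

    long addedCount() {
        return added.get();
    }

    // Entities that were counted but never reported as indexed documents.
    long missing() {
        return expected.get() - added.get();
    }

    String summary() {
        return "expected=" + expectedCount()
                + ", added=" + addedCount()
                + ", missing=" + missing();
    }
}
```

In a real project the class would additionally declare the interface and its remaining callbacks, be passed to progressMonitor(...) as in the snippet above, and have summary() logged once startAndWait() returns. If the expected count is already around 70,000, the problem is in how entities are being counted or loaded rather than in the writing of the index.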

Are there other columns in that table? I wonder whether only 70,000 rows actually have data in these indexed columns.

【Comments】: