【Posted】: 2014-01-18 19:13:08
【Question】:
I want to count how many times a given word occurs in a text. I have a class like this:
public class Page
{
    public string Id { get; set; }
    public string BookId { get; set; }
    public string Content { get; set; }
    public int PageNumber { get; set; }
}
My index looks like this:
class Pages_SearchOccurrence : AbstractIndexCreationTask<Page, Pages_SearchOccurrence.ReduceResult>
{
    public class ReduceResult
    {
        public string PageId { get; set; }
        public int Count { get; set; }
        public string Word { get; set; }
        public string Content { get; set; }
    }

    public Pages_SearchOccurrence()
    {
        Map = pages => from page in pages
                       let words = page.Content
                           .ToLower()
                           .Split(new string[] { " ", "\n", ",", ";" }, StringSplitOptions.RemoveEmptyEntries)
                       from w in words
                       select new
                       {
                           page.Content,
                           PageId = page.Id,
                           Count = 1,
                           Word = w
                       };

        Reduce = results => from result in results
                            group result by new { PageId = result.PageId, result.Word } into g
                            select new
                            {
                                Content = g.First().Content,
                                PageId = g.Key.PageId,
                                Word = g.Key.Word,
                                Count = g.ToList().Count()
                            };

        Index(x => x.Content, Raven.Abstractions.Indexing.FieldIndexing.Analyzed);
    }
}
Finally, my query looks like this:
using (var session = documentStore.OpenSession())
{
    RavenQueryStatistics stats;
    var occurence = session.Query<Pages_SearchOccurrence.ReduceResult, Pages_SearchOccurrence>()
        .Statistics(out stats)
        .Where(x => x.Word == "works")
        .ToList();
}
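But I have found that RavenDB is slow (or my query is bad): stats.IsStale is true, and Raven Studio takes far too long and returns only a few results. I have 1000 "Page" documents, and each page's Content is about 1000 words. What is wrong with my query, and how can I find the occurrences of a word in the pages? Thanks for your help!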
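For reference, the counting logic in the index above can be sketched with plain in-memory LINQ, outside RavenDB. This is only an illustration under stated assumptions: `WordCountSketch`, `Entry`, `Map` and `Reduce` are hypothetical names, and nothing here calls the RavenDB API. The point it shows is that a reduce step which may be re-applied to its own output has to add up the `Count` values rather than count the group members. The sketch also leaves `Content` out of the per-word entries, on the assumption that copying the whole page text into every entry is not needed for counting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCountSketch
{
    // Hypothetical entry type: one (page, word, count) triple.
    public sealed class Entry
    {
        public string PageId { get; }
        public string Word { get; }
        public int Count { get; }

        public Entry(string pageId, string word, int count)
        {
            PageId = pageId;
            Word = word;
            Count = count;
        }
    }

    // Map: emit one entry with Count = 1 per word, using the same
    // separators as the index in the question.
    public static IEnumerable<Entry> Map(string pageId, string content)
    {
        return content
            .ToLower()
            .Split(new[] { " ", "\n", ",", ";" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => new Entry(pageId, w, 1));
    }

    // Reduce: group by (PageId, Word) and SUM the counts, so the result
    // stays correct when Reduce runs again over earlier reduce output.
    public static IEnumerable<Entry> Reduce(IEnumerable<Entry> entries)
    {
        return entries
            .GroupBy(e => new { e.PageId, e.Word })
            .Select(g => new Entry(g.Key.PageId, g.Key.Word, g.Sum(e => e.Count)));
    }

    public static void Main()
    {
        var mapped = Map("pages/1", "It works, it really works; works").ToList();

        // Reduce two batches separately, then reduce the partial results again.
        var partial = Reduce(mapped.Take(3)).Concat(Reduce(mapped.Skip(3))).ToList();
        var works = Reduce(partial).Single(e => e.Word == "works");

        Console.WriteLine(works.PageId + " " + works.Word + " " + works.Count);
        // prints "pages/1 works 3"
    }
}
```

If `Reduce` used `g.Count()` instead of `g.Sum(e => e.Count)`, the second pass in `Main` would report 2 for "works", because it would count the two partial entries rather than the three occurrences.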
【Comments】:
-
Why not rely on Lucene for this? It does have full-text indexing and querying capabilities, you know. Am I missing something?
-
You may find this helpful: stackoverflow.com/questions/16774036/…
Tags: ravendb