【Posted】: 2017-09-14 12:40:49
【Question】:
I have documents indexed in my Elasticsearch instance. A sample document looks like this:
{
"_index": "processed_tweets",
"_type": "processed",
"_id": "830403820580663296",
"_score": 1,
"_source": {
"at": [
"@LouisDasch"
],
"original_tweet_id": "830398288352403457",
"id_str": "830403820580663296",
"trigrams": [
"blessed lourdes lady",
"lourdes lady feast",
"lady feast day",
"feast day wishing"
],
"hashtags": [
"#Catholic"
],
"id_tweet_creator": "487735029",
"tokens": [
"blessed",
"lourdes",
"lady",
"feast",
"day",
"wishing"
],
"bigrams": [
"blessed lourdes",
"lourdes lady",
"lady feast",
"feast day",
"day wishing"
],
"retweeted": true
}
}
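I want to lowercase every hashtag in the "hashtags" field of all the documents I have already indexed. For example: "hashtags": ["#Catholic"] -> "hashtags": ["#catholic"]. What is the best (least time-consuming) way to update each keyword to its lowercase equivalent while keeping the "#"?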
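To make the desired transformation concrete, here is a minimal Python sketch. The helper names `lowercase_hashtags` and `build_update_by_query_body` are hypothetical and used only for illustration. The first function shows the intended change on a single `_source` dictionary. The second builds a request body that could be sent to Elasticsearch's `_update_by_query` endpoint, assuming a version that supports Painless scripts. The script key is written as `source` here; some older 5.x releases name it `inline` instead, so check this against your cluster version.

```python
def lowercase_hashtags(source):
    """Return a copy of a document's _source with every hashtag lowercased."""
    updated = dict(source)
    # Documents without a "hashtags" field are returned unchanged.
    if "hashtags" in source:
        # str.lower() leaves the leading '#' as it is.
        updated["hashtags"] = [tag.lower() for tag in source["hashtags"]]
    return updated


def build_update_by_query_body():
    """Build a request body for _update_by_query (a sketch, not verified on a cluster)."""
    # Painless script: rewrite each element of the hashtags list in place.
    script = (
        "for (int i = 0; i < ctx._source.hashtags.size(); i++) {"
        " ctx._source.hashtags.set(i, ctx._source.hashtags.get(i).toLowerCase());"
        " }"
    )
    return {
        "script": {"lang": "painless", "source": script},
        # Only touch documents that actually have the field.
        "query": {"exists": {"field": "hashtags"}},
    }


if __name__ == "__main__":
    sample = {"hashtags": ["#Catholic"], "retweeted": True}
    print(lowercase_hashtags(sample))
    print(build_update_by_query_body())
```

With the official Python client, a body like this would typically be passed to `update_by_query` for the `processed_tweets` index. Whether this is faster than a full reindex depends on the cluster and the data, so it is worth trying on a small index first.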
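【Comments】:
-
Do they all follow the same structure?
-
@depperm Actually, my current solution is a complete reindex, but I would like to know whether there is an alternative.
-
@DmitryPolonskiy Some documents may be missing original_tweet_id.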
Tags: python elasticsearch lucene