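【Posted】: 2020-06-25 06:52:18
【Question】:
I have a blob container in which each folder represents an item that I index in Azure Cognitive Search (ACS). The folder name is the key of the item in the ACS index. Imagine the following container structure: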
container {
    item1 {
        blob1,
        blob2
    },
    item2 {
        blob3
    },
    item3 {
        blob4,
        blob5,
        blob6
    }
}
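I want to run an indexer against the container and extract insights from the blobs with skills such as OcrSkill, KeyPhrases, EntityRecognition, and so on. I know I can use a ShaperSkill to shape the information for a single blob/document into the format I want. For example: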
List<InputFieldMappingEntry> inputMappings = new List<InputFieldMappingEntry>();
inputMappings.Add(new InputFieldMappingEntry(
    name: "content",
    source: "/document/content"));
inputMappings.Add(new InputFieldMappingEntry(
    name: "languageCode",
    source: "/document/languageCode"));
inputMappings.Add(new InputFieldMappingEntry(
    name: "keyPhrases",
    source: "/document/keyPhrases"));
inputMappings.Add(new InputFieldMappingEntry(
    name: "organizations",
    source: "/document/organizations"));
inputMappings.Add(new InputFieldMappingEntry(
    name: "name",
    source: "/document/name"));

List<OutputFieldMappingEntry> outputMappings = new List<OutputFieldMappingEntry>();
outputMappings.Add(new OutputFieldMappingEntry(
    name: "output",
    targetName: "myDoc"));

ShaperSkill shaperSkill = new ShaperSkill(
    description: "Shape to myDoc",
    context: "/document",
    name: "Doc Shaper",
    inputs: inputMappings,
    outputs: outputMappings);
For the indexer itself, I can extract the folder name from metadata_storage_path like this:
List<FieldMapping> fieldMappings = new List<FieldMapping>();
fieldMappings.Add(new FieldMapping(
    sourceFieldName: "metadata_storage_path",
    targetFieldName: "key",
    mappingFunction: FieldMappingFunction.ExtractTokenAtPosition("/", 4)));
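To see why position 4 yields the folder name, here is a minimal stand-alone sketch of the splitting logic. It does not call the search SDK: `TokenAt` is a hypothetical helper, and it assumes the mapping function splits on the delimiter, keeps empty tokens, and counts positions from zero. The storage account name in the URL is a placeholder.

```csharp
using System;

public static class PathTokenDemo
{
    // Hypothetical helper (not part of the SDK): splits the input on the
    // delimiter, keeping empty tokens, and returns the token at a
    // zero-based position.
    public static string TokenAt(string input, string delimiter, int position)
    {
        string[] tokens = input.Split(new[] { delimiter }, StringSplitOptions.None);
        return tokens[position];
    }

    public static void Main()
    {
        // Placeholder blob URL; "myaccount" is an assumed account name.
        string path = "https://myaccount.blob.core.windows.net/container/item1/blob1";

        // Tokens: "https:", "", "myaccount.blob.core.windows.net",
        //         "container", "item1", "blob1"
        Console.WriteLine(TokenAt(path, "/", 4));
    }
}
```

Under these assumptions the empty token between the two slashes of `https://` counts as position 1, which is why the folder lands at position 4 rather than 3.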
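What I can't figure out (or whether it is even possible) is how to reference the /document/myDoc output field multiple times and put the resulting entries into a collection in my ACS index. The output I want looks like this:
... (only the relevant fields are shown here)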
{
  "value": [
    {
      "key": "item1",
      "myDocs": [
        {
          "name": "blob1",
          "content": "<content from blob1>",
          "languageCode": "<languageCode from blob1>",
          "keyPhrases": "<keyPhrases from blob1>",
          "organizations": "<organizations from blob1>"
        },
        {
          "name": "blob2",
          "content": "<content from blob2>",
          "languageCode": "<languageCode from blob2>",
          "keyPhrases": "<keyPhrases from blob2>",
          "organizations": "<organizations from blob2>"
        }
      ]
    },
    {
      "key": "item2",
      "myDocs": [
        {
          "name": "blob3",
          "content": "<content from blob3>",
          "languageCode": "<languageCode from blob3>",
          "keyPhrases": "<keyPhrases from blob3>",
          "organizations": "<organizations from blob3>"
        }
      ]
    },
    {
      "key": "item3",
      "myDocs": [
        {
          "name": "blob4",
          "content": "<content from blob4>",
          "languageCode": "<languageCode from blob4>",
          "keyPhrases": "<keyPhrases from blob4>",
          "organizations": "<organizations from blob4>"
        },
        {
          "name": "blob5",
          "content": "<content from blob5>",
          "languageCode": "<languageCode from blob5>",
          "keyPhrases": "<keyPhrases from blob5>",
          "organizations": "<organizations from blob5>"
        },
        {
          "name": "blob6",
          "content": "<content from blob6>",
          "languageCode": "<languageCode from blob6>",
          "keyPhrases": "<keyPhrases from blob6>",
          "organizations": "<organizations from blob6>"
        }
      ]
    }
  ]
}
Does anyone know what I can do?
【Comments】:
-
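It is not possible to aggregate multiple blobs into a single field of one document. Can you describe your indexing use case? For example, would it work to add an "item" field to the index? You could then search by "item" and get all the blobs associated with it, even though each index document corresponds to a single blob.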
-
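No, adding a new item field to this particular index won't work. To be clear, we are talking about a collection field that is meant to hold multiple structured records. If that isn't possible, I'll end up creating a separate index for the blobs and will have to search across both indexes to get a merged set of results. I'm trying to avoid that, but I can do it if it is required.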
-
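The blobs represent attachments to a parent record. I want to be able to clearly associate each blob's indexing results with the parent record it is attached to. The key in this index is the key of the parent record.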
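For reference, the one-document-per-blob approach from the first comment can still be folded into the per-item shape on the client side. The following is only a sketch under assumptions: `BlobDoc` is a hypothetical class standing in for a per-blob index document with an added `Item` field, and the list in `Main` stands in for results already fetched from a search query. No search SDK calls are made.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-blob index document; "Item" holds the parent record key.
public class BlobDoc
{
    public string Item { get; set; }
    public string Name { get; set; }
    public string Content { get; set; }
}

public static class GroupByItemDemo
{
    // Groups flat per-blob results into one entry per item.
    // Enumerable.GroupBy yields groups in order of each key's first appearance.
    public static List<KeyValuePair<string, List<BlobDoc>>> GroupByItem(IEnumerable<BlobDoc> docs)
    {
        return docs
            .GroupBy(d => d.Item)
            .Select(g => new KeyValuePair<string, List<BlobDoc>>(g.Key, g.ToList()))
            .ToList();
    }

    public static void Main()
    {
        // Stand-in for search results; in practice these would come from a query.
        var results = new List<BlobDoc>
        {
            new BlobDoc { Item = "item1", Name = "blob1", Content = "<content from blob1>" },
            new BlobDoc { Item = "item2", Name = "blob3", Content = "<content from blob3>" },
            new BlobDoc { Item = "item1", Name = "blob2", Content = "<content from blob2>" },
        };

        foreach (var group in GroupByItem(results))
        {
            string names = string.Join(", ", group.Value.Select(d => d.Name));
            Console.WriteLine(group.Key + ": " + names);
        }
    }
}
```

This keeps the index itself flat and moves the merge into application code, which is the trade-off the second comment is trying to avoid.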