How to use the Split Skill in Azure Cognitive Search? Answer

【Question title】: How to use Split Skill in azure cognitive search?
【Posted】: 2020-07-12 00:48:36
【Question】:

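I'm new to Azure Cognitive Search. I have a docx file stored in Azure Blob Storage, and I'm using #Microsoft.Skills.Text.SplitSkill to split the document into multiple pages (chunks). But when I index the output of this skill, I get the entire content of the docx file. How can I return the "pages" from the SplitSkill, so that users see the part of the original document that the search matched instead of the whole document?

Please help. Thanks in advance.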

Tags: azure-cognitive-search azure-blob-storage

【Solution 1】:

The split skill lets you split text into smaller chunks/pages, which can then be processed by other cognitive skills.
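To make the idea of "pages" concrete, here is a rough Python sketch of what page-mode splitting does: sentences are packed greedily into chunks no longer than a maximum length (maximumPageLength, 1000 characters in the skillset below). This is an illustrative approximation, not the service's actual boundary logic; the function name `split_into_pages` and the regex-based sentence detection are assumptions.

```python
import re


def split_into_pages(text: str, maximum_page_length: int = 1000) -> list[str]:
    """Greedily pack sentences into chunks of at most maximum_page_length chars.

    Illustrative approximation of textSplitMode="pages"; the real skill's
    boundary rules are not reproduced here.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    pages: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut any single sentence that is longer than one page.
        while len(sentence) > maximum_page_length:
            if current:
                pages.append(current)
                current = ""
            pages.append(sentence[:maximum_page_length])
            sentence = sentence[maximum_page_length:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= maximum_page_length:
            current = candidate  # sentence still fits on the current page
        else:
            pages.append(current)  # close the page, start a new one
            current = sentence
    if current:
        pages.append(current)
    return pages
```

For example, `split_into_pages("One. Two two. Three three three.", 20)` returns `["One. Two two.", "Three three three."]`: the third sentence would push the first page past 20 characters, so it starts a new page.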

Here is what a minimal skillset that splits and then translates looks like:

{
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
            "textSplitMode": "pages",
            "maximumPageLength": 1000,
            "defaultLanguageCode": "en",
            "inputs": [
                {
                    "name": "text",
                    "source": "/document/content"
                },
                {
                    "name": "languageCode",
                    "source": "/document/language"
                }
            ],
            "outputs": [
                {
                    "name": "textItems",
                    "targetName": "mypages"
                }
            ]
        },
        {
            "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
            "name": "#2",
            "description": null,
            "context": "/document/mypages/*",
            "defaultFromLanguageCode": null,
            "defaultToLanguageCode": "es",
            "suggestedFrom": "en",
            "inputs": [
                {
                    "name": "text",
                    "source": "/document/mypages/*"
                }
            ],
            "outputs": [
                {
                    "name": "translatedText",
                    "targetName": "translated_text"
                }
            ]
        }
    ]
}

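Note that the split skill produces a collection of text elements under the "/document/mypages" node of the enrichment tree. Also note that by giving the translation skill the context "/document/mypages/*", we are telling it to run the translation on each page. (The split skill's languageCode input reads "/document/language", which only exists if an earlier skill, such as language detection, has written it; without one, drop that input and rely on defaultLanguageCode.)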
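The effect of `context` can be sketched in Python by modelling the enrichment tree as nested dicts and lists: a `*` segment in a path fans out over every element of a collection, so a skill whose context is `/document/mypages/*` is invoked once per page. The tree layout, the numeric indices in the output paths, and the upper-casing stand-in for translation are all illustrative assumptions; the real service manages the tree internally.

```python
from typing import Any, Callable


def resolve(tree: dict, path: str) -> list[tuple[str, Any]]:
    """Expand a path such as '/document/mypages/*' into concrete nodes.

    A '*' segment fans out over every element of a list node; each element's
    index is used in the returned path for illustration only.
    """
    results: list[tuple[str, Any]] = [("", tree)]
    for part in (p for p in path.split("/") if p):
        expanded: list[tuple[str, Any]] = []
        for prefix, node in results:
            if part == "*":
                expanded.extend((f"{prefix}/{i}", item) for i, item in enumerate(node))
            else:
                expanded.append((f"{prefix}/{part}", node[part]))
        results = expanded
    return results


def run_skill(tree: dict, context: str, target_name: str,
              skill: Callable[[Any], Any]) -> dict[str, Any]:
    """Invoke `skill` once per node matched by `context`, naming each output."""
    return {f"{ctx}/{target_name}": skill(value) for ctx, value in resolve(tree, context)}


enriched = {
    "document": {
        "content": "page one page two",
        "mypages": ["page one", "page two"],  # what the split skill would write
    }
}

# Hypothetical stand-in for the translation skill: str.upper, not a real translation.
outputs = run_skill(enriched, "/document/mypages/*", "translated_text", str.upper)
print(outputs)
# {'/document/mypages/0/translated_text': 'PAGE ONE', '/document/mypages/1/translated_text': 'PAGE TWO'}
```

With context `/document` the same skill would run exactly once for the whole document, which is why the per-page context matters.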

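Keep in mind that the document will still be indexed at the document level: skillsets aren't really built to "change the cardinality of the index". That said, one workaround is to project each page as a separate element into a knowledge store, and then create a separate index that indexes each page on its own.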
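The workaround amounts to changing the unit of indexing from "one blob" to "one page". A minimal Python sketch of that reshaping, assuming you already have each document's pages (for example read back from a knowledge store projection): every page becomes its own search document whose key combines the parent key and the page number, plus a `parent_id` so the UI can link back to the full file. The `PageDocument` fields and the `flatten_pages` helper are hypothetical; the separate index would need matching fields.

```python
from dataclasses import asdict, dataclass


@dataclass
class PageDocument:
    id: str           # key of the per-page index (hypothetical field name)
    parent_id: str    # key of the original blob-level document
    page_number: int  # 0-based position of the page in the original text
    content: str      # the page text returned to the user on a hit


def flatten_pages(parent_id: str, pages: list[str]) -> list[dict]:
    """Turn one document's pages into one search document per page."""
    return [
        asdict(PageDocument(id=f"{parent_id}_{n}", parent_id=parent_id,
                            page_number=n, content=text))
        for n, text in enumerate(pages)
    ]


if __name__ == "__main__":
    for doc in flatten_pages("report-docx", ["First chunk.", "Second chunk."]):
        print(doc)
```

The underscore separator is chosen because document keys in Azure Cognitive Search may only contain letters, digits, underscores, dashes and equal signs.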

Learn more about knowledge store projections here: https://docs.microsoft.com/en-us/azure/search/knowledge-store-concept-intro

【讨论】：