Custom phrases/words are ignored by Google Speech-To-Text

【Question title】: Custom phrases/words are ignored by Google Speech-To-Text
【Posted】: 2021-11-20 18:59:06
【Question】:

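I am using python3 to transcribe an audio file with Google Speech-to-Text via the provided Python package (google-speech).

There is an option to define custom phrases to be used for transcription, as described in the docs: https://cloud.google.com/speech-to-text/docs/speech-adaptation

For testing purposes I am using a small audio file that contains the text:

[..] in this lecture we'll talk about the Burrows Wheeler transform and the FM index [..]

To have specific names recognized with the right notation, I supply phrases like the following to see the effect. In this example I want to change burrows to barrows: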

from google.cloud import speech

config = speech.RecognitionConfig(dict(
    encoding=speech.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED,
    sample_rate_hertz=24000,
    language_code="en-US",
    enable_word_time_offsets=True,
    speech_contexts=[
        speech.SpeechContext(dict(
            phrases=["barrows", "barrows wheeler", "barrows wheeler transform"]
        ))
    ]
))

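Unfortunately this does not seem to have any effect, as the output is still the same as without the context phrases.

Am I using the wrong phrases, or is its confidence that the word it hears really is burrows so high that it ignores my phrases?

PS: I also tried using speech_v1p1beta1.AdaptationClient and speech_v1p1beta1.SpeechAdaptation instead of putting the phrases into the config, but that only gives me an internal server error with no further information about what went wrong. https://cloud.google.com/speech-to-text/docs/adaptation

【Comments】:

Not sure whether the dict function could be causing the speech_contexts object not to be enabled. Either way, I would like to see this behavior. Could you share the small audio file with the test phrase?

Tags: python speech-to-text google-speech-api google-speech-to-text-api

【Solution 1】: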

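I created an audio file to recreate your scenario, and I was able to improve the recognition using model adaptation. To achieve this with this feature, I suggest taking a look at this example and this post to get a better understanding of model adaptation.

Now, to improve the recognition of your phrase, I did the following:

I created a new audio file using the following page and the phrase you mentioned.

in this lecture we'll talk about the Burrows Wheeler transform and the FM index

My test is based on this code sample. The code creates a PhraseSet and a CustomClass containing the word you want to improve, in this case the word "barrows". You can also create/update/delete phrase sets and custom classes using the Speech-To-Text GUI. Below is the code I used for the improvement.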

import argparse

from google.cloud import speech_v1p1beta1 as speech


def transcribe_with_model_adaptation(
    project_id="[PROJECT-ID]", location="global", speech_file=None, custom_class_id="[CUSTOM-CLASS-ID]", phrase_set_id="[PHRASE-SET-ID]"
):
    """
    Create `PhraseSet` and `CustomClasses` to create custom lists of similar
    items that are likely to occur in your input data.
    """
    import io

    # Create the adaptation client
    adaptation_client = speech.AdaptationClient()

    # The parent resource where the custom class and phrase set will be created.
    parent = f"projects/{project_id}/locations/{location}"

    # Create the custom class resource
    adaptation_client.create_custom_class(
        {
            "parent": parent,
            "custom_class_id": custom_class_id,
            "custom_class": {
                "items": [
                    {"value": "barrows"}
                ]
            },
        }
    )
    custom_class_name = (
        f"projects/{project_id}/locations/{location}/customClasses/{custom_class_id}"
    )
    # Create the phrase set resource
    phrase_set_response = adaptation_client.create_phrase_set(
        {
            "parent": parent,
            "phrase_set_id": phrase_set_id,
            "phrase_set": {
                "boost": 0,
                "phrases": [
                    {"value": f"${{{custom_class_name}}}", "boost": 10},
                    {"value": f"talk about the ${{{custom_class_name}}} wheeler transform", "boost": 15}
                ],
            },
        }
    )
    phrase_set_name = phrase_set_response.name
    # print(u"Phrase set name: {}".format(phrase_set_name))
 
    # The next section shows how to use the newly created custom
    # class and phrase set to send a transcription request with speech adaptation

    # Speech adaptation configuration
    speech_adaptation = speech.SpeechAdaptation(
        phrase_set_references=[phrase_set_name])

    # speech configuration object
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=24000,
        language_code="en-US",
        adaptation=speech_adaptation,
        enable_word_time_offsets=True,
        model="phone_call",
        use_enhanced=True
    )

    # The name of the audio file to transcribe
    # storage_uri URI for audio file in Cloud Storage, e.g. gs://[BUCKET]/[FILE]
    with io.open(speech_file, "rb") as audio_file:
        content = audio_file.read()

    audio = speech.RecognitionAudio(content=content)
    # audio = speech.RecognitionAudio(uri="gs://biasing-resources-test-audio/call_me_fionity_and_ionity.wav")

    # Create the speech client
    speech_client = speech.SpeechClient()

    response = speech_client.recognize(config=config, audio=audio)

    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print(f"Transcript: {result.alternatives[0].transcript}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter
    )
    parser.add_argument("path", help="Path for audio file to be recognized")
    args = parser.parse_args()

    transcribe_with_model_adaptation(speech_file=args.path)

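Once it runs, you will receive an improved recognition like the one below. Keep in mind, however, that the code tries to create a new custom class and a new phrase set every time it runs, so running it again with the same IDs may throw an error with an "element already exists" message.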

Recognition without adaptation

(python_speech2text) user@penguin:~/replication/python_speech2text$ python speech_model_adaptation_beta.py audio.flac
Transcript: in this lecture will talk about the Burrows wheeler transform and the FM index

Recognition with adaptation

(python_speech2text) user@penguin:~/replication/python_speech2text$ python speech_model_adaptation_beta.py audio.flac
Transcript: in this lecture will talk about the barrows wheeler transform and the FM index
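The "already exists" error mentioned above can be sidestepped by making the creation step tolerant of reruns. The following is a minimal sketch and not part of the original answer: `create_or_reuse` is a hypothetical helper, and it assumes the client reports a taken resource ID as `google.api_core.exceptions.AlreadyExists` (a local stand-in class is defined if that package is not installed).

```python
try:
    # Assumption: the Speech client surfaces ALREADY_EXISTS as this exception.
    from google.api_core.exceptions import AlreadyExists
except ImportError:
    class AlreadyExists(Exception):
        """Local stand-in so the sketch runs without google-api-core."""


def create_or_reuse(create_fn, expected_name):
    """Return the resource name, whether just created or already present.

    create_fn     -- zero-argument callable that performs the create call
                     and returns an object with a .name attribute
    expected_name -- full resource name to fall back to on reruns
    """
    try:
        return create_fn().name
    except AlreadyExists:
        # A previous run already created the resource, so reuse it.
        return expected_name
```

In the function above, the phrase set creation could then presumably be wrapped as `create_or_reuse(lambda: adaptation_client.create_phrase_set({...}), f"{parent}/phraseSets/{phrase_set_id}")`, with the custom class handled the same way.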

Finally, I would like to add some notes about the improvement and the code I ran:

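I used a flac audio file because it is recommended for best results.
I used model="phone_call" and use_enhanced=True because this is the model Cloud Speech-To-Text identified for my own audio file. In addition, enhanced models can give better results; see the documentation for more details. Note that this configuration may be different for your audio file.
Consider enabling data logging to Google so that data is collected from your audio transcription requests. Google then uses this data to improve its machine learning models for recognizing speech audio.
Once the custom class and phrase set have been created, you can use the Speech-to-Text UI to update them and run tests quickly.
I used the boost parameter in the phrase set. When you use boost, you assign a weighted value to the phrase items in a PhraseSet resource. Speech-to-Text refers to this weighted value when selecting a possible transcription for words in your audio data. The higher the value, the more likely Speech-to-Text is to pick that word or phrase from the possible alternatives.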
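To make the boost note concrete, here is a small sketch that builds an adaptation mapping with per-phrase boost values. It is an illustration under assumptions rather than the answer's method: `build_adaptation` is a hypothetical helper, the `phrase_sets` / `phrases` / `value` / `boost` keys are assumed to mirror the v1p1beta1 `SpeechAdaptation` and `PhraseSet` messages, and non-positive boosts are rejected on the assumption that only positive values have an effect.

```python
def build_adaptation(phrases_with_boost):
    """Build an inline adaptation mapping from (phrase, boost) pairs."""
    phrases = []
    for value, boost in phrases_with_boost:
        if boost <= 0:
            # Assumption: only positive boosts influence recognition.
            raise ValueError(f"boost must be positive, got {boost!r} for {value!r}")
        phrases.append({"value": value, "boost": float(boost)})
    # One inline phrase set; no server-side resource is created here.
    return {"phrase_sets": [{"phrases": phrases}]}


adaptation = build_adaptation([
    ("barrows", 10),
    ("barrows wheeler transform", 15),
])
print(adaptation)
```

If the client accepts plain mappings for message fields, as the request dictionaries in the answer's code suggest, the result could be passed as `adaptation=adaptation` to `speech.RecognitionConfig`, which would avoid creating named resources while experimenting with boost values.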

I hope this information helps you improve your recognition results.

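【Discussion】:

Thank you very much, I was able to reproduce the same result with my audio file. Is it important to create the custom class first and then link it in the phrases, or is that just best practice?
I am glad to know the example helped you. Regarding your question, the answer is "yes". It is important to create the custom class and the phrase set first, because they are the hints that improve the recognition. You can take a look at this document for more details.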
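The ordering question is easier to reason about once the phrase value is printed out. The sketch below reproduces the f-string used in the answer's code; `custom_class_token` is a hypothetical helper name and the IDs are placeholders. Because the phrase embeds the custom class's full resource name, that name must be known before the phrase set is built, and presumably the class must exist by the time the phrase set is used.

```python
def custom_class_token(project_id, location, custom_class_id):
    """Return the ${...} token that references a custom class inside a phrase."""
    name = (
        f"projects/{project_id}/locations/{location}"
        f"/customClasses/{custom_class_id}"
    )
    # In an f-string, "{{" and "}}" are literal braces, so this yields ${<name>}.
    return f"${{{name}}}"


token = custom_class_token("my-project", "global", "my-class")
print(f"talk about the {token} wheeler transform")
# prints: talk about the ${projects/my-project/locations/global/customClasses/my-class} wheeler transform
```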