Answers: Speaker Diarization when using Python Speech Recognition

【Question title】: Speaker Diarization when using Python Speech Recognition
【Posted】: 2019-11-26 14:15:52
【Question】:

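Is there an option to diarize the output (separate it by speaker) when using import speech_recognition in Python?

I would appreciate any advice on this, or on whether it is even possible.

Any suggestions on writing this output to a text file, with a blank line between each new speaker, would also be appreciated.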

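import speech_recognition as sr
from os import path

audio_file = path.join(path.dirname(path.realpath(__file__)), "RobertP.wav")

r = sr.Recognizer()
with sr.AudioFile(audio_file) as source:
    audio = r.record(source)  # read the entire audio file

try:
    # show_all=True returns the raw API response rather than only the best transcript
    txt = r.recognize_google(audio, show_all=True)
except (sr.UnknownValueError, sr.RequestError) as e:
    # Without a fallback value, txt would be undefined below
    print("Didn't work: {}".format(e))
    txt = ""

text = str(txt)

# Write the result to a text file
with open("tester.txt", "w") as f:
    f.write(text)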

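Note: apologies for being a beginner.

【Question comments】:

Tags: python speech-recognition google-speech-api

【Solution 1】:

Speaker diarization is currently in beta in the Google Speech-to-Text API. The feature is described in the Google Cloud Speech-to-Text documentation. The output can be processed in a number of ways. Below is an example (based on a Medium article):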

import io

def transcribe_file_with_diarization(speech_file):
    """Transcribe the given audio file synchronously with diarization."""

    # Note: speech.enums and the positional recognize(config, audio) call match
    # the google-cloud-speech client as it was when this answer was written;
    # later major versions of the library changed this interface.
    from google.cloud import speech_v1p1beta1 as speech
    client = speech.SpeechClient()

    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()
    audio = {"content": content}

    encoding = speech.enums.RecognitionConfig.AudioEncoding.LINEAR16
    sample_rate_hertz = 48000
    language_code = 'en-US'
    enable_speaker_diarization = True
    enable_automatic_punctuation = True
    diarization_speaker_count = 4

    config = {
        "encoding": encoding,
        "sample_rate_hertz": sample_rate_hertz,
        "language_code": language_code,
        "enable_speaker_diarization": enable_speaker_diarization,
        "enable_automatic_punctuation": enable_automatic_punctuation,
        # Optional:
        "diarization_speaker_count": diarization_speaker_count
    }

    print('Waiting for operation to complete...')
    response = client.recognize(config, audio)

    # The transcript within each result is separate and sequential per result.
    # However, the words list within an alternative includes all the words
    # from all the results thus far. Thus, to get all the words with speaker
    # tags, you only have to take the words list from the last result:

    result = response.results[-1]
    words_info = result.alternatives[0].words

    speaker1_transcript = ""
    speaker2_transcript = ""
    speaker3_transcript = ""
    speaker4_transcript = ""

    # Append each word to the transcript of the speaker it is tagged with:
    for word_info in words_info:
        if word_info.speaker_tag == 1:
            speaker1_transcript = speaker1_transcript + word_info.word + ' '
        if word_info.speaker_tag == 2:
            speaker2_transcript = speaker2_transcript + word_info.word + ' '
        if word_info.speaker_tag == 3:
            speaker3_transcript = speaker3_transcript + word_info.word + ' '
        if word_info.speaker_tag == 4:
            speaker4_transcript = speaker4_transcript + word_info.word + ' '

    # Printing out the output:
    print("speaker1: '{}'".format(speaker1_transcript))
    print("speaker2: '{}'".format(speaker2_transcript))
    print("speaker3: '{}'".format(speaker3_transcript))
    print("speaker4: '{}'".format(speaker4_transcript))
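
The question also asked for a text file with a blank line between each new speaker. The example above collects everything a speaker said into one string, which loses the order of the conversation. One possible way to keep it, sketched here as an illustration and not as part of the original answer, is to group consecutive words that share a speaker_tag into turns and write one turn per paragraph. The names Word, words_to_turns and write_turns are hypothetical helpers; Word is only a stand-in for the API's per-word objects and assumes nothing beyond the word and speaker_tag attributes used above.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical stand-in for the API's per-word objects; only the two
# attributes used in the answer (word, speaker_tag) are assumed.
Word = namedtuple("Word", ["word", "speaker_tag"])


def words_to_turns(words_info):
    """Collapse consecutive words with the same speaker_tag into one turn.

    Returns a list of (speaker_tag, text) tuples in spoken order.
    """
    turns = []
    for tag, group in groupby(words_info, key=lambda w: w.speaker_tag):
        turns.append((tag, " ".join(w.word for w in group)))
    return turns


def write_turns(turns, out_path):
    """Write one 'speakerN: text' line per turn, separated by blank lines."""
    lines = ["speaker{}: {}".format(tag, text) for tag, text in turns]
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(lines))
        f.write("\n")
```

Inside transcribe_file_with_diarization, this could be called as write_turns(words_to_turns(words_info), "tester.txt") once words_info has been taken from the last result. Because the grouping follows the speaker tags exactly, a single mis-tagged word will produce its own short turn.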

【Comments】: