【Question title】: Speaker Diarization when using Python Speech Recognition
【Posted】: 2019-11-26 14:15:52
【Question】:

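Is there an option to diarize the output when using import speech_recognition in Python?

I would appreciate any advice on this, or on whether it is possible at all.

I would also appreciate any suggestions on writing this information to a text file, with a blank line between each new speaker.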

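import speech_recognition as sr

from os import path

audio_file = path.join(path.dirname(path.realpath(__file__)), "RobertP.wav")

r = sr.Recognizer()
with sr.AudioFile(audio_file) as source:
    audio = r.record(source)

try:
    # show_all=True returns the full API response instead of only the best transcript
    txt = r.recognize_google(audio, show_all=True)
except (sr.UnknownValueError, sr.RequestError) as e:
    print("Didn't work:", e)
    txt = ""  # keep txt defined so the write below does not raise NameError

text = str(txt)

# The with block closes the file even if the write fails
with open("tester.txt", "w+") as f:
    f.write(text)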

Note: apologies for being a newbie.

【Comments】:

    Tags: python speech-recognition google-speech-api


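    【Solution 1】:

    Speaker diarization is currently in beta in the Google Speech-to-Text API. You can find the documentation for this feature here. The output can be processed in several ways. Below is an example (based on this Medium article):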

    import io

    def transcribe_file_with_diarization(speech_file):
        """Transcribe the given audio file synchronously with diarization."""

        # Written for google-cloud-speech releases before 2.0, which still
        # expose speech.enums; later releases reference enums differently.
        from google.cloud import speech_v1p1beta1 as speech
        client = speech.SpeechClient()

        with io.open(speech_file, 'rb') as audio_file:
            content = audio_file.read()
        audio = {"content": content}

        encoding = speech.enums.RecognitionConfig.AudioEncoding.LINEAR16
        sample_rate_hertz = 48000
        language_code = 'en-US'
        enable_speaker_diarization = True
        enable_automatic_punctuation = True
        diarization_speaker_count = 4

        config = {
            "encoding": encoding,
            "sample_rate_hertz": sample_rate_hertz,
            "language_code": language_code,
            "enable_speaker_diarization": enable_speaker_diarization,
            "enable_automatic_punctuation": enable_automatic_punctuation,
            # Optional:
            "diarization_speaker_count": diarization_speaker_count
        }

        print('Waiting for operation to complete...')
        response = client.recognize(config=config, audio=audio)

        # The transcript within each result is separate and sequential per result.
        # However, the words list within an alternative includes all the words
        # from all the results thus far. Thus, to get all the words with speaker
        # tags, you only have to take the words list from the last result:

        result = response.results[-1]
        words_info = result.alternatives[0].words

        # One transcript string per expected speaker tag (1 to 4)
        transcripts = {tag: "" for tag in range(1, diarization_speaker_count + 1)}

        # Append each word to the transcript of the speaker it is tagged with
        for word_info in words_info:
            if word_info.speaker_tag in transcripts:
                transcripts[word_info.speaker_tag] += word_info.word + ' '

        # Printing out the output:
        for tag, transcript in transcripts.items():
            print("speaker{}: '{}'".format(tag, transcript))
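The question also asked about writing the result to a text file with a line between each new speaker. One possible approach, shown here as a minimal sketch and not as part of the Google API, is to merge consecutive words that share a speaker tag into turns and separate the turns with a blank line. The helper names `group_by_speaker`, `format_turns` and `write_transcript`, the "Speaker N:" label format and the sample data are all hypothetical. The sketch assumes the words have first been reduced to `(speaker_tag, word)` pairs, for example with `[(w.speaker_tag, w.word) for w in words_info]`.

```python
from itertools import groupby


def group_by_speaker(words):
    """Merge consecutive (speaker_tag, word) pairs into (speaker_tag, text) turns."""
    turns = []
    # groupby only merges adjacent items, so a speaker who talks again later
    # gets a new turn instead of being appended to the earlier one.
    for tag, run in groupby(words, key=lambda pair: pair[0]):
        turns.append((tag, " ".join(word for _, word in run)))
    return turns


def format_turns(turns):
    """Render the turns as text with a blank line between speakers."""
    lines = ["Speaker {}: {}".format(tag, text) for tag, text in turns]
    return "\n\n".join(lines) + "\n"


def write_transcript(words, file_path):
    """Write the speaker-separated transcript to file_path."""
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(format_turns(group_by_speaker(words)))


# Hypothetical sample standing in for the (speaker_tag, word) pairs
sample = [(1, "Hello"), (1, "there"), (2, "Hi"), (1, "How"), (1, "are"), (1, "you")]
print(format_turns(group_by_speaker(sample)))
```

Calling `write_transcript(sample, "tester.txt")` would then produce one paragraph per turn. Grouping by turn keeps the order of the conversation, whereas the per-speaker strings above collect everything each speaker said into a single line.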
    

    【Comments】:
