How can I get a human-like accent (Wavenet or SSML voices)?

[Question title]: How can I get a human-like accent (Wavenet or SSML voices)?
[Posted]: 2020-04-23 08:58:04
[Question description]:

I am using Google Cloud Text-to-Speech exactly as shown on their website: https://codelabs.developers.google.com/codelabs/cloud-text-speech-csharp/#6

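However, there are no details on how to output a Wavenet voice (SSML). This code outputs a normal voice.

My question is: with this code, how can I get a human-like accent (a Wavenet or SSML voice)?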

using Google.Cloud.TextToSpeech.V1;
using System;
using System.IO;

namespace TextToSpeechApiDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            var client = TextToSpeechClient.Create();

            // The input to be synthesized, can be provided as text or SSML.
            var input = new SynthesisInput
            {
                Text = "This is a demonstration of the Google Cloud Text-to-Speech API"
            };
            // Build the voice request.
            var voiceSelection = new VoiceSelectionParams
            {
                LanguageCode = "en-US",
                SsmlGender = SsmlVoiceGender.Female
            };

            // Specify the type of audio file.
            var audioConfig = new AudioConfig
            {
                AudioEncoding = AudioEncoding.Mp3
            };

            // Perform the text-to-speech request.
            var response = client.SynthesizeSpeech(input, voiceSelection, audioConfig);

            // Write the response to the output file.
            using (var output = File.Create("output.mp3"))
            {
                response.AudioContent.WriteTo(output);
            }
            Console.WriteLine("Audio content written to file \"output.mp3\"");
        }
    }
}

[Question discussion]:

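Hi. What do you mean when you say the code outputs a "normal voice"? You are already using an SSML voice, because you defined SsmlGender = SsmlVoiceGender.Female.
By "normal voices" I mean voices that sound like a computer.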
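For context, in the Text-to-Speech API "SSML" is the markup format of the input (the C# sample's own comment says the input "can be provided as text or SSML"), while SsmlGender is only the gender field of the voice selection. The sketch below is a hypothetical helper, to_ssml, that escapes plain sentences and joins them with SSML break pauses. Its result would be passed as SynthesisInput(ssml=...) instead of text=...; the 300 ms default pause is an arbitrary choice.

```python
from xml.sax.saxutils import escape


def to_ssml(sentences, pause_ms=300):
    """Wrap plain-text sentences in a <speak> element with pauses between them.

    Hypothetical helper for illustration: &, < and > are escaped so that the
    sentence text cannot be mistaken for SSML markup.
    """
    pause = f'<break time="{pause_ms}ms"/>'
    body = pause.join(escape(sentence) for sentence in sentences)
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    print(to_ssml(["Hello my name is John.", "How are you?"]))
```

Only the input changes with SSML; which voice reads it is still decided by the voice selection parameters.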

Tags: api google-cloud-platform text-to-speech

[Solution 1]:

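Here you can see the languages and voices supported by the Text-to-Speech API. As described in the tutorial, a voice is characterized by three parameters: language_code, name, and ssml_gender.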
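Of those three parameters, name is the one that pins down a specific voice: voice names combine the language code, the voice type, and a letter variant, for example en-GB-Standard-A or en-US-Wavenet-D. If name is left empty, the service chooses a voice itself, so setting it is the reliable way to ask for a WaveNet voice. The sketch below uses a hypothetical helper, wavenet_voice_params, that only builds the keyword arguments; it makes no API call. In the question's C# code, the same string would go in the Name property of VoiceSelectionParams.

```python
def wavenet_voice_params(language_code, variant):
    """Return keyword arguments for texttospeech.VoiceSelectionParams.

    Hypothetical helper: builds a voice name of the form
    "<language_code>-Wavenet-<variant>", e.g. "en-US-Wavenet-D".
    """
    return {
        "language_code": language_code,
        "name": f"{language_code}-Wavenet-{variant}",
    }


# Example: request the British English WaveNet voice with variant "A".
print(wavenet_voice_params("en-GB", "A"))
```

With the Python client, this could be used as texttospeech.VoiceSelectionParams(**wavenet_voice_params("en-US", "D")). Whether a given variant exists for a language should be checked against the supported-voices list.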

You can use the following Python code to synthesize the text "Hello my name is John. How are you?" as British English speech with the en-GB-Standard-A voice:

def synthesize_text(text):
    """Synthesizes speech from the input string of text."""
    # Written against the google-cloud-texttospeech 2.x client library.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()

    input_text = texttospeech.SynthesisInput(text=text)

    # Note: the voice can also be specified by name.
    # Names of voices can be retrieved with client.list_voices().
    voice = texttospeech.VoiceSelectionParams(
        language_code='en-GB',
        name='en-GB-Standard-A',
        ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
    )

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    response = client.synthesize_speech(
        input=input_text, voice=voice, audio_config=audio_config
    )

    # The response's audio_content is binary.
    with open('output.mp3', 'wb') as out:
        out.write(response.audio_content)
        print('Audio content written to file "output.mp3"')


text = "Hello my name is John. How are you?"
synthesize_text(text)
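The code comment mentions that voice names can be retrieved with client.list_voices(). A minimal offline sketch of choosing a WaveNet name from such a list follows; pick_wavenet_voice is a hypothetical helper, and the hand-written names list stands in for a live API response. Sorting is only there to make the choice deterministic.

```python
def pick_wavenet_voice(voice_names, language_code):
    """Return the alphabetically first WaveNet voice name for language_code, or None.

    voice_names is assumed to be a list of strings, such as the .name field
    of each voice returned by client.list_voices().
    """
    prefix = f"{language_code}-Wavenet-"
    matches = sorted(name for name in voice_names if name.startswith(prefix))
    return matches[0] if matches else None


# Offline example with a hand-written list instead of a live API call.
names = ["en-GB-Standard-A", "en-GB-Wavenet-B", "en-GB-Wavenet-A", "en-US-Wavenet-D"]
print(pick_wavenet_voice(names, "en-GB"))
```

With a live client, the names could be collected with [v.name for v in client.list_voices(language_code="en-GB").voices], and the chosen name passed as name= to VoiceSelectionParams.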

I am not familiar with C#, but judging by the C# and Java documentation, you should also be able to set the name parameter to choose the voice.

[Discussion]: