How to use Google's Text-to-Speech API in Python

【Question title】: How to use Google's Text-to-Speech API in Python
【Posted】: 2019-07-09 00:27:47
【Question】:

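I have my key ready to make requests and get speech from text from Google.
I tried these commands and many more.
The docs don't offer a straightforward way to get started with Python that I've found. I don't know where my API key goes along with the JSON and the URL.

One solution in their docs here is for CURL. But it involves downloading a txt after sending the request, which then has to be sent back to them to get the file. Is there a way to do this in Python that doesn't involve the txt I have to send back to them? I just want my list of strings returned as audio files.

(I put my actual key in the block above. I'm just not going to share it here.)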

【Comments】:

docs offer no solutions for Python?
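I saw this. What I meant in the OP is that the link I posted has no Python equivalent. I don't understand what this link is (the code). I don't understand where my API key goes. Maybe that's all I need. Where does this code see the API? After looking all day, I still can't find a way to use these things anywhere.
When you say API key, I assume you mean the API key you set up when configuring Google Cloud, right? The full set up from the beginning may be worth a read. The API key you downloaded in JSON format is what you set as GOOGLE_APPLICATION_CREDENTIALS in your environment (see step 2). They then provide further instructions on how to set up your Python environment properly.
As aug pointed out, there is a Python quickstart at the link they provided. The Python quickstart offers the equivalent of the CURL example you linked to. Also, as aug mentioned, you need to use a service account, not an API key.
Eric, did you write these docs? Respectfully, they are very opaque and confusing. Things are hard to find. It is as if there are 3 decoy versions of everything, none related to where I signed up. I spent 10 hours yesterday trying to get Python to do what the CURL command does. Today I spent about 8 hours trying to figure out where to enter the voice name, and that it is different from language_code. Did you downvote my question?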

Tags: python api text-to-speech

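【Solution 1】:

Configure your Python app for the JSON file and install the client library

Create a service account
Create a service account key for that service account here
The JSON file downloads; save it somewhere safe
Include the Google application credentials in your Python app
Install the library: pip install --upgrade google-cloud-texttospeech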
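Before calling the API, it can help to confirm that the credentials step above actually took effect. The following is a minimal sketch using only the standard library; `check_credentials` is a hypothetical helper name, and the check assumes the downloaded key is a service account key file, which carries a `"type": "service_account"` field and a `client_email` field.

```python
import json
import os


def check_credentials(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return the service account email from the key file the env var points to.

    Raises ValueError with a readable message if the setup looks wrong.
    (Hypothetical helper, not part of the Google client library.)
    """
    path = os.environ.get(env_var)
    if not path:
        raise ValueError(f"{env_var} is not set")
    if not os.path.isfile(path):
        raise ValueError(f"{env_var} points to a missing file: {path}")
    with open(path, encoding="utf-8") as fh:
        key = json.load(fh)
    # Service account key files identify themselves with this "type" value.
    if key.get("type") != "service_account":
        raise ValueError(f"{path} is not a service account key file")
    return key.get("client_email")
```

Calling `check_credentials()` right after setting `os.environ["GOOGLE_APPLICATION_CREDENTIALS"]` should surface a wrong path or a wrong kind of key file before any request is made.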

Use the Python samples from Google: https://cloud.google.com/text-to-speech/docs/reference/libraries and https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/texttospeech/cloud-client/quickstart.py Note: Google's sample does not include the name parameter.

Below is that sample, modified to set the Google application credentials and to use a female Wavenet voice.

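import os

# Written for google-cloud-texttospeech releases before 2.0, which still
# provide texttospeech.types and texttospeech.enums.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/home/yourproject-12345.json"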

from google.cloud import texttospeech

# Instantiates a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Do no evil!")

# Build the voice request: select the language code ("en-US"),
# the voice NAME ("en-US-Wavenet-C"),
# and the SSML voice gender ("female")
voice = texttospeech.types.VoiceSelectionParams(
    language_code='en-US',
    name='en-US-Wavenet-C',
    ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)

# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
    audio_encoding=texttospeech.enums.AudioEncoding.MP3)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)

# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')

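Voices, names, language codes, SSML gender, etc.

Voice list: https://cloud.google.com/text-to-speech/docs/voices

In the code sample above, I changed the voice from Google's sample code to include the name parameter, and to use a Wavenet voice (much improved but more expensive, $16 per million characters) with an SSML gender of FEMALE.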

voice = texttospeech.types.VoiceSelectionParams(
        language_code='en-US',
        name='en-US-Wavenet-C',
        ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)
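Since the voice name and the language code are separate parameters that have to agree, a small helper can derive one from the other. This is only a sketch: `voice_kwargs` is a hypothetical name, and it assumes voice names start with the language code, as in 'en-US-Wavenet-C' or 'fr-FR-Wavenet-A'. Check the voice list before relying on it.

```python
def voice_kwargs(voice_name, gender=None):
    """Build keyword arguments for a voice selection from a voice name.

    Assumes the name starts with "<language>-<region>-", so the first two
    dash-separated parts form the language code. (Hypothetical helper.)
    """
    parts = voice_name.split("-")
    if len(parts) < 3:
        raise ValueError(f"unexpected voice name: {voice_name!r}")
    kwargs = {"language_code": "-".join(parts[:2]), "name": voice_name}
    # Only include the gender when the caller asks for one explicitly.
    if gender is not None:
        kwargs["ssml_gender"] = gender
    return kwargs
```

With the sample above, this could be used as `texttospeech.types.VoiceSelectionParams(**voice_kwargs('en-US-Wavenet-C'))`, which keeps the language code from drifting out of sync with the chosen voice.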

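【Comments】:

The voice selection part here is "language_code='en-US', ssml_gender=texttospeech.enums.SsmlVoiceGender.NEUTRAL)". Do you know how I can get the 'en-US-Wavenet-F' voice from the Google TTS sample page?
Found their list: cloud.google.com/text-to-speech/docs/voices. The strange thing is that all voices with the same language code sound the same, regardless of the voice name. In other words, "en-US" always returns exactly the same voice. If I change it to "en-US-Wavenet-F" or anything else, nothing changes.
Got it. For anyone's possible future reference: add name='en-US-Wavenet-F' on the line below language_code to get what I had been trying to get.
How can I use this example to call an external JSON representation, including SSML and voice selection parameters, as in cloud.google.com/text-to-speech/docs/ssml#tips_for_using_ssml ?
Why doesn't speech_client.list_voices() list the Wavenet voices?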

【Solution 2】:

Found the answer in one of the 150 Google docs pages I had open, and lost the link.

#(Since I'm using a Jupyter Notebook)
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/Path/to/JSON/file/jsonfile.json"
from google.cloud import texttospeech

# Instantiates a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Hello, World!")

# Build the voice request, select the language code ("en-US") and the ssml
# voice gender ("neutral")
voice = texttospeech.types.VoiceSelectionParams(
    language_code='en-US',
    ssml_gender=texttospeech.enums.SsmlVoiceGender.NEUTRAL)

# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
    audio_encoding=texttospeech.enums.AudioEncoding.MP3)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)

# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')

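My time-consuming quest was trying to send the request as JSON from Python, but this goes through their own module instead, and it works well. Note that the default voice gender is "neutral".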
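The question asked for a list of strings returned as audio files, so here is one way the sample above might be extended. It is a sketch under assumptions: `synthesize_all` and its `synthesize` argument are hypothetical names, where `synthesize` is any callable that takes a string and returns audio bytes, for example a small wrapper around `client.synthesize_speech(...)` that returns `response.audio_content`.

```python
import os


def synthesize_all(texts, synthesize, out_dir="."):
    """Write one MP3 file per input string and return the file paths.

    `synthesize` is a caller-supplied function mapping text -> audio bytes.
    (Hypothetical helper; the file naming scheme is an arbitrary choice.)
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, text in enumerate(texts, start=1):
        path = os.path.join(out_dir, f"output_{index:03d}.mp3")
        # audio bytes are binary, so the file is opened in 'wb' mode
        with open(path, "wb") as out:
            out.write(synthesize(text))
        paths.append(path)
    return paths
```

Passing the synthesis step in as a function keeps the loop independent of the client library version, and makes it easy to try the file handling with a stand-in before spending API quota.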

【Comments】:

【Solution 3】:

If you want to avoid the Google Python API, you can do this:

import requests

url = "https://texttospeech.googleapis.com/v1beta1/text:synthesize"

text = "This is a text"

# Request body: the text, the voice to use, and the output encoding
data = {
    "input": {"text": text},
    "voice": {"name": "fr-FR-Wavenet-A", "languageCode": "fr-FR"},
    "audioConfig": {"audioEncoding": "MP3"},
}

# Replace YOUR_API_KEY with your own key
headers = {"content-type": "application/json", "X-Goog-Api-Key": "YOUR_API_KEY"}

r = requests.post(url=url, json=data, headers=headers)
# Parse the JSON response body into a dict
content = r.json()

This is similar to what you did, but you need to include your API key.
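The snippet above stops at the parsed JSON. The REST response carries the audio as a base64-encoded string in its `audioContent` field, so one more step is needed to get a playable file. A minimal sketch of that step follows; `save_audio` is a hypothetical helper name, and `content` refers to the parsed response from the snippet above.

```python
import base64


def save_audio(response_json, path="output.mp3"):
    """Decode the base64 audio in a synthesize response and write it to disk.

    Returns the number of bytes written. (Hypothetical helper.)
    """
    if "audioContent" not in response_json:
        # Failed requests come back without audio; show whatever was returned.
        raise ValueError(f"no audio in response: {response_json.get('error')}")
    audio_bytes = base64.b64decode(response_json["audioContent"])
    with open(path, "wb") as out:
        out.write(audio_bytes)
    return len(audio_bytes)
```

After the request above, `save_audio(content)` would write `output.mp3` in the current directory. This covers the "txt" step from the CURL instructions without leaving Python.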

【Comments】: