Google Cloud Speech to Text Audio Timeout Error when used with Twilio "Stream" verb and Websocket

【Question title】: Google Cloud Speech to Text Audio Timeout Error when used with Twilio "Stream" verb and Websocket
【Posted】: 2020-06-09 22:36:31
【Question】:

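I am currently trying to build a system that transcribes a phone call in real time and displays the conversation in my command line. To do this, I use a Twilio phone number that sends an HTTP request when it is called. My server uses Flask, Ngrok and Websockets to serve the app, expose my local port and transfer the data, and uses the TwiML verb "Stream" to stream the audio data to the Google Cloud Speech-Text API. So far I have been using Twilio's Python demo on GitHub (https://github.com/twilio/media-streams/tree/master/python/realtime-transcriptions).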

My server code:

from flask import Flask, render_template
from flask_sockets import Sockets

from SpeechClientBridge import SpeechClientBridge
from google.cloud.speech_v1 import enums
from google.cloud.speech_v1 import types

import json
import base64
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "./<KEY>.json"
HTTP_SERVER_PORT = 8080

config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.MULAW,
    sample_rate_hertz=8000,
    language_code='en-US')
streaming_config = types.StreamingRecognitionConfig(
    config=config,
    interim_results=True)

app = Flask(__name__)
sockets = Sockets(app)

@app.route('/home')
def home():
    return render_template("index.html")

@app.route('/twiml', methods=['POST'])
def return_twiml():
    print("POST TwiML")
    return render_template('streams.xml')

def on_transcription_response(response):
    if not response.results:
        return

    result = response.results[0]
    if not result.alternatives:
        return

    transcription = result.alternatives[0].transcript
    print("Transcription: " + transcription)

@sockets.route('/')
def transcript(ws):
    print("WS connection opened")
    bridge = SpeechClientBridge(
        streaming_config, 
        on_transcription_response
    )
    while not ws.closed:
        message = ws.receive()
        if message is None:
            bridge.terminate()
            break

        data = json.loads(message)
        if data["event"] in ("connected", "start"):
            print(f"Media WS: Received event '{data['event']}': {message}")
            continue
        if data["event"] == "media":
            media = data["media"]
            chunk = base64.b64decode(media["payload"])
            bridge.add_request(chunk)
        if data["event"] == "stop":
            print(f"Media WS: Received event 'stop': {message}")
            print("Stopping...")
            break

    bridge.terminate()
    print("WS connection closed")

if __name__ == '__main__':
    from gevent import pywsgi
    from geventwebsocket.handler import WebSocketHandler

    server = pywsgi.WSGIServer(('', HTTP_SERVER_PORT), app, handler_class=WebSocketHandler)
    print("Server listening on: http://localhost:" + str(HTTP_SERVER_PORT))
    server.serve_forever()

streams.xml:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
     <Say> Thanks for calling!</Say>
     <Start>
        <Stream url="wss://<ngrok-URL>.ngrok.io/"/>
     </Start>
     <Pause length="40"/>
</Response>

Twilio WebHook:

http://<ngrok-URL>.ngrok.io/twiml

I get the following error when I run the server code and then call the Twilio number:

C:\Users\Max\Python\Twilio>python server.py
Server listening on: http://localhost:8080
POST TwiML
WS connection opened
Media WS: Received event 'connected': {"event":"connected","protocol":"Call","version":"0.2.0"}
Media WS: Received event 'start': {"event":"start","sequenceNumber":"1","start":{"accountSid":"AC8abc5aa74496a227d3eb489","streamSid":"MZe6245f23e2385aa2ea7b397","callSid":"CA5864313b4992607d3fe46","tracks":["inbound"],"mediaFormat":{"encoding":"audio/x-mulaw","sampleRate":8000,"channels":1}},"streamSid":"MZe6245f2397c1285aa2ea7b397"}
Exception in thread Thread-4:
Traceback (most recent call last):
  File "C:\Users\Max\AppData\Local\Programs\Python\Python37\lib\site-packages\google\api_core\grpc_helpers.py", line 96, in next
    return six.next(self._wrapped)
  File "C:\Users\Max\AppData\Local\Programs\Python\Python37\lib\site-packages\grpc\_channel.py", line 416, in __next__
    return self._next()
  File "C:\Users\Max\AppData\Local\Programs\Python\Python37\lib\site-packages\grpc\_channel.py", line 689, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.OUT_OF_RANGE
        details = "Audio Timeout Error: Long duration elapsed without audio. Audio should be sent close to real time."
        debug_error_string = "{"created":"@1591738676.565000000","description":"Error received from peer ipv6:[2a00:1450:4009:807::200a]:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Audio Timeout Error: Long duration elapsed without audio. Audio should be sent close to real time.","grpc_status":11}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Max\AppData\Local\Programs\Python\Python37\lib\threading.py", line 917, in _bootstrap_inner
    self.run()
  File "C:\Users\Max\AppData\Local\Programs\Python\Python37\lib\threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Max\Python\Twilio\SpeechClientBridge.py", line 37, in process_responses_loop
    for response in responses:
  File "C:\Users\Max\AppData\Local\Programs\Python\Python37\lib\site-packages\google\api_core\grpc_helpers.py", line 99, in next
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.OutOfRange: 400 Audio Timeout Error: Long duration elapsed without audio. Audio should be sent close to real time.

Media WS: Received event 'stop': {"event":"stop","sequenceNumber":"752","streamSid":"MZe6245f2397c125aa2ea7b397","stop":{"accountSid":"AC8abc5aa74496a60227d3eb489","callSid":"CA5842bc6431314d502607d3fe46"}}
Stopping...
WS connection closed

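I can't figure out why I am getting the Audio Timeout Error. Is it a firewall issue between Twilio and Google? An encoding issue?

Any help would be greatly appreciated.

System: Windows 10, Python 3.7.1, ngrok 2.3.35, Flask 1.1.2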

【Question discussion】:

Tags: python flask websocket twilio google-speech-to-text-api

【Solution 1】:

Since your streams.xml returns the socket URL "wss://

If your socket route starts with '/', then you should rewrite streams.xml; see the example below.

<?xml version="1.0" encoding="UTF-8"?>
<Response>
     <Say> Thanks for calling!</Say>
     <Start>
        <Stream url="wss://YOUR_NGROK_ID.ngrok.io/"/>
     </Start>
     <Pause length="40"/>
</Response>

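【Discussion】:

Thanks for your comment, Ryan. The URL works fine, since I get a good connection, as you can see in the first few lines of the command-line output. Sorry if I didn't make the URL routing very clear. The problem is that the Google Cloud Speech-Text API times out after the connection has been established.

【Solution 2】: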
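One way to check the "Audio should be sent close to real time" condition from the error message is to compare how much audio has been forwarded with how much wall-clock time has passed. The sketch below assumes the format reported in the 'start' event (8-bit mu-law, 8000 Hz, mono, so one byte per sample). The `AudioPacer` class and `audio_seconds` function are hypothetical helpers, not part of the Twilio demo or the Google client.

```python
import time

SAMPLE_RATE_HZ = 8000   # from the "start" event: "sampleRate": 8000
BYTES_PER_SAMPLE = 1    # assumption: 8-bit mu-law, single channel


def audio_seconds(num_bytes):
    """Return the duration, in seconds, of a raw mu-law chunk."""
    return num_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


class AudioPacer:
    """Hypothetical helper: tracks how far forwarded audio lags real time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started = None      # time the first chunk was seen
        self._audio_sent = 0.0    # seconds of audio forwarded so far

    def add_chunk(self, chunk):
        if self._started is None:
            self._started = self._clock()
        self._audio_sent += audio_seconds(len(chunk))

    def lag(self):
        """Wall-clock seconds since the first chunk minus audio seconds sent."""
        if self._started is None:
            return 0.0
        return (self._clock() - self._started) - self._audio_sent
```

In the server loop, `pacer.add_chunk(chunk)` could sit next to `bridge.add_request(chunk)`, with `pacer.lag()` printed periodically. A lag that keeps growing would suggest the audio is not reaching the API at real-time speed.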

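I did some testing on this to try to determine what was happening. I put a timer around the

bridge = SpeechClientBridge(streaming_config, on_transcription_response)

part of the code and found that it takes about 10.9 seconds to initialize. I believe the Google API has a timeout of 10 seconds. I tried running it on my Google Cloud instance, which is more powerful than my laptop, and it worked fine. It is either that, or the GCP instance has different versions of the libraries/code installed; I need to check.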
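The timing measurement described above can be sketched as follows. `timed_call` is a hypothetical helper, and `slow_init` merely stands in for the `SpeechClientBridge(...)` constructor so that the snippet runs on its own. The 10-second figure is the answerer's belief, not a limit confirmed here.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def slow_init(delay):
    # Stand-in for an expensive constructor such as SpeechClientBridge(...)
    time.sleep(delay)
    return "ready"


if __name__ == "__main__":
    result, elapsed = timed_call(slow_init, 0.05)
    print(f"init returned {result!r} after {elapsed:.3f}s")
```

In the server this would become something like `bridge, elapsed = timed_call(SpeechClientBridge, streaming_config, on_transcription_response)`, followed by a print of `elapsed`.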

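【Discussion】:

【Solution 3】:

This is related to a conflict between gevent (used by flask_sockets) and grpc (used by Google Cloud Speech), described in this issue: https://github.com/grpc/grpc/issues/4629. The workaround is to add the following code: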

import grpc.experimental.gevent as grpc_gevent
grpc_gevent.init_gevent()
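As a sketch of where this might go: the call is normally made once at startup, before any gRPC client or channel is created, so near the top of server.py would be a natural place. The wrapper below is hypothetical (the function name is not from the answer). It falls back to a no-op when grpc or gevent is not installed, an assumption made only so the snippet runs anywhere.

```python
def enable_grpc_gevent():
    """Make gRPC cooperate with gevent; return True if the patch was applied."""
    try:
        import grpc.experimental.gevent as grpc_gevent
        grpc_gevent.init_gevent()
    except ImportError:
        # grpc or gevent is not installed, so there is nothing to patch.
        return False
    return True


if __name__ == "__main__":
    print("gRPC gevent compatibility enabled:", enable_grpc_gevent())
```

In the question's server, `enable_grpc_gevent()` would be called before the first `SpeechClientBridge` is constructed.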

【Discussion】: