【Title】: How to receive SIP audio and send a wav stream to the Google Speech Recognition API in Node?
【Posted】: 2019-11-09 02:24:49
【Question】:

So far I have been trying sipster, but it has some daunting limitations (e.g. lack of configurability). Any ideas how to do this? Perhaps with an Asterisk wrapper for Node such as asterisk-manager?

In a bit more detail, the basic idea is to:

  • run a virtual SIP client that can receive SIP connections
  • convert the audio from that connection to regular wav format
  • stream the wav audio to the Google Speech API
  • have other ways to act on the SIP stream from Node, e.g. playing sounds
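
For the "convert to regular wav" step above, a minimal sketch in Node: raw 16-bit signed-linear PCM, which is what Asterisk typically hands out, only needs a 44-byte RIFF header prepended to become a wav file. The function name `pcmToWav` and the 8000 Hz default are illustrative assumptions, not from any library:

```javascript
// Wrap raw 16-bit little-endian PCM samples in a minimal RIFF/WAVE header.
// A sketch, not a full WAV library; defaults match a G.711-derived 8 kHz
// mono stream as produced by Asterisk's slin format.
function pcmToWav(pcm, sampleRate = 8000, channels = 1, bitsPerSample = 16) {
  const byteRate = sampleRate * channels * bitsPerSample / 8;
  const blockAlign = channels * bitsPerSample / 8;
  const header = Buffer.alloc(44);
  header.write('RIFF', 0);
  header.writeUInt32LE(36 + pcm.length, 4); // RIFF chunk size
  header.write('WAVE', 8);
  header.write('fmt ', 12);
  header.writeUInt32LE(16, 16);             // fmt sub-chunk size
  header.writeUInt16LE(1, 20);              // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36);
  header.writeUInt32LE(pcm.length, 40);     // data sub-chunk size
  return Buffer.concat([header, pcm]);
}
```

The resulting buffer can be written to disk or streamed directly; the Speech API also accepts headerless LINEAR16, so this step is only needed when a tool in the chain insists on a wav container.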

【Comments】:

  • sipster is configurable: you can pass pjsua2 configuration options to init(). Those options can be found in the pjsua2 documentation; they are not listed in the sipster docs because there are many of them and that would duplicate the documentation.
  • Assuming your "wav stream" means "streaming/continuous" in the Google docs, you need to go the gRPC / proto-buffers route on the Google side. You should look at your API for access to the bytes of the audio buffer... assuming their encoding fmt && bit rate are compatible with the Speech API input, you just ArrayCopy.myAudioBytes() && write to the goog.api.channel you opened for speech...

Tags: node.js audio speech-recognition asterisk sip


【Solution 1】:

This post is quite old now, and it looks like things have improved a lot on Google's side, both in the speech processor itself, which keeps getting more accurate, and on the Node.js side, where the client interfacing with the Google Cloud Speech API is updated regularly.

Following @arheops' suggestion, you may want to look at Asterisk's EAGI together with Node.js in order to get audio samples transcribed by Google.

The following EAGI bash script may help in that regard (a detailed explanation is available here):

#!/bin/bash

# Read all variables sent by Asterisk and store them; this script doesn't use them
declare -a array
while read -e ARG && [ "$ARG" ] ; do
        array=($(echo $ARG | sed -e 's/://'))
        export ${array[0]}=${array[1]}
done

# First argument is language
case "$1" in
"fr-FR" | "en-GB" | "es-ES" | "it-IT" )
  LANG=$1
  ;;
*)
  LANG=en-US
  ;;
esac

NODECMD=$(which node)

# Second argument is a timeout, in seconds: the duration to wait for voice input from the caller.
DURATION=$2
SAMPLE_RATE=8000
SAMPLE_SIZE_BYTES=2
let "SAMPLE_SIZE_BITS = SAMPLE_SIZE_BYTES * 8"

# EAGI_AUDIO_FORMAT is an asterisk variable that specifies the sample rate and
# sample size (usually 16 bits per sample) of the caller's voice stream.
# Depending on the codec used here, you can get sample rate values ranging from
# 8000Hz (e.g. G.711 uLaw) to 48000Hz (e.g. opus).
echo "GET VARIABLE EAGI_AUDIO_FORMAT"
read line
EAGI_AUDIO_FORMAT=$(echo $line | sed -r 's/.*\((.*)\).*/\1/')

# DURATION seconds of audio input amount to ( SAMPLE_RATE * SAMPLE_SIZE_BYTES ) * DURATION bytes
# - SAMPLE_RATE is set as per EAGI_AUDIO_FORMAT
# - SAMPLE_SIZE_BYTES is set to 2 (16 bits per sample)
#
# We don't do much here to adapt to the sample rate; this code could be improved
case "${EAGI_AUDIO_FORMAT}" in
"slin48")
  SAMPLE_RATE=48000
  ;;
*)
  SAMPLE_RATE=8000
  ;;
esac

# Temporary file to store raw audio samples
AUDIO_FILE=/tmp/audio-${SAMPLE_SIZE_BITS}_bits-${SAMPLE_RATE}_hz-${DURATION}_sec.raw

# We use `dd` here to copy the raw audio samples we're getting from file
# descriptor 3 (this is the Enhanced version in EAGI) to the temporary file.
# The number of blocks to copy is a function of the DURATION to record audio and
# the sample rate. SAMPLE_SIZE_BYTES cannot be changed as it is assumed that each
# sample is 16 bits in size.
let "COUNT = SAMPLE_RATE * SAMPLE_SIZE_BYTES * DURATION"
# By default, dd stores blocks of 512 bytes
let "BLOCKS = COUNT / 512"
echo "exec noop \"Number of bytes to store : ${COUNT}\""
read line

echo "exec noop \"Number of dd blocks to store : ${BLOCKS}\""
read line

echo "exec playback \"beep\""
read line

dd if=/dev/fd/3 count=${BLOCKS} of=${AUDIO_FILE}
echo "exec noop \"File saved !\""
read line

echo "exec noop \"AUDIO_FILE : ${AUDIO_FILE}\""
read line
echo "exec noop \"SAMPLE_RATE : ${SAMPLE_RATE}\""
read line
echo "exec noop \"LANG : ${LANG}\""
read line

# Submit audio to Google Cloud Speech API and get the result
export GOOGLE_APPLICATION_CREDENTIALS=/usr/local/node_programs/service_account_file.json
RES=$(${NODECMD} /usr/local/node_programs/nodejs-speech/samples/recognize.js sync ${AUDIO_FILE} -e LINEAR16 -r ${SAMPLE_RATE} -l ${LANG})

# clean up result returned from recognize.js :
# - remove new lines
# - remove 'Transcription :' header
RES=$(echo $RES | tr -d '\n' | sed -e 's/Transcription: \(.*$\)/\1/')

# Set GOOGLE_TRANSCRIPTION_RESULT variable, remove temporary file
# and continue dialplan execution
echo "set variable GOOGLE_TRANSCRIPTION_RESULT \"${RES}\""
read line

/bin/rm -f ${AUDIO_FILE}

exit 0
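
The variable-reading loop at the top of the script can also be done on the Node side, which is useful if you want to replace the bash wrapper entirely. A sketch of parsing the (E)AGI environment block that Asterisk writes to stdin; `parseAgiEnv` is an illustrative helper name, not part of any AGI library:

```javascript
// Asterisk starts an (E)AGI session by writing "agi_name: value" lines to
// stdin, terminated by a blank line. This parses that header block into an
// object, mirroring the "read all variables" loop of the bash script.
function parseAgiEnv(text) {
  const env = {};
  for (const line of text.split('\n')) {
    if (line.trim() === '') break;   // blank line ends the AGI header block
    const i = line.indexOf(':');
    if (i === -1) continue;          // skip malformed lines
    env[line.slice(0, i)] = line.slice(i + 1).trim();
  }
  return env;
}
```

In a real EAGI program you would buffer process.stdin until the blank line, call this once, and then exchange AGI commands and responses just as the bash script does with echo/read.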

Hope this helps!

【Discussion】:

【Solution 2】:

The simplest way: use the Asterisk EAGI interface and read the sound from the stdin/stream into Google.

However, the Google speech recognition API is not very stable at the moment. Some days it just stops working, then starts working again the next day.

【Discussion】:

  • I tried that, but it didn't work. Could you share some sample code?