How to send an audio file to Google Speech-to-Text from Firebase Storage? Answer

【Question title】: How to send an audio file to Google Speech-to-Text from Firebase Storage?
【Posted】: 2020-03-17 21:36:20
【Question description】:

I'm trying to use Firebase Cloud Functions to send a short audio file (a few seconds long) from Firebase Storage to Google Cloud Speech-to-Text. The documentation says to use this synchronous code for short audio files:

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const gcsUri = 'gs://my-bucket/audio.raw';
// const encoding = 'Encoding of the audio file, e.g. LINEAR16';
// const sampleRateHertz = 16000;
// const languageCode = 'BCP-47 language code, e.g. en-US';

const config = {
  encoding: encoding,
  sampleRateHertz: sampleRateHertz,
  languageCode: languageCode,
};
const audio = {
  uri: gcsUri,
};

const request = {
  config: config,
  audio: audio,
};

// Detects speech in the audio file
const [response] = await client.recognize(request);
const transcription = response.results
  .map(result => result.alternatives[0].transcript)
  .join('\n');
console.log(`Transcription: `, transcription);

This code doesn't run because it uses await outside an async function.
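A minimal sketch of the usual fix: put the sample inside an async function and handle errors with try/catch. To keep it runnable without credentials, the client is passed in as a parameter. In a real function it would be `new speech.SpeechClient()`; `transcribeGcs` and `fakeClient` are hypothetical names, and `fakeClient` only mimics the `[response]` shape that `recognize()` resolves to.

```javascript
// Transcribe a short audio file stored in Cloud Storage.
// `client` must expose recognize(request) -> Promise<[response, ...]>,
// the shape used by @google-cloud/speech's SpeechClient.
async function transcribeGcs(client, gcsUri, config) {
  const request = { config, audio: { uri: gcsUri } };
  try {
    // `await` is legal here because we are inside an async function.
    const [response] = await client.recognize(request);
    return response.results
      .map(result => result.alternatives[0].transcript)
      .join('\n');
  } catch (err) {
    // Log, then rethrow so the caller (e.g. a Cloud Function) sees the failure.
    console.error('recognize() failed:', err.message);
    throw err;
  }
}

// Hypothetical stand-in for SpeechClient, used only to run the sketch offline.
const fakeClient = {
  async recognize(_request) {
    return [{ results: [{ alternatives: [{ transcript: 'hello world' }] }] }];
  },
};

transcribeGcs(fakeClient, 'gs://my-bucket/audio.raw', {
  encoding: 'LINEAR16',
  sampleRateHertz: 16000,
  languageCode: 'en-US',
}).then(text => console.log(`Transcription: ${text}`));
```

Rethrowing (rather than only logging) means a failed recognition shows up as a failed invocation instead of a silent success.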

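Another problem with this code is that it doesn't catch errors. After fixing these problems and putting the code inside a Firebase Cloud Functions trigger, I have the following: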

exports.Google_Speech_to_Text = functions.firestore.document('Users/{userID}/Pronunciation_Test/downloadURL').onUpdate((change, context) => {
    return async function syncRecognizeGCS() {
      // [START speech_transcribe_sync_gcs]
      // Imports the Google Cloud client library
      const speech = require('@google-cloud/speech');

      // Creates a client
      const client = new speech.SpeechClient();

      const gcsUri = 'gs://my-app.appspot.com/my-file';
      const encoding = 'Opus';
      const sampleRateHertz = 48000;
      const languageCode = 'en-US';

      const config = {
        encoding: encoding,
        sampleRateHertz: sampleRateHertz,
        languageCode: languageCode,
      };
      const audio = {
        uri: gcsUri,
      };

      const request = {
        config: config,
        audio: audio,
      };

      // Detects speech in the audio file
      const [response] = await client.recognize(request)
      .catch((err) => { console.error(err); });

      const transcription = response.results
      .map(result => result.alternatives[0].transcript)
      .join('\n');
      console.log(`Transcription: `, transcription);
      // [END speech_transcribe_sync_gcs]
    }

  }); // close Google_Speech_to_Text
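One thing to watch in the code above: `await client.recognize(request).catch(...)` does not stop execution on error. The `.catch` handler returns `undefined`, so the awaited value is `undefined`, and destructuring it with `const [response] = ...` throws a TypeError that hides the original error. The sketch below reproduces this in plain Node.js, with a hypothetical rejecting function `failingRecognize` in place of the real call.

```javascript
// Hypothetical stand-in for a recognize() call that fails.
function failingRecognize() {
  return Promise.reject(new Error('INVALID_ARGUMENT: bad encoding'));
}

// Pattern from the question: .catch() swallows the error and yields undefined,
// so the destructuring on the left-hand side throws a TypeError.
async function withCatchChain() {
  const [response] = await failingRecognize().catch(err => {
    console.error('logged:', err.message);
  });
  return response;
}

// Alternative: try/catch keeps the original error visible to the caller.
async function withTryCatch() {
  try {
    const [response] = await failingRecognize();
    return response;
  } catch (err) {
    console.error('logged:', err.message);
    throw err;
  }
}

withCatchChain().catch(err => console.log(err instanceof TypeError)); // prints true
withTryCatch().catch(err => console.log(err.message)); // prints INVALID_ARGUMENT: bad encoding
```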

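The function executes and returns ok, and nothing else.

There's no error message. I don't see anything wrong with the file in Storage.

I tried a different file, this time an mp3. Same result, except that the function took 17 ms to execute because the file was smaller.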

I can't work out which audio encoding and sample rate (sampleRateHertz) Chrome uses for audio captured with mediaDevices.getUserMedia(). This blog post says the encoding is Opus and the sample rate is 48000. Sometimes I get the error INVALID_ARGUMENT: Invalid recognition 'config': bad encoding.. The documentation says "Your audio data might not be encoded correctly or is encoded with a codec different than what you've declared in the RecognitionConfig." Can I leave encoding and sampleRateHertz blank and let Google Speech-to-Text figure them out?
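On leaving the fields blank: as far as I know, the RecognitionConfig reference says `encoding` and `sampleRateHertz` are optional only for FLAC and WAV files (the values are read from the file header) and required for other formats; verify this against the API version you call. "Blank" then has to mean leaving the keys out of the config object, not setting them to an empty string. A minimal sketch, where `buildRecognitionConfig` and its `format` parameter are hypothetical helpers for illustration:

```javascript
// Formats whose headers carry encoding and sample rate, so the two fields
// may be omitted (assumption based on the RecognitionConfig reference).
const HEADER_FORMATS = new Set(['flac', 'wav']);

// Hypothetical helper: build a RecognitionConfig-shaped object and refuse to
// build one that is missing fields the API would require.
function buildRecognitionConfig({ format, encoding, sampleRateHertz, languageCode }) {
  const needsExplicit = !HEADER_FORMATS.has(format);
  if (needsExplicit && (!encoding || !sampleRateHertz)) {
    throw new Error(`encoding and sampleRateHertz are required for ${format}`);
  }
  const config = { languageCode };
  // Only add the keys when there is a value: omitting a key is not the same
  // as sending encoding: '' (an empty string is not a valid enum name).
  if (encoding) config.encoding = encoding;
  if (sampleRateHertz) config.sampleRateHertz = sampleRateHertz;
  return config;
}

console.log(buildRecognitionConfig({ format: 'flac', languageCode: 'en-US' }));
// { languageCode: 'en-US' }
console.log(buildRecognitionConfig({
  format: 'ogg', encoding: 'OGG_OPUS', sampleRateHertz: 48000, languageCode: 'en-US',
}));
```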

Any suggestions?


Tags: google-cloud-functions firebase-storage google-cloud-speech

【Solution 1】:

The problem was that the code Google provides doesn't catch errors. When I refactored the code to use a promise chain instead of await, I got an error message.

const functions = require('firebase-functions');

exports.Google_Speech_to_Text = functions.firestore.document('Users/{userID}/Pronunciation_Test/downloadURL').onUpdate((change, context) => {
        // Imports the Google Cloud client library
        const speech = require('@google-cloud/speech');

        // Creates a client
        const client = new speech.SpeechClient();

        const gcsUri = 'gs://my-app.appspot.com/my-file';
        const encoding = 'Opus';
        const sampleRateHertz = 48000;
        const languageCode = 'en-US';

        const config = {
          encoding: encoding,
          sampleRateHertz: sampleRateHertz,
          languageCode: languageCode,
        };
        const audio = {
          uri: gcsUri,
        };

        const request = {
          config: config,
          audio: audio,
        };

        // Detects speech in the audio file. Returning the promise makes the
        // Cloud Function wait until recognize() succeeds or fails.
        // recognize() resolves to an array whose first element is the response.
        return client.recognize(request)
        .then(([response]) => {
          console.log(response);
        })
        .catch((err) => { console.error(err); });
    });
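One likely reason the promise version produced an error while the question's version silently finished with ok: the original trigger does `return async function syncRecognizeGCS() {...}`, which returns the function object itself without ever calling it. A function object is not a promise, and a background Cloud Function treats a non-promise return value as "done", so `recognize()` never runs. The sketch below shows the difference in plain Node.js; `handlerAsWritten`, `handlerFixed` and the `ran` flag are hypothetical, and `ran = true` stands in for the recognize call.

```javascript
let ran = false;

// As written in the question: the handler returns a function expression
// and nothing ever calls it, so its body never executes.
function handlerAsWritten() {
  return async function syncRecognizeGCS() {
    ran = true; // stands in for the client.recognize() work
  };
}

// Fixed: invoke the async function and return the promise it produces,
// so the caller can wait for the work to finish.
function handlerFixed() {
  return (async function syncRecognizeGCS() {
    ran = true;
  })();
}

const a = handlerAsWritten();
console.log(typeof a, a instanceof Promise, ran); // function false false

const b = handlerFixed();
console.log(b instanceof Promise, ran); // true true
```

Writing the trigger itself as `onUpdate(async (change, context) => { ... })` achieves the same thing, since an async handler always returns a promise.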

The error is INVALID_ARGUMENT: Invalid recognition 'config': bad encoding.. In other words, the API did not accept 'Opus' as the audio encoding.

Removing the line const encoding = 'Opus'; causes the error encoding is not defined, because config still refers to the encoding variable.

Using const encoding = ''; results in the same error, INVALID_ARGUMENT: Invalid recognition 'config': bad encoding..

I need to figure out which audio encoding Chrome currently uses. It's a pity Google Speech can't work that out by itself.
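On which encoding Chrome uses: `getUserMedia()` only returns a `MediaStream`; the encoding is chosen by whatever records that stream, typically `MediaRecorder`, whose `mimeType` property reports the container and codec (in Chrome this is usually `audio/webm;codecs=opus`). Also note that the Node.js client expects an AudioEncoding enum name such as `OGG_OPUS`, not a free-form label like `'Opus'`. A hedged sketch that maps such a MIME type to an enum name; the mapping table and `encodingForMimeType` are my own assumptions, and `WEBM_OPUS` in particular may only be accepted by newer API versions and client releases, so check the AudioEncoding reference for the version you call.

```javascript
// Hypothetical mapping from "container|codec" to a Speech-to-Text
// AudioEncoding enum name. Unknown combinations map to null.
const ENCODING_BY_TYPE = new Map([
  ['audio/webm|opus', 'WEBM_OPUS'],
  ['audio/ogg|opus', 'OGG_OPUS'],
  ['audio/flac|', 'FLAC'],
]);

function encodingForMimeType(mimeType) {
  // Split e.g. "audio/webm;codecs=opus" into the container and its parameters.
  const [container, ...params] = mimeType.toLowerCase().split(';').map(s => s.trim());
  const codecsParam = params.find(p => p.startsWith('codecs='));
  // Strip the "codecs=" prefix and any surrounding quotes.
  const codec = codecsParam ? codecsParam.slice('codecs='.length).replace(/"/g, '') : '';
  return ENCODING_BY_TYPE.get(`${container}|${codec}`) || null;
}

console.log(encodingForMimeType('audio/webm;codecs=opus')); // WEBM_OPUS
console.log(encodingForMimeType('audio/ogg; codecs="opus"')); // OGG_OPUS
console.log(encodingForMimeType('audio/mp4')); // null
```

In the browser, the value to pass in would come from `recorder.mimeType` after creating the `MediaRecorder`, sent along with the upload so the Cloud Function can pick the matching encoding.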
