How to decode binary audio data? (Answers)

【Question title】: How to decode binary audio data?
【Posted】: 2020-05-14 05:49:57
【Question description】:

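I'm still new to web development. I'm building a chatbot, and I want to run its responses through Google's Text-to-Speech first and then play the sound on the client. So the flow is: client sends a message to the server -> server creates a response -> server sends the message to Google -> gets the audio data back -> sends it to the client -> client plays it. I've gotten as far as the last step, but now I'm out of my depth.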

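I've done some googling, and there seems to be plenty of information about playing audio from binary data, audio contexts, and so on. I wrote a function, but it doesn't work. Here is what I have: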

export const SendMessage: Client.Common.Footer.API.SendMessage = async message => {
    const baseRoute = process.env.REACT_APP_BASE_ROUTE;
    const port = process.env.REACT_APP_SERVER_PORT;
    const audioContext = new AudioContext();
    let audio: any;
    const url = baseRoute + ":" + port + "/ChatBot";
    console.log("%c Sending post request...", "background: #1fa67f; color: white", url, JSON.stringify(message));
    let responseJson = await fetch(url, {
        method: "POST",
        mode: "cors",
        headers: {
            Accept: "application/json",
            "Content-Type": "application/json"
        },
        body: JSON.stringify(message)
    });
    let response = await responseJson.json();
    await audioContext.decodeAudioData(
        new ArrayBuffer(response.data.audio.data),
        buffer => {
            audio = buffer;
        },
        error => console.log("===ERROR===\n", error)
    );
    const source = audioContext.createBufferSource();
    source.buffer = audio;
    source.connect(audioContext.destination);
    source.start(0);
    console.log("%c Post response:", "background: #1fa67f; color: white", url, response);
};

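This function sends the message to the server and gets back the response message and the audio data. There is indeed some kind of binary data in response.data.audio.data, but I get an error saying the audio data could not be decoded (the error callback passed to decodeAudioData is firing). I know the data is valid, because on my server I use the following code to turn it into an mp3 file that plays fine: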

const writeFile = util.promisify(fs.writeFile);
await writeFile("output/TTS.mp3", response.audioContent, "binary");

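I have almost no idea how binary data is handled here or what could be going wrong. Do I need to specify more parameters to decode the binary data correctly? How would I know which ones? I'd like to understand what is actually happening here, not just copy and paste a solution.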

Edit:

So it seems the array buffer is not being created correctly. If I run this code:

    console.log(response);
    const audioBuffer = new ArrayBuffer(response.data.audio.data);
    console.log("===audioBuffer===", audioBuffer);
    audio = await audioContext.decodeAudioData(audioBuffer);

the response looks like this:

{message: "Message successfully sent.", status: 1, data: {…}}
    message: "Message successfully sent."
    status: 1
    data:
        message: "Sorry, I didn't understand your question, try rephrasing."
        audio:
            type: "Buffer"
            data: Array(14304)
                [0 … 9999]
                [10000 … 14303]
                length: 14304
            __proto__: Array(0)
        __proto__: Object
    __proto__: Object
__proto__: Object

but the buffer logs as:

===audioBuffer=== 
ArrayBuffer(0) {}
    [[Int8Array]]: Int8Array []
    [[Uint8Array]]: Uint8Array []
    [[Int16Array]]: Int16Array []
    [[Int32Array]]: Int32Array []
    byteLength: 0
__proto__: ArrayBuffer

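Apparently JS doesn't understand the format in my response object, but this is what I get from Google's Text-to-Speech API. Maybe I'm sending it incorrectly from my server? As I said before, on my server the following code turns that array into an mp3 file: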

    const writeFile = util.promisify(fs.writeFile);
    await writeFile("output/TTS.mp3", response.audioContent, "binary");
    return response.audioContent;

where response.audioContent is also sent to the client like this:


//in index.ts
...
const app = express();
app.use(bodyParser.json());
app.use(cors(corsOptions));

app.post("/TextToSpeech", TextToSpeechController);
...
//textToSpeech.ts
export const TextToSpeechController = async (req: Req<Server.API.TextToSpeech.RequestQuery>, res: Response) => {
    let response: Server.API.TextToSpeech.ResponseBody = {
        message: null,
        status: CONSTANTS.STATUS.ERROR,
        data: undefined
    };
    try {
        console.log("===req.body===", req.body);
        if (!req.body) throw new Error("No message received");
        const audio = await TextToSpeech({ message: req.body.message });
        response = {
            message: "Audio file successfully created!",
            status: CONSTANTS.STATUS.SUCCESS,
            data: audio
        };
        res.send(response);
    } catch (error) {
        response = {
            message: "Error converting text to speech: " + error.message,
            status: CONSTANTS.STATUS.ERROR,
            data: undefined
        };
        res.json(response);
    }
};
...

What strikes me as odd is that on my server, response.audioContent logs as:

===response.audioContent=== <Buffer ff f3 44 c4 00 00 00 03 48 01 40 00 00 f0 
a3 0f fc 1a 00 11 e1 48 7f e0 e0 87 fc b8 88 40 1c 7f e0 4c 03 c1 d9 ef ff ec 
3e 4c 02 c7 88 7f ff f9 ff ff ... >

whereas on the client it is:

audio:
            type: "Buffer"
            data: Array(14304)
                [0 … 9999]
                [10000 … 14303]
                length: 14304
            __proto__: Array(0)
        __proto__: Object

I tried passing response.data, response.data.audio, and response.data.audio.data to new ArrayBuffer(), but all of them produce the same empty buffer.

【Question discussion】:

Tags: javascript audio

【Solution 1】:

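There are a couple of problems in your code. You can't populate an ArrayBuffer through that constructor; it takes a byte length, not the data. Your call to decodeAudioData is also asynchronous, which will leave audio undefined when you use it. I recommend updating your call to decodeAudioData to the newer promise-based form.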
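To illustrate the first point, here is a minimal sketch, runnable in Node or a browser console, of why the constructor call yields an empty buffer. The `bytes` array is a stand-in for `response.data.audio.data`, using the first four bytes from the server log in the question; `wrong` and `right` are illustrative names only.

```javascript
// Stand-in for response.data.audio.data: the first bytes of the MP3
// shown in the server log (ff f3 44 c4).
const bytes = [0xff, 0xf3, 0x44, 0xc4];

// new ArrayBuffer(x) treats x as a byte length. An array with several
// elements coerces to NaN, which is treated as 0, so the buffer is empty.
const wrong = new ArrayBuffer(bytes);

// Uint8Array.from copies the values into a typed array; its .buffer
// property is the ArrayBuffer that holds those bytes.
const right = Uint8Array.from(bytes).buffer;

console.log(wrong.byteLength); // 0
console.log(right.byteLength); // 4
```

This matches the `ArrayBuffer(0) {}` output in the question, and it is the reason the client code in this answer goes through `Uint8Array.from` before decoding.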

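Edit: Something odd must be going on between your call to Google Text-to-Speech and what my previous example returned, because whether I use an mp3 file or the response from Google, both work once the correct reference to the buffer is passed.

The fact that you can get it working with an mp3 file but not with text-to-speech probably means you aren't referencing the correct property on the result returned by the Google API call. The response from the API call is an Array, so make sure you reference index 0 of the result array (see textToSpeech.js below).

The complete application is shown below.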

// textToSpeech.js
const textToSpeech = require('@google-cloud/text-to-speech');
const client = new textToSpeech.TextToSpeechClient();

module.exports = {
    say: async function(text) {
        const request = {
            input: { text },
            voice: { languageCode: 'en-US', ssmlGender: 'NEUTRAL' },
            audioConfig: { audioEncoding: 'MP3' },
          };
        const response = await client.synthesizeSpeech(request);
        return response[0].audioContent    
    }
}

// server.js
const express = require('express');
const path = require('path');
const app = express();
const textToSpeechService = require('./textToSpeech');

app.get('/', (req, res) => {
    res.sendFile(path.join(__dirname, 'index.html'));
});

app.get('/speech', async (req, res) => {
    const buffer = await textToSpeechService.say('hello world');
    res.json({
        status: `y'all good :)`,
        data: buffer
    })
});

app.listen(3000);

// index.html
<!DOCTYPE html>
<html>
    <script>
        async function play() {
            const audioContext = new AudioContext();
            const request = await fetch('/speech');
            const response = await request.json();
            const arr = Uint8Array.from(response.data.data)
            const audio = await audioContext.decodeAudioData(arr.buffer);
            const source = audioContext.createBufferSource();
            source.buffer = audio;
            source.connect(audioContext.destination);
            source.start(0);
        }
    </script>
    <body>
        <h1>Hello Audio</h1>
        <button onclick="play()">play</button>
    </body>
</html>
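
The client above reads `response.data.data` because `res.json` serializes a Node `Buffer` through its `toJSON` method, which produces an object with `type` and `data` fields. The following sketch, meant to be run in Node, shows that wire format. The four-byte `audioContent` is a stand-in for the real audio, and `toArrayBuffer` is a hypothetical helper, not part of any library.

```javascript
// Stand-in for response[0].audioContent; real audio is thousands of bytes.
const audioContent = Buffer.from([0xff, 0xf3, 0x44, 0xc4]);

// What a JSON response containing the Buffer looks like on the wire.
const wire = JSON.stringify({ data: audioContent });
console.log(wire); // {"data":{"type":"Buffer","data":[255,243,68,196]}}

// Hypothetical client-side helper: takes the parsed { type, data } object
// and returns an ArrayBuffer holding the same bytes.
function toArrayBuffer(serialized) {
    if (!serialized || serialized.type !== 'Buffer' || !Array.isArray(serialized.data)) {
        throw new TypeError('expected a JSON-serialized Node Buffer');
    }
    return Uint8Array.from(serialized.data).buffer;
}

const parsed = JSON.parse(wire);
const arrayBuffer = toArrayBuffer(parsed.data);
console.log(arrayBuffer.byteLength); // 4
```

In the browser, the result of such a helper is what would be passed to `audioContext.decodeAudioData`.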

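【Discussion】:

Thanks a lot, I'll try it as soon as I can.
Updated the answer, since converting to hex isn't needed.
I was converting the array buffer incorrectly, just as you said, but that isn't the problem. Whatever the issue is, it seems to be about the encoding type. Your example shows how to turn an mp3 file into a buffer and send it to the client, and I can get that working. I can even use the data from the Google API to create an mp3, then create a buffer from that mp3 file and send it to the client. That works, but it's a silly solution that burns memory and processing power creating pointless files. I should be able to send the data straight to the client and use it there, but I don't know how...
I've updated my answer with a working example that calls the Google Text-to-Speech API.
I finally got it working. Even your example didn't work for me, but it gave me enough to go on to find where I had made the mistake. It was a typo in my server code that caused the conversion to fail. Thanks so much for your help; without a counterexample to work from, I would never have found this.

【Solution 2】: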

const audioBuffer = new ArrayBuffer(response.data.audio.data);
console.log("===audioBuffer===", audioBuffer);

You could try this instead:

const audioBuffer = Buffer.from(response.data.audio);
console.log("===audioBuffer===", audioBuffer);
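
A note on this suggestion: `Buffer` is a Node.js API, so in a browser it is only available when the bundler supplies a polyfill. Where it is available, `Buffer.from` does accept the `{ type: 'Buffer', data: [...] }` object directly, as this sketch (run in Node, with a four-byte stand-in for the audio) suggests. Since `decodeAudioData` expects an `ArrayBuffer`, the sketch also slices one out of the restored `Buffer`.

```javascript
// Simulate the round trip: server Buffer -> JSON -> parsed object on the client.
const original = Buffer.from([0xff, 0xf3, 0x44, 0xc4]);
const audio = JSON.parse(JSON.stringify(original)); // { type: 'Buffer', data: [...] }

// Buffer.from rebuilds a Buffer from the serialized object.
const restored = Buffer.from(audio);
console.log(restored.equals(original)); // true

// A small Buffer may be a view into a larger shared ArrayBuffer, so copy
// out only the bytes that belong to it.
const arrayBuffer = restored.buffer.slice(
    restored.byteOffset,
    restored.byteOffset + restored.byteLength
);
console.log(arrayBuffer.byteLength); // 4
```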

【Discussion】: