Audio encoding conversion problems with PCM 32-bit to PCM 16-bit: answers

【Question title】: Audio encoding conversion problems with PCM 32-bit to PCM 16-bit
【Posted】: 2016-06-08 04:25:52
【Question】:

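I am using C# in a Universal Windows App to write a client for the Watson Speech-to-text service. For now, instead of calling the Watson service, I write the audio to a file and then open it in Audacity to confirm that the format is right, because the Watson service was not returning correct responses to me. The following explains why.

For some reason, even though I create 16-bit PCM encoding properties, when I read the buffer I can only read the data as 32-bit PCM. That works well, but if I read it as 16-bit PCM the audio plays in slow motion and all the speech is basically corrupted.

I don't really know what exactly needs to be done to convert from 32-bit to 16-bit, but here is what I have in my C# application: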

//Creating PCM Encoding properties
var pcmEncoding = AudioEncodingProperties.CreatePcm(16000, 1, 16);
var result = await AudioGraph.CreateAsync(
    new AudioGraphSettings(AudioRenderCategory.Speech)
    {
        DesiredRenderDeviceAudioProcessing = AudioProcessing.Raw,
        AudioRenderCategory = AudioRenderCategory.Speech,
        EncodingProperties = pcmEncoding
    }
);
graph = result.Graph;

//Initialize microphone
var microphone = await DeviceInformation.CreateFromIdAsync(MediaDevice.GetDefaultAudioCaptureId(AudioDeviceRole.Default));
var micInputResult = await graph.CreateDeviceInputNodeAsync(MediaCategory.Speech, pcmEncoding, microphone);

//Create frame output node
frameOutputNode = graph.CreateFrameOutputNode(pcmEncoding);

//Callback function to fire when buffer is filled with data
graph.QuantumProcessed += (s, a) => ProcessFrameOutput(frameOutputNode.GetFrame());
frameOutputNode.Start();

//Make the microphone write into the frame node
micInputResult.DeviceInputNode.AddOutgoingConnection(frameOutputNode);
micInputResult.DeviceInputNode.Start();

graph.Start();

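The initialization steps are complete at this stage. Now, reading from the buffer and writing to the file only works when I use 32-bit PCM encoding with the following function (the commented-out part is the 16-bit PCM code that produces slow-motion speech output):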

private void ProcessFrameOutput(AudioFrame frame)
{
    //Making a copy of the audio frame buffer
    var audioBuffer = frame.LockBuffer(AudioBufferAccessMode.Read);
    var buffer = Windows.Storage.Streams.Buffer.CreateCopyFromMemoryBuffer(audioBuffer);
    buffer.Length = audioBuffer.Length;

    using (var dataReader = DataReader.FromBuffer(buffer))
    {
        dataReader.ByteOrder = ByteOrder.LittleEndian;

        byte[] byteData = new byte[buffer.Length];
        int pos = 0;

        while (dataReader.UnconsumedBufferLength > 0)
        {
            /*Reading Float -> Int 32*/
            /*With this code I can import raw wav file into the Audacity
              using Signed 32-bit PCM Encoding, and it is working well*/
            var singleTmp = dataReader.ReadSingle();
            var int32Tmp = (Int32)(singleTmp * Int32.MaxValue);
            byte[] chunkBytes = BitConverter.GetBytes(int32Tmp);
            byteData[pos++] = chunkBytes[0];
            byteData[pos++] = chunkBytes[1];
            byteData[pos++] = chunkBytes[2];
            byteData[pos++] = chunkBytes[3];

            /*Reading Float -> Int 16 (Slow Motion)*/
            /*With this code I can import raw wav file into the Audacity
              using Signed 16-bit PCM Encoding, but when I play it, it's in
              a slow motion*/
            //var singleTmp = dataReader.ReadSingle();
            //var int16Tmp = (Int16)(singleTmp * Int16.MaxValue);
            //byte[] chunkBytes = BitConverter.GetBytes(int16Tmp);
            //byteData[pos++] = chunkBytes[0];
            //byteData[pos++] = chunkBytes[1];
        }

        WriteBytesToFile(byteData);
    }
}

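Can anyone think of a reason why this happens? Is it because Int32 PCM is larger in size, so that when I use Int16 the data gets stretched and the sound becomes longer? Or am I not sampling it properly?

Note: I tried reading bytes directly from the buffer and using them as raw data, but the result is not PCM-encoded that way. Reading Int16/Int32 directly from the buffer does not work either. In the example above I am only using the frame output node. If I create a file output node that writes the raw file automatically, it works really well as 16-bit PCM, so something in my callback function is wrong and causes the slow motion.

Thanks

【Question comments】:

In the future, your problem will be easier to solve if you provide a sample of the broken raw data.

【Tags】: c# speech-recognition universal

【Solution 1】: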

//Creating PCM Encoding properties
var pcmEncoding = AudioEncodingProperties.CreatePcm(16000, 1, 16);
var result = await AudioGraph.CreateAsync(
    new AudioGraphSettings(AudioRenderCategory.Speech)
    {
        DesiredRenderDeviceAudioProcessing = AudioProcessing.Raw,
        AudioRenderCategory = AudioRenderCategory.Speech,
        EncodingProperties = pcmEncoding
    }
);
graph = result.Graph;

pcmEncoding does not make much sense here, since AudioGraph only supports float encoding.

        byte[] byteData = new byte[buffer.Length];

It should be buffer.Length / 2, since you are converting float data with 4 bytes per sample into int16 data with 2 bytes per sample.
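As a rough illustration of that size arithmetic, here is a minimal sketch. It is not part of the original answer: the helper name `Int16ByteCount` and the 1920-byte example are assumptions made purely for illustration.

```csharp
using System;

static class BufferSizing
{
    // Hypothetical helper: given the byte length of a buffer holding
    // 32-bit float samples, return the number of bytes needed to store
    // the same samples as 16-bit integers.
    public static int Int16ByteCount(int floatByteLength)
    {
        int sampleCount = floatByteLength / sizeof(float); // 4 bytes per float sample
        return sampleCount * sizeof(short);                // 2 bytes per int16 sample
    }
}
```

For example, `BufferSizing.Int16ByteCount(1920)` yields 960. If the array is instead allocated with the full float byte length, only the first half gets filled and the rest stays zero. Assuming `WriteBytesToFile` writes the whole array, every chunk would then be followed by an equal length of silence, which would be consistent with playback that lasts twice as long and sounds broken up.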

        /*Reading Float -> Int 16 (Slow Motion)*/
        /*With this code I can import raw wav file into the Audacity
          using Signed 16-bit PCM Encoding, but when I play it, it's in
          a slow motion*/
        var singleTmp = dataReader.ReadSingle();
        var int16Tmp = (Int16)(singleTmp * Int16.MaxValue);
        byte[] chunkBytes = BitConverter.GetBytes(int16Tmp);
        byteData[pos++] = chunkBytes[0];
        byteData[pos++] = chunkBytes[1];

This is the correct code and it should work. Your "slow motion" is most likely caused by the buffer size that you previously set incorrectly.
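Putting the two points together, a self-contained version of the conversion might look like the sketch below. It is an illustration under assumptions rather than the asker's actual code: the class and method names are hypothetical, the samples are taken as a plain `float[]` instead of being read through `DataReader`, and the clamping step is an addition that the original code does not have.

```csharp
using System;

static class PcmConverter
{
    // Hypothetical helper: converts float samples (nominally in [-1, 1])
    // to little-endian signed 16-bit PCM bytes.
    public static byte[] FloatToInt16Bytes(float[] samples)
    {
        // Two output bytes per sample, i.e. half the size of the float data.
        var bytes = new byte[samples.Length * sizeof(short)];
        int pos = 0;

        foreach (float sample in samples)
        {
            // Clamp first (an addition, not in the original code) so that
            // out-of-range input cannot overflow the cast to short.
            float clamped = Math.Max(-1.0f, Math.Min(1.0f, sample));
            short value = (short)(clamped * short.MaxValue);

            // Write the low byte first, then the high byte (little-endian).
            bytes[pos++] = (byte)(value & 0xFF);
            bytes[pos++] = (byte)((value >> 8) & 0xFF);
        }

        return bytes;
    }
}
```

The output array is sized from the sample count, so it is filled exactly and no trailing zero bytes end up in the file.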

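I must admit that Microsoft needs someone to review their bloated APIs.

【Comments】:

Thanks for the comment. Below my Int32 code I am actually doing the conversion in the same way. Sorry, it was only commented out to show which part produces the slow motion. Update 1: I sped the audio file up by 2x and it does not sound like the original; it seems to be missing some chunks of audio.
I spent 2 days trying to figure this out, and simply adding "/ 2" fixed it. I have just tested it with the Watson service and it works well. Thank you very much!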