
【Question title】: Fast conversion byte array to short array of audio data
【Posted】: 2011-10-13 09:26:05
【Question】:

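I need the fastest way to convert a byte array into short arrays of audio data.

The audio byte array contains data from two audio channels, laid out like this: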

C1C1C2C2 C1C1C2C2 C1C1C2C2 ...

where

C1C1 - two bytes of first channel

C2C2 - two bytes of second channel

Currently I use the algorithm below, but I feel there is a better way to do this.

byte[] rawData = //from audio device
short[] shorts = new short[rawData.Length / 2];
short[] channel1 = new short[rawData.Length / 4];
short[] channel2 = new short[rawData.Length / 4];
System.Buffer.BlockCopy(rawData, 0, shorts, 0, rawData.Length);
for (int i = 0, j = 0; i < shorts.Length; i+=2, ++j)
{
    channel1[j] = shorts[i];
    channel2[j] = shorts[i+1];
}

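【Question discussion】:

Given the interleaved nature of the data, this looks fine to me. You could save some memory copying by writing a wrapper over shorts that exposes virtual channel1 and channel2 arrays, but memory copying is fast (unless you have a very large amount of data; then look at the wrapper option).
I'd add that if you like living dangerously, you can skip the BlockCopy with the struct trick from here: stackoverflow.com/q/621493/613130.
@anth Why do you need the fastest way? This code can already run many times faster than real time.
If performance matters that much, I'd seriously consider reusing all the buffers, especially if they are large enough to land on the LOH (I think that happens at >85kB, but that's an implementation detail).
You should also consider endianness. Some of the code posted here assumes native endianness, some assumes a fixed endianness. Which one is correct depends on the byte order of your input data.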
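The "virtual channel" wrapper from the first comment can be sketched by indexing the interleaved `short[]` directly instead of copying each channel out. This is a minimal sketch written as a C# 9 top-level program; `Sample` and `Channel` are hypothetical helper names, not a library API, and the two-channel layout is the one described in the question.

```csharp
using System;
using System.Collections.Generic;

// Read one sample of `channel` (0 = first, 1 = second) at position `frame`
// straight from the interleaved buffer (C1 C2 C1 C2 ...), with no copy.
static short Sample(short[] interleaved, int channel, int frame) =>
    interleaved[frame * 2 + channel];

// Lazily enumerate one channel: every second sample, starting at its offset.
static IEnumerable<short> Channel(short[] interleaved, int channel)
{
    for (int i = channel; i < interleaved.Length; i += 2)
        yield return interleaved[i];
}

short[] shorts = { 10, -10, 20, -20, 30, -30 };
Console.WriteLine(Sample(shorts, 1, 2));                 // -30
Console.WriteLine(string.Join(",", Channel(shorts, 0))); // 10,20,30
```

The index form costs one multiply-add per read; the iterator is convenient but adds enumerator overhead per sample, so in a hot loop it is unlikely to beat the plain copy.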

Tags: c#

【Solution 1】:

You can skip copying the buffer:

byte[] rawData = //from audio device
short[] channel1 = new short[rawData.Length / 4];
short[] channel2 = new short[rawData.Length / 4];
for (int i = 0, j = 0; i < rawData.Length; i+=4, ++j)
{
    channel1[j] = (short)(((ushort)rawData[i + 1]) << 8 | (ushort)rawData[i]);
    channel2[j] = (short)(((ushort)rawData[i + 3]) << 8 | (ushort)rawData[i + 2]);
}
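The shift version above decodes each sample as little-endian no matter what machine it runs on. On newer .NET (Core 2.1 or later, or the System.Memory package), `BinaryPrimitives.ReadInt16LittleEndian` states that byte order explicitly. A minimal sketch as a C# 9 top-level program; `SplitLittleEndian` is a hypothetical name, and little-endian input is an assumption about the audio device.

```csharp
using System;
using System.Buffers.Binary;

// Same result as the shift loop: each 2-byte sample is decoded as
// little-endian, independent of the host's native byte order.
static void SplitLittleEndian(byte[] raw, short[] channel1, short[] channel2)
{
    ReadOnlySpan<byte> bytes = raw;
    for (int i = 0, j = 0; i + 3 < bytes.Length; i += 4, ++j)
    {
        channel1[j] = BinaryPrimitives.ReadInt16LittleEndian(bytes.Slice(i, 2));
        channel2[j] = BinaryPrimitives.ReadInt16LittleEndian(bytes.Slice(i + 2, 2));
    }
}

byte[] data = { 0x01, 0x00, 0xFF, 0xFF, 0x34, 0x12, 0x00, 0x80 };
var left = new short[data.Length / 4];
var right = new short[data.Length / 4];
SplitLittleEndian(data, left, right);
Console.WriteLine(string.Join(",", left));  // 1,4660
Console.WriteLine(string.Join(",", right)); // -1,-32768
```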

To make the loop faster still, you can look at the Task Parallel Library, especially Parallel.For:

[EDIT]

// Reuses the question's `shorts` array (filled by BlockCopy) and its channel1/channel2
System.Threading.Tasks.Parallel.For( 0, shorts.Length/2, ( i ) =>
{
    channel1[i] = shorts[i*2];
    channel2[i] = shorts[i*2+1];
} );

[/EDIT]
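`Parallel.For` with a per-element lambda pays a delegate call for every frame, which can eat the gain on such a cheap loop body. A common alternative is to give each worker a contiguous range via `Partitioner.Create(fromInclusive, toExclusive)` from System.Collections.Concurrent and run a plain loop inside it. A sketch as a C# 9 top-level program; `SplitParallel` is a hypothetical name and the test data is arbitrary. Whether it beats the sequential loop depends on data size and core count, so measure.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Each worker receives a (start, end) range of frames and copies it with a
// plain loop, so the delegate is invoked once per range, not once per frame.
static void SplitParallel(short[] shorts, short[] channel1, short[] channel2)
{
    int frames = shorts.Length / 2;
    Parallel.ForEach(Partitioner.Create(0, frames), range =>
    {
        for (int j = range.Item1; j < range.Item2; j++) // Item2 is exclusive
        {
            channel1[j] = shorts[2 * j];
            channel2[j] = shorts[2 * j + 1];
        }
    });
}

short[] interleaved = new short[1_000_000];
for (int k = 0; k < interleaved.Length; k++) interleaved[k] = (short)k;
var left = new short[interleaved.Length / 2];
var right = new short[interleaved.Length / 2];
SplitParallel(interleaved, left, right);
Console.WriteLine($"{left[1]} {right[1]}"); // 2 3
```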

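Another option is loop unrolling, but I think the TPL will speed this up as well.

【Discussion】:

Does this really speed things up?
I don't know. I think you'd have to take some measurements... but it does save memory.
The OP asked for "the fastest way".
@chibacity There is really only one way to know whether it is the fastest: actually measure it.
@MPelletier Obviously. But I'm not the one offering it as an answer.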
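Loop unrolling, as mentioned above, means doing several frames per iteration to cut loop overhead. A minimal sketch of a 2x unroll of the question's copy loop, written as a C# 9 top-level program; `SplitUnrolled` is a hypothetical name. The JIT may already do similar work, so this is only worth keeping if a measurement shows a gain.

```csharp
using System;

// Two frames (four shorts) per iteration, then a remainder loop for an odd
// frame count.
static void SplitUnrolled(short[] shorts, short[] channel1, short[] channel2)
{
    int frames = shorts.Length / 2;
    int j = 0;
    for (; j + 1 < frames; j += 2)
    {
        int i = 2 * j;
        channel1[j] = shorts[i];
        channel2[j] = shorts[i + 1];
        channel1[j + 1] = shorts[i + 2];
        channel2[j + 1] = shorts[i + 3];
    }
    for (; j < frames; j++) // leftover frame when `frames` is odd
    {
        channel1[j] = shorts[2 * j];
        channel2[j] = shorts[2 * j + 1];
    }
}

short[] interleaved = { 1, -1, 2, -2, 3, -3 };
var left = new short[3];
var right = new short[3];
SplitUnrolled(interleaved, left, right);
Console.WriteLine(string.Join(",", left));  // 1,2,3
Console.WriteLine(string.Join(",", right)); // -1,-2,-3
```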

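【Solution 2】:

You can use unsafe code to avoid array indexing and bit shifts. But as PVitt said, on recent PCs you are better off using standard managed code and the TPL if your data size is significant. The code below must run inside an `unsafe` method and be compiled with /unsafe.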

short[] channel1 = new short[rawData.Length / 4];
short[] channel2 = new short[rawData.Length / 4];

fixed(byte* pRawData = rawData)
fixed(short* pChannel1 = channel1)
fixed(short* pChannel2 = channel2)
{
    // Pointers declared in a fixed statement are read-only, so walk copies of them
    byte* src = pRawData;
    short* dst1 = pChannel1;
    short* dst2 = pChannel2;
    byte* end = pRawData + (rawData.Length & ~3); // whole 4-byte frames only
    while(src < end)
    {
        *(dst1++) = *((short*)src);
        src += sizeof(short);
        *(dst2++) = *((short*)src);
        src += sizeof(short);
    }
}
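On newer .NET (Core 2.1 or later, or the System.Memory package), `MemoryMarshal.Cast` gives the same zero-copy reinterpretation of the bytes as the pointer loop, without unsafe code or /unsafe. A sketch as a C# 9 top-level program; `SplitNative` is a hypothetical name. Like the pointer version, it reads samples in the machine's native byte order.

```csharp
using System;
using System.Runtime.InteropServices;

// View the byte[] as a span of shorts (no copy), then de-interleave.
static void SplitNative(byte[] raw, short[] channel1, short[] channel2)
{
    ReadOnlySpan<short> samples = MemoryMarshal.Cast<byte, short>(raw.AsSpan());
    for (int i = 0, j = 0; i + 1 < samples.Length; i += 2, ++j)
    {
        channel1[j] = samples[i];
        channel2[j] = samples[i + 1];
    }
}

byte[] data = { 0x01, 0x00, 0xFF, 0xFF, 0x34, 0x12, 0x00, 0x80 };
var left = new short[data.Length / 4];
var right = new short[data.Length / 4];
SplitNative(data, left, right);
// On a little-endian host (x86/x64) this prints 1,4660 and -1,-32768
Console.WriteLine(string.Join(",", left));
Console.WriteLine(string.Join(",", right));
```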

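As with all optimization questions, you need to time things carefully. Pay particular attention to your buffer allocations: channel1 and channel2 could be static (large) auto-growing buffers of which you only use the first n elements. That way you skip 2 large array allocations on every call to this function, and the GC has less work to do (which is always better when timing matters).

As CodeInChaos pointed out, if your data is not in the right byte order you need to convert it. For example, to convert between big and little endian, assuming 8-bit atomic elements, the code would look like: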
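The buffer-reuse advice above can be sketched as follows: keep the channel arrays between calls, reallocate only when a larger block arrives, and return how many elements are valid. This is a hypothetical sketch as a C# 9 top-level program; in a real class `channel1`/`channel2` would be static or instance fields, and `Split` is an illustrative name.

```csharp
using System;

// Reused across calls; only the first `frames` elements are meaningful.
short[] channel1 = Array.Empty<short>();
short[] channel2 = Array.Empty<short>();

int Split(byte[] raw)
{
    int frames = raw.Length / 4;
    if (channel1.Length < frames)
    {
        // Allocate only when the incoming block is larger than anything seen so far.
        channel1 = new short[frames];
        channel2 = new short[frames];
    }
    for (int i = 0, j = 0; j < frames; i += 4, ++j)
    {
        channel1[j] = (short)(raw[i + 1] << 8 | raw[i]);
        channel2[j] = (short)(raw[i + 3] << 8 | raw[i + 2]);
    }
    return frames;
}

int n = Split(new byte[] { 1, 0, 2, 0, 3, 0, 4, 0 });
short[] first = channel1;
n = Split(new byte[] { 5, 0, 6, 0 }); // smaller block: no new allocation
Console.WriteLine($"{n} {channel1[0]} {object.ReferenceEquals(first, channel1)}"); // 1 5 True
```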

short[] channel1 = new short[rawData.Length / 4];
short[] channel2 = new short[rawData.Length / 4];

fixed(byte* pRawData = rawData)
fixed(short* pChannel1 = channel1)
fixed(short* pChannel2 = channel2)
{
    // fixed pointers are read-only, so walk byte-level copies of them
    byte* src = pRawData;
    byte* end = pRawData + (rawData.Length & ~3); // whole 4-byte frames only
    byte* dst1 = (byte*)pChannel1;
    byte* dst2 = (byte*)pChannel2;

    while(src < end)
    {
        // channel 1: swap the two bytes of the sample
        dst1[0] = src[1];
        dst1[1] = src[0];
        dst1 += sizeof(short);

        // channel 2: swap the two bytes of the sample
        dst2[0] = src[3];
        dst2[1] = src[2];
        dst2 += sizeof(short);

        src += 2 * sizeof(short); // next frame
    }
}
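A managed alternative for big-endian input, without pointers: split the channels in native order (for example with the question's BlockCopy loop), then swap the bytes of each sample with `BinaryPrimitives.ReverseEndianness(short)` (.NET Core 2.1 or later). A sketch as a C# 9 top-level program; `SwapInPlace` is a hypothetical name and the one-frame input is made up for illustration.

```csharp
using System;
using System.Buffers.Binary;

// Flip the two bytes of every sample in place.
static void SwapInPlace(short[] samples)
{
    for (int k = 0; k < samples.Length; k++)
        samples[k] = BinaryPrimitives.ReverseEndianness(samples[k]);
}

// One big-endian stereo frame: channel 1 = 0x1234, channel 2 = 0x0001.
byte[] bigEndian = { 0x12, 0x34, 0x00, 0x01 };
var left = new short[1];
var right = new short[1];
Buffer.BlockCopy(bigEndian, 0, left, 0, 2);  // native-order copy
Buffer.BlockCopy(bigEndian, 2, right, 0, 2);
if (BitConverter.IsLittleEndian) // only swap when host order differs from the data
{
    SwapInPlace(left);
    SwapInPlace(right);
}
Console.WriteLine($"{left[0]} {right[0]}"); // 4660 1
```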

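I haven't compiled any of the code in this post with an actual compiler, so if you find mistakes, feel free to edit it.

【Discussion】:

If they are large arrays (85,000 bytes or more), they end up on the LOH (Large Object Heap), which is cheap in terms of GC.
@chiba In my experience, collecting things on the LOH is very expensive, because objects on the LOH are only collected during Gen2 GCs. So reducing large allocations usually gives a big performance boost.
@CodeInChaos Collection on the LOH is cheap: no compaction and no generation management. Why would collection be expensive?
As I said, it requires a Gen2 GC. In non-trivial applications those are expensive, because they have to crawl all managed objects, not just new ones. Another problem is that since Gen2 collections happen less often than Gen0/1 collections, memory usage grows considerably.
@CodeInChaos Maybe I'm misunderstanding something here. Although LOH objects are reported as generation 2, they actually live "in" a different management structure from the SOH objects reported as generation 2. They are managed at the same time as the Gen2 sweep, but they are completely separate and managed in a completely different way.

【Solution 3】:

You can benchmark it yourself! Remember to use Release mode and run without the debugger (Ctrl+F5). Because `Main` is `unsafe`, the project must also allow unsafe code (/unsafe).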

using System;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;

class Program
{
    [StructLayout(LayoutKind.Explicit)]
    struct UnionArray
    {
        [FieldOffset(0)]
        public byte[] Bytes;

        [FieldOffset(0)]
        public short[] Shorts;
    }

    unsafe static void Main(string[] args)
    {
        Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;

        byte[] rawData = new byte[10000000];
        new Random().NextBytes(rawData);

        Stopwatch sw1 = Stopwatch.StartNew();

        short[] shorts = new short[rawData.Length / 2];
        short[] channel1 = new short[rawData.Length / 4];
        short[] channel2 = new short[rawData.Length / 4];
        System.Buffer.BlockCopy(rawData, 0, shorts, 0, rawData.Length);
        for (int i = 0, j = 0; i < shorts.Length; i += 2, ++j)
        {
            channel1[j] = shorts[i];
            channel2[j] = shorts[i + 1];
        }

        sw1.Stop();

        Stopwatch sw2 = Stopwatch.StartNew();

        short[] channel1b = new short[rawData.Length / 4];
        short[] channel2b = new short[rawData.Length / 4];

        for (int i = 0, j = 0; i < rawData.Length; i += 4, ++j)
        {
            channel1b[j] = BitConverter.ToInt16(rawData, i);
            channel2b[j] = BitConverter.ToInt16(rawData, i + 2);
        }

        sw2.Stop();

        Stopwatch sw3 = Stopwatch.StartNew();

        short[] shortsc = new UnionArray { Bytes = rawData }.Shorts;
        short[] channel1c = new short[rawData.Length / 4];
        short[] channel2c = new short[rawData.Length / 4];

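        // shortsc.Length still reports the byte count (the array header belongs to rawData),
        // so bound the loop with the real short count from shorts.Length
        for (int i = 0, j = 0; i < shorts.Length; i += 2, ++j)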
        {
            channel1c[j] = shortsc[i];
            channel2c[j] = shortsc[i + 1];
        }

        sw3.Stop();

        Stopwatch sw4 = Stopwatch.StartNew();

        short[] channel1d = new short[rawData.Length / 4];
        short[] channel2d = new short[rawData.Length / 4];

        for (int i = 0, j = 0; i < rawData.Length; i += 4, ++j)
        {
            channel1d[j] = (short)((short)(rawData[i + 1]) << 8 | (short)rawData[i]);
            channel2d[j] = (short)((short)(rawData[i + 3]) << 8 | (short)rawData[i + 2]);
            //Equivalent warning-less version
            //channel1d[j] = (short)(((ushort)rawData[i + 1]) << 8 | (ushort)rawData[i]);
            //channel2d[j] = (short)(((ushort)rawData[i + 3]) << 8 | (ushort)rawData[i + 2]);

        }

        sw4.Stop();

        Stopwatch sw5 = Stopwatch.StartNew();

        short[] channel1e = new short[rawData.Length / 4];
        short[] channel2e = new short[rawData.Length / 4];

        fixed (byte* pRawData = rawData)
        fixed (short* pChannel1 = channel1e)
        fixed (short* pChannel2 = channel2e)
        {
            byte* pRawData2 = pRawData;
            short* pChannel1e = pChannel1;
            short* pChannel2e = pChannel2;

            byte* end = pRawData2 + rawData.Length;

            while (pRawData2 < end)
            {
                (*(pChannel1e++)) = *((short*)pRawData2);
                pRawData2 += sizeof(short);
                (*(pChannel2e++)) = *((short*)pRawData2);
                pRawData2 += sizeof(short);
            }
        }

        sw5.Stop();

        Stopwatch sw6 = Stopwatch.StartNew();

        short[] shortse = new short[rawData.Length / 2];
        short[] channel1f = new short[rawData.Length / 4];
        short[] channel2f = new short[rawData.Length / 4];
        System.Buffer.BlockCopy(rawData, 0, shortse, 0, rawData.Length);

        System.Threading.Tasks.Parallel.For(0, shortse.Length / 2, (i) =>
        {
            channel1f[i] = shortse[i * 2];
            channel2f[i] = shortse[i * 2 + 1];
        });

        sw6.Stop();


        if (!channel1.SequenceEqual(channel1b) || !channel1.SequenceEqual(channel1c) || !channel1.SequenceEqual(channel1d) || !channel1.SequenceEqual(channel1e) || !channel1.SequenceEqual(channel1f))
        {
            throw new Exception();
        }

        if (!channel2.SequenceEqual(channel2b) || !channel2.SequenceEqual(channel2c) || !channel2.SequenceEqual(channel2d) || !channel2.SequenceEqual(channel2e) || !channel2.SequenceEqual(channel2f))
        {
            throw new Exception();
        }

        Console.WriteLine("Original: {0}ms", sw1.ElapsedMilliseconds);
        Console.WriteLine("BitConverter: {0}ms", sw2.ElapsedMilliseconds);
        Console.WriteLine("Super-unsafe struct: {0}ms", sw3.ElapsedMilliseconds);
        Console.WriteLine("PVitt shifts: {0}ms", sw4.ElapsedMilliseconds);
        Console.WriteLine("unsafe VirtualBlackFox: {0}ms", sw5.ElapsedMilliseconds);
        Console.WriteLine("TPL: {0}ms", sw6.ElapsedMilliseconds);
        Console.ReadKey();
        return;
    }
}

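On x86 the fastest is VirtualBlackFox's unsafe code, second is the "super-unsafe" struct "trick" from C# unsafe value type array to byte array conversions, and third is PVitt's.
On x64 the fastest is VirtualBlackFox's unsafe code, and second is PVitt's.

【Discussion】:

VirtualBlackFox, then PVitt, are definitely faster on my machine once warmed up.
@chibacity Did you run the program in Release mode without the debugger? (CTRL-F5). The difference is large. But yes, I'd say PVitt's is the "best" "safe" solution.
@xantos Yes, I ran it properly :) I updated my comment: VirtualBlackFox is fastest now that it has been added to the test suite.
@Xantos No, in your original version you listed your own version as the fastest; it was actually third, behind PVitt :P
@chibacity On my machine the unsafe struct trick is faster than PVitt's. It may depend on the array size.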
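Several comments above hinge on warm-up and run conditions. A hypothetical timing helper that runs each candidate once untimed (so JIT compilation is excluded) and keeps the best of several timed runs makes those comparisons less noisy. This is a sketch as a C# 9 top-level program; `BestOfMs`, the buffer size and the run count are arbitrary choices, timed here on PVitt's shift loop.

```csharp
using System;
using System.Diagnostics;

// Warm up once, then return the fastest of `runs` timed executions.
static double BestOfMs(Action candidate, int runs)
{
    candidate(); // warm-up: JIT-compile and touch the memory before timing
    double best = double.MaxValue;
    for (int r = 0; r < runs; r++)
    {
        var sw = Stopwatch.StartNew();
        candidate();
        sw.Stop();
        best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
    }
    return best;
}

byte[] rawData = new byte[10_000_000];
new Random(42).NextBytes(rawData);
short[] channel1 = new short[rawData.Length / 4];
short[] channel2 = new short[rawData.Length / 4];

double shifts = BestOfMs(() =>
{
    for (int i = 0, j = 0; i < rawData.Length; i += 4, ++j)
    {
        channel1[j] = (short)(rawData[i + 1] << 8 | rawData[i]);
        channel2[j] = (short)(rawData[i + 3] << 8 | rawData[i + 2]);
    }
}, runs: 5);
Console.WriteLine($"PVitt shifts (best of 5): {shifts:F2} ms");
```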