FFT: which frequencies are in which bins? (Answers)

【Question title】: FFT which frequencies are in which bins?
【Posted】: 2015-01-18 16:27:42
【Question】:

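I want to see how certain frequencies, specifically the low bass at 20 - 60 Hz, show up in a piece of audio. I have the audio as a byte array, which I convert to an array of shorts and then to complex numbers via (short[i]/(double)short.MaxValue, 0). I then pass that to the FFT from AForge.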
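A minimal sketch of the byte-to-sample conversion described above, assuming the WAV data is 16-bit little-endian PCM (the usual case, but worth checking against `WaveFormat.BitsPerSample`). The class and method names are hypothetical. Note that casting each byte on its own with `(short)b` produces one value per byte rather than one per 16-bit sample, so the two bytes of each sample have to be combined first.

```csharp
using System;

public static class PcmConversion
{
    // Converts little-endian 16-bit PCM bytes to doubles in the range [-1, 1).
    // Assumption: 16-bit PCM. Other bit depths or IEEE float data need different handling.
    public static double[] ToNormalizedSamples(byte[] pcm16)
    {
        int sampleCount = pcm16.Length / 2; // two bytes per 16-bit sample
        var samples = new double[sampleCount];
        for (int i = 0; i < sampleCount; i++)
        {
            // Combine the low and high byte explicitly; unchecked keeps the
            // wrap-around to negative values even if overflow checking is enabled.
            short value = unchecked((short)(pcm16[2 * i] | (pcm16[2 * i + 1] << 8)));
            samples[i] = value / 32768.0;
        }
        return samples;
    }
}
```

For example, the byte pair 0x00, 0x40 is the sample 0x4000 = 16384, which maps to 0.5. For stereo files the resulting samples are still interleaved (LRLR...).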

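The audio is mono with a sample rate of 44100 Hz. I know I can only pass a power-of-two number of samples to the FFT, so take 4096 as an example. What I don't understand is which frequencies end up in the output bins.

If I take 4096 samples from audio with a 44100 Hz sample rate, does that mean I am only taking some milliseconds' worth of audio? Or am I only getting some of the frequencies that will be present?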

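I add the output of the FFT to an array. My understanding is that when I take 4096 samples, bin 0 will contain 0*44100/4096 = 0 Hz, bin 1 will contain 1*44100/4096 = 10.7666015625 Hz, and so on. Is this correct? Or am I doing something fundamentally wrong here?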
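The arithmetic above can be written down as a small helper. This is only a sketch with hypothetical names (`FftBins`, `BinFrequency`, `WindowMilliseconds`); it uses the standard relation that bin k of an N-point FFT is centered at k * sampleRate / N, and that N samples span N / sampleRate seconds.

```csharp
using System;

public static class FftBins
{
    // Center frequency in Hz of the given bin for an FFT of fftSize points.
    public static double BinFrequency(int bin, int sampleRate, int fftSize)
        => (double)bin * sampleRate / fftSize;

    // Length of audio, in milliseconds, covered by fftSize samples.
    public static double WindowMilliseconds(int sampleRate, int fftSize)
        => 1000.0 * fftSize / sampleRate;
}
```

With a sample rate of 44100 and 4096 samples, `BinFrequency(1, 44100, 4096)` gives about 10.77 Hz and `WindowMilliseconds(44100, 4096)` gives about 92.9 ms. For real-valued input, only bins 0 through fftSize / 2 carry independent information; the upper half mirrors the lower half.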

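My goal is to average the magnitudes of the frequencies between 20 and 60 Hz, so that for a song with very low, heavy bass this number will be higher than for a soft piano piece with very little bass.

Here is my code.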

OpenFileDialog file = new OpenFileDialog();
file.ShowDialog();
WaveFileReader reader = new WaveFileReader(file.FileName);

byte[] data = new byte[reader.Length];
reader.Read(data, 0, data.Length);

samepleRate = reader.WaveFormat.SampleRate;
bitDepth = reader.WaveFormat.BitsPerSample;
channels = reader.WaveFormat.Channels;

Console.WriteLine("audio has " + channels + " channels, a sample rate of " + samepleRate + " and bitdepth of " + bitDepth + ".");


short[] shorts = data.Select(b => (short)b).ToArray();

int size = 4096;
int window = 44100 * 10;
int y = 0;
Complex[] complexData = new Complex[size];
for (int i = window; i < window + size; i++) 
{
    Complex tmp = new Complex(shorts[i]/(double)short.MaxValue, 0);

    complexData[y] = tmp;
    y++;

}




FourierTransform.FFT(complexData, FourierTransform.Direction.Forward);


double[] arr = new double[complexData.Length];
//print out sample of conversion
for (int i = 0; i < complexData.Length; i++)
{
    arr[i] = complexData[i].Magnitude;

}

Console.Write("complete, ");


return arr;

Edit: changed from DFT to FFT.

【Comments】:

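Well, you seem to be doing a DFT (which is more precise than an FFT), but I don't know how the returned data is structured. That should be in the documentation of the library you are using. Basically, if the data is structured linearly then you are right, but it could also be structured logarithmically.
Thanks for pointing that out. I did mean to run the FFT; I just copied the code while I was playing around with the DFT.
You're basically on the right track - the bin width you calculated is about 10 Hz - see this answer for a more complete explanation.
In terms of accuracy there is no difference between the FFT and the DFT - the FFT is just a more efficient implementation of the DFT, and mathematically they are equivalent.
Sorry, most FFT algorithms are just as accurate as the DFT. In defense of where I was coming from: there are some FFT algorithms that trade some accuracy for better time efficiency.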

Tags: c# fft naudio audio-processing

【Solution 1】:

Here is a modified version of your code. Note the comments starting with "***".

OpenFileDialog file = new OpenFileDialog();
file.ShowDialog();
WaveFileReader reader = new WaveFileReader(file.FileName);

byte[] data = new byte[reader.Length];
reader.Read(data, 0, data.Length);

samepleRate = reader.WaveFormat.SampleRate;
bitDepth = reader.WaveFormat.BitsPerSample;
channels = reader.WaveFormat.Channels;

Console.WriteLine("audio has " + channels + " channels, a sample rate of " + samepleRate + " and bitdepth of " + bitDepth + ".");

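// *** NAudio "thinks" in floats
// *** Note: reinterpreting the raw bytes as floats is only valid if the file holds 32-bit IEEE float samples;
// *** 16-bit PCM data has to be converted from shorts instead.
float[] floats = new float[data.Length / sizeof(float)];
Buffer.BlockCopy(data, 0, floats, 0, floats.Length * sizeof(float));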

int size = 4096;
// *** You don't have to fill the FFT buffer to get valid results.  More noisy & smaller "magnitudes", but better freq. res.
int inputSamples = samepleRate / 100; // 10ms... adjust as needed
int offset = samepleRate * 10 * channels;
int y = 0;
Complex[] complexData = new Complex[size];
// *** get a "scaling" curve to make both ends of sample region 0 but still allow full amplitude in the middle of the region.
float[] window = CalcWindowFunction(inputSamples);
for (int i = 0; i < inputSamples; i++)
{
    // *** "floats" is stored as LRLRLR interleaved data for stereo audio
    complexData[y] = new Complex(floats[i * channels + offset] * window[i], 0);
    y++;
}
// make sure the back portion of the buffer is set to all 0's
while (y < size)
{
    complexData[y] = new Complex(0, 0);
    y++;
}


// *** Consider using a DCT here instead...  It returns less "noisy" results
FourierTransform.FFT(complexData, FourierTransform.Direction.Forward);


double[] arr = new double[complexData.Length];
//print out sample of conversion
for (int i = 0; i < complexData.Length; i++)
{
    // *** I assume we don't care about phase???
    arr[i] = complexData[i].Magnitude;
}

Console.Write("complete, ");


return arr;

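Once you get the results back, assuming a sample rate of 44100 Hz and size = 4096, elements 2 - 5 (about 21.5 Hz to 53.8 Hz) should be the values you are looking for. There is a way to convert them to dB, but I don't remember it offhand.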
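As a rough sketch of that last step: with a bin width of about 10.77 Hz, the bins whose center frequencies fall between 20 and 60 Hz work out to indices 2 through 5 (roughly 21.5 Hz to 53.8 Hz). The helper below averages the magnitudes over such a band and converts a magnitude to decibels with the usual 20 * log10(magnitude / reference) formula for amplitudes. The names `BandLevel`, `AverageMagnitude` and `ToDecibels` are hypothetical, and the reference level of 1.0 is an assumption; pick whatever reference suits your scaling.

```csharp
using System;

public static class BandLevel
{
    // Averages the magnitudes of all bins whose center frequency lies in [lowHz, highHz].
    // magnitudes is the full FFT output (one entry per bin), as in the "arr" array above.
    public static double AverageMagnitude(double[] magnitudes, int sampleRate, double lowHz, double highHz)
    {
        int fftSize = magnitudes.Length;
        double binWidth = (double)sampleRate / fftSize;
        int first = (int)Math.Ceiling(lowHz / binWidth);
        // Bins above fftSize / 2 mirror the lower half for real input, so stop there.
        int last = Math.Min((int)Math.Floor(highHz / binWidth), fftSize / 2);
        if (first > last)
            throw new ArgumentException("No bins fall inside the requested band.");

        double sum = 0;
        for (int k = first; k <= last; k++)
            sum += magnitudes[k];
        return sum / (last - first + 1);
    }

    // Converts an amplitude to decibels relative to the given reference.
    // The small floor avoids taking the logarithm of zero.
    public static double ToDecibels(double magnitude, double reference = 1.0)
        => 20.0 * Math.Log10(Math.Max(magnitude, 1e-12) / reference);
}
```

A call such as `BandLevel.AverageMagnitude(arr, 44100, 20, 60)` would then give a single number to compare between songs. Keep in mind that the absolute value depends on the FFT library's scaling and on the window used, so it is mainly useful for relative comparisons.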

Good luck!

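【Comments】:

Thank you very much. I can't tell you how much this means to me. This has been bugging me for a while.
What do you use in CalcWindowFunction()? Is there one available in a package, or do I have to look into implementing it myself? Thanks.
You can just return a float array filled with 1s (with "size" elements). There are better windows, but you'll need to figure out the right one from en.wikipedia.org/wiki/Window_function. In general, just implement a Hamming or Blackman-Harris window and you'll be in good shape. There's lots of sample code out there.
When I change which part of the audio gets transformed, e.g. from 10 seconds (as set in your example) to 20 seconds, or down to 2, I get an array full of NaN (not a number). 0 works fine; do you know why that is?
No. The "offset" is just what it sounds like: an offset into the array. As long as you stay within the bounds of the array, changing it shouldn't be a big deal.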
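For the CalcWindowFunction() question in the comments above, here is one possible sketch using a Hamming window, one of the two windows suggested there. This is not the answerer's implementation (none was posted), just the textbook formula w[n] = 0.54 - 0.46 * cos(2 * pi * n / (N - 1)) wrapped in a method with the same name and signature the answer's code calls; the `Windows` class name is made up for the example.

```csharp
using System;

public static class Windows
{
    // Returns a Hamming window with the given number of points.
    // Assumption: the window length equals the number of input samples (not the FFT size).
    public static float[] CalcWindowFunction(int length)
    {
        var window = new float[length];
        if (length == 1)
        {
            window[0] = 1f; // avoid dividing by zero for a single-point window
            return window;
        }
        for (int n = 0; n < length; n++)
        {
            window[n] = (float)(0.54 - 0.46 * Math.Cos(2.0 * Math.PI * n / (length - 1)));
        }
        return window;
    }
}
```

The Hamming window tapers to 0.08 at both ends rather than all the way to 0. If the ends really need to reach 0, as the comment in the answer's code describes, a Hann window (0.5 - 0.5 * cos(...)) is the closest drop-in alternative.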