Plotting a spectrogram in audio analysis: answers

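【Question title】: Plotting spectrogram in audio analysis
【Posted】: 2018-06-05 20:19:41
【Question description】:

I am using a neural network for speech recognition. For this I need the spectrograms of the training audio files (.wav). How can I get these spectrograms in Python?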

【Question comments】:

Check out this Python module: Speech Recognition
@kks, did my answer help you?
Yes... I got some good resources from your answer. @Oleg Melnikov

Tags: python audio tensorflow neural-network spectrogram

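【Solution 1】:

There are many ways to do this. The easiest is to look at the approaches proposed in the Kernels of the Kaggle competition TensorFlow Speech Recognition Challenge (sort by most voted). This one is particularly clear and concise and contains the following function. The inputs are a numeric vector of samples extracted from the wav file, the sample rate, the frame size in milliseconds, the step (stride or skip) size in milliseconds, and a small offset.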

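from scipy.io import wavfile
from scipy import signal
import numpy as np

# path_to_wav_file is a placeholder for the path to a mono .wav file
sample_rate, audio = wavfile.read(path_to_wav_file)

def log_specgram(audio, sample_rate, window_size=20,
                 step_size=10, eps=1e-10):
    # Convert the window and step sizes from milliseconds to samples
    nperseg = int(round(window_size * sample_rate / 1e3))
    step = int(round(step_size * sample_rate / 1e3))
    # scipy expects the overlap between frames, not the step between them
    # (the two are equal only when the step is half the window)
    noverlap = nperseg - step
    freqs, times, spec = signal.spectrogram(audio,
                                    fs=sample_rate,
                                    window='hann',
                                    nperseg=nperseg,
                                    noverlap=noverlap,
                                    detrend=False)
    # Transpose to (time, frequency); eps avoids taking log(0)
    return freqs, times, np.log(spec.T.astype(np.float32) + eps)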
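To get a feel for what comes back, here is a small sketch that runs the same computation inline on a synthetic signal instead of a real recording. The 16 kHz rate, the 440 Hz tone, and the variable names are assumptions made only for this illustration; the step is treated as the hop between frames, which for the 20 ms / 10 ms defaults gives the same overlap as the function above.

```python
import numpy as np
from scipy import signal

sample_rate = 16000                       # assumed sampling rate in Hz
t = np.arange(sample_rate) / sample_rate  # one second of sample times
tone = np.sin(2 * np.pi * 440.0 * t)      # synthetic 440 Hz test signal

window_ms, step_ms = 20, 10
nperseg = int(round(window_ms * sample_rate / 1e3))  # 320 samples per frame
step = int(round(step_ms * sample_rate / 1e3))       # 160 samples between frames
noverlap = nperseg - step                            # 160 overlapping samples

freqs, times, spec = signal.spectrogram(tone, fs=sample_rate, window='hann',
                                        nperseg=nperseg, noverlap=noverlap,
                                        detrend=False)
# Same rescaling as log_specgram: transpose to (time, frequency), then log
log_spec = np.log(spec.T.astype(np.float32) + 1e-10)

print(freqs.shape, times.shape, log_spec.shape)  # (161,) (99,) (99, 161)

# The strongest frequency bin should be the one closest to the tone
peak_freq = freqs[log_spec.mean(axis=0).argmax()]
print(peak_freq)
```

With a 20 ms window at 16 kHz the frequency bins are 50 Hz apart, so the peak lands on a bin near 440 Hz rather than exactly on it.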

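The outputs are defined in the SciPy manual, but the spectrogram is rescaled with a monotonic function (log()), which compresses large values far more than small ones while preserving their order. This way, extreme values in spec do not dominate the computation. Alternatively, the values can be capped at some quantile, but a log (or even a square root) is preferred. There are many other ways to normalize the heights of the spectrogram, i.e. to keep extreme values from "bullying" the output :)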

freqs (f) : ndarray, Array of sample frequencies.
times (t) : ndarray, Array of segment times.
spec (Sxx) : ndarray, Spectrogram of x. By default, the last axis of Sxx corresponds to the segment times. (log_specgram returns the transpose, so there the first axis is time.)
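To illustrate the rescaling options mentioned above (log, square root, capping at a quantile), here is a rough comparison on a synthetic signal. The signal, the 99% quantile, and the peak_to_median helper are assumptions chosen for this sketch, not part of the original kernel.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
sample_rate = 16000                                   # assumed sampling rate
t = np.arange(sample_rate) / sample_rate
# A loud tone on top of faint noise gives a few extreme spectrogram values
x = np.sin(2 * np.pi * 440.0 * t) + 0.01 * rng.standard_normal(t.size)

_, _, spec = signal.spectrogram(x, fs=sample_rate, nperseg=320, noverlap=160)

eps = 1e-10
log_spec = np.log(spec + eps)        # log rescaling, as in log_specgram
sqrt_spec = np.sqrt(spec)            # milder square-root rescaling
cap = np.quantile(spec, 0.99)        # cap at the 99% quantile (arbitrary choice)
clipped = np.minimum(spec, cap)

def peak_to_median(a):
    """Hypothetical helper: how far the largest value sits above a typical one."""
    return a.max() / np.median(a)

raw_ratio = peak_to_median(spec)
sqrt_ratio = peak_to_median(sqrt_spec)
clip_ratio = peak_to_median(clipped)
print(raw_ratio, sqrt_ratio, clip_ratio)

# A monotonic rescaling keeps the ordering: sorted input stays sorted after log
sorted_spec = np.sort(spec.ravel())
order_kept = bool(np.all(np.diff(np.log(sorted_spec + eps)) >= 0))
print(order_kept, log_spec.min(), log_spec.max())
```

The peak-to-median ratio is only one crude way to see how much the extremes dominate; it is not meaningful for the log values themselves, since those can be negative.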

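Alternatively, you can check the train.py and models.py code in the GitHub repo of the TensorFlow example on audio recognition.

Here is another thread that explains and provides code for building spectrograms in Python.

【Comments】:

Could you help me understand what freqs, times, and spec return? I have looked at the documentation but am still confused. @Oleg Melnikov
@kks: see the additional notes on the outputs :) Hope that helps.
Thanks a lot! Just to add: if the loaded .wav file is stereo, you will see a "noverlap must be less than nperseg" error, which is a red herring. You can get the first channel of the audio signal with audio = audio[:, 0], and then your log_specgram will work just fine. :-) Thanks again!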
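A small sketch of the stereo workaround from the last comment. As far as I can tell, the error appears because wavfile.read returns stereo data as an array of shape (n_samples, n_channels) and the spectrogram is computed along the last axis, which then has length 2. The to_mono helper and its average option are hypothetical names for this example, and the stereo array is synthetic rather than read from a file.

```python
import numpy as np
from scipy import signal

def to_mono(audio, average=False):
    """Collapse a (n_samples, n_channels) array to 1-D; pass 1-D input through."""
    audio = np.asarray(audio)
    if audio.ndim == 1:
        return audio
    if average:
        return audio.mean(axis=1)   # mix all channels (result is float)
    return audio[:, 0]              # first channel, as suggested in the comment

sample_rate = 16000                                   # assumed sampling rate
t = np.arange(sample_rate) / sample_rate
left = np.sin(2 * np.pi * 440.0 * t)
right = np.sin(2 * np.pi * 880.0 * t)
stereo = np.stack([left, right], axis=1)  # same layout as a stereo wavfile.read

mono = to_mono(stereo)
freqs, times, spec = signal.spectrogram(mono, fs=sample_rate,
                                        nperseg=320, noverlap=160)
print(stereo.shape, mono.shape, spec.shape)  # (16000, 2) (16000,) (161, 99)
```

Taking the first channel matches the comment; averaging the channels is another common choice when both carry useful signal.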

【Solution 2】:

SciPy serves this purpose.

from scipy import signal
from scipy.io import wavfile

# Read the .wav file
sample_rate, data = wavfile.read('directory_path/file_name.wav')

# Spectrogram of the .wav file
sample_freq, segment_time, spec_data = signal.spectrogram(data, sample_rate)
# Note: sample_freq is the array of frequency bins (in Hz) of the spectrogram,
# not the sampling rate; sample_rate is the sampling frequency of the recording

Visualize the spectrogram with the matplotlib library

import matplotlib.pyplot as plt
plt.pcolormesh(segment_time, sample_freq, spec_data )
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()
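Raw power values usually span many orders of magnitude, so a plot of the unscaled spectrogram can look almost uniformly dark. A possible variant, in line with the log rescaling from Solution 1, converts the power to decibels before plotting. The names power_to_db and plot_spectrogram_db and the eps offset are assumptions made for this sketch.

```python
import numpy as np
from scipy import signal

def power_to_db(spec, eps=1e-10):
    """Convert power values to decibels; eps avoids taking log10(0)."""
    return 10.0 * np.log10(np.asarray(spec, dtype=np.float64) + eps)

def plot_spectrogram_db(data, sample_rate, ax=None):
    """Draw a dB-scaled spectrogram of a 1-D signal and return the axes."""
    # Imported here so power_to_db can be used without matplotlib installed
    import matplotlib.pyplot as plt

    freqs, times, spec = signal.spectrogram(data, sample_rate)
    if ax is None:
        _, ax = plt.subplots()
    mesh = ax.pcolormesh(times, freqs, power_to_db(spec))
    ax.set_ylabel('Frequency [Hz]')
    ax.set_xlabel('Time [sec]')
    ax.figure.colorbar(mesh, ax=ax, label='Power [dB]')
    return ax
```

To use it, call plot_spectrogram_db(data, sample_rate) with the data and sample_rate read above, then matplotlib.pyplot.show().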

【Comments】:

【Solution 3】:

You can use the NumPy, SciPy, and matplotlib packages to make a spectrogram. See the following post: http://www.frank-zalkow.de/en/code-snippets/create-audio-spectrograms-with-python.html

【Comments】: