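【Question title】:Why do MFCC extraction libs return different values?
【Posted】:2019-02-06 07:04:08
【Question description】:

I am extracting MFCC features with two different libraries:

  • the python_speech_features lib
  • the BOB lib

However, the two outputs differ, and even their shapes are not the same. Is that normal, or am I missing a parameter?

The relevant part of my code is as follows: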

import bob.ap
import numpy as np
from scipy.io.wavfile import read
from sklearn import preprocessing
from python_speech_features import mfcc, delta, logfbank

def bob_extract_features(audio, rate):
    #get MFCC
    rate              = 8000  # sampling rate in Hz; note that this overrides the `rate` argument
    win_length_ms     = 30    # The window length of the cepstral analysis in milliseconds
    win_shift_ms      = 10    # The window shift of the cepstral analysis in milliseconds
    n_filters         = 26    # The number of filter bands
    n_ceps            = 13    # The number of cepstral coefficients
    f_min             = 0.    # The minimal frequency of the filter bank
    f_max             = 4000. # The maximal frequency of the filter bank
    delta_win         = 2     # The integer delta value used for computing the first and second order derivatives
    pre_emphasis_coef = 0.97  # The coefficient used for the pre-emphasis
    dct_norm          = True  # A factor by which the cepstral coefficients are multiplied
    mel_scale         = True  # Tell whether cepstral features are extracted on a linear (LFCC) or Mel (MFCC) scale

    c = bob.ap.Ceps(rate, win_length_ms, win_shift_ms, n_filters, n_ceps, f_min,
                    f_max, delta_win, pre_emphasis_coef, mel_scale, dct_norm)
    c.with_delta       = False
    c.with_delta_delta = False
    c.with_energy      = False

    signal = np.asarray(audio, dtype=float)    # vector should be in **float**
    example_mfcc = c(signal)                   # MFCCs only: deltas and energy are disabled above
    return  example_mfcc


def psf_extract_features(audio, rate):
    signal = np.asarray(audio, dtype=float) #vector should be in **float**
    mfcc_feature = mfcc(signal, rate, winlen = 0.03, winstep = 0.01, numcep = 13,
                        nfilt = 26, nfft = 512,appendEnergy = False)

    #mfcc_feature = preprocessing.scale(mfcc_feature)
    # The three values below are computed but never returned; only the raw MFCCs are compared
    deltas       = delta(mfcc_feature, 2)
    fbank_feat   = logfbank(audio, rate)
    combined     = np.hstack((mfcc_feature, deltas))
    return mfcc_feature



track = 'test-sample.wav'
rate, audio = read(track)

features1 = psf_extract_features(audio, rate)
features2 = bob_extract_features(audio, rate)

print("--------------------------------------------")
t = (features1 == features2)
print(t)

【Comments】:

    Tags: python voice-recognition voice speech mfcc


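    【Solution 1】:

    Have you tried comparing the two with some tolerance? I believe both MFCCs are arrays of floats, and testing for exact equality is probably not wise. Try numpy.testing.assert_allclose with a tolerance, and decide whether that tolerance is good enough for you.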
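The tolerance-based comparison suggested above could look roughly like the sketch below. The helper name `compare_features` and the tolerance values are hypothetical choices made for illustration; `np.allclose` is used here because it returns a boolean, whereas `numpy.testing.assert_allclose` raises on a mismatch. The shapes are checked first, since the question reports that they differ.

```python
import numpy as np

def compare_features(a, b, rtol=1e-5, atol=1e-8):
    """Return (same_shape, all_close) for two feature matrices.

    Hypothetical helper: the tolerances are illustrative defaults,
    not values recommended by either library.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        # An element-wise comparison is meaningless if the shapes differ.
        return False, False
    return True, bool(np.allclose(a, b, rtol=rtol, atol=atol))

# Toy data standing in for two MFCC matrices (frames x coefficients).
x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = x + 1e-9          # tiny floating-point noise
z = x[:1]             # one frame fewer

result_noise = compare_features(x, y)
result_shape = compare_features(x, z)
print(result_noise, result_shape)
```

With the real features, the call would be `compare_features(features1, features2)`; a shape mismatch points to framing differences rather than numerical noise.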

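    That said, I initially missed that you said even the shapes do not match, and I do not have enough experience with bob.ap to comment on that confidently. However, some libraries pad the input array with zeros at the beginning or the end for windowing reasons, and if one of the two does this differently from the other, that could be the cause.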
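To make the padding point concrete, here is a minimal sketch of two common framing conventions: dropping the trailing incomplete frame versus zero-padding it. Which convention each of the two libraries follows is an assumption that should be checked against their sources; the function names and the sample count are hypothetical.

```python
import math

def n_frames_truncate(n_samples, win_len, win_step):
    """Frame count when a trailing incomplete frame is dropped."""
    if n_samples < win_len:
        return 0
    return 1 + (n_samples - win_len) // win_step

def n_frames_zero_pad(n_samples, win_len, win_step):
    """Frame count when the signal is zero-padded to fill the last frame."""
    if n_samples <= win_len:
        return 1
    return 1 + math.ceil((n_samples - win_len) / win_step)

rate = 8000                      # Hz, as in the question
win_len = rate * 30 // 1000      # 30 ms window  -> 240 samples
win_step = rate * 10 // 1000     # 10 ms shift   -> 80 samples
n_samples = 8100                 # hypothetical signal length

truncated = n_frames_truncate(n_samples, win_len, win_step)
padded = n_frames_zero_pad(n_samples, win_len, win_step)
print(truncated, padded)
```

For this signal length the two conventions differ by one frame, which is enough to make the first dimension of the feature matrices disagree.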

    【Comments】:

    • Not part of the answer, but if you are looking for an MFCC library, librosa may also be an option for you.
    【Solution 2】:

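    However, the two outputs differ, and even their shapes are not the same. Is that normal?

    Yes. There are many variants of the algorithm, and every implementation has its own flavor.

    Or am I missing a parameter?

    It is not only about parameters. There are also algorithmic differences, such as the window shape (Hamming vs. Hanning), the shape of the mel filters, where the mel filters start, the normalization of the mel filters, liftering, the DCT flavor, and so on.

    If you want identical results, just use a single library for extraction. Trying to bring the two in sync is pretty hopeless.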
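As an illustration of just one of the differences listed above, the sketch below applies a Hamming and a Hanning window to the same frame and compares the resulting power spectra. It makes no claim about which window either library actually uses; the frame is random data, and the frame length (240 samples) and FFT size (512) are taken from the parameters in the question.

```python
import numpy as np

frame_len = 240                      # 30 ms at 8000 Hz
nfft = 512

hamming = np.hamming(frame_len)      # endpoints are about 0.08
hann = np.hanning(frame_len)         # endpoints are exactly 0

# Random data standing in for one frame of audio.
rng = np.random.default_rng(0)
frame = rng.standard_normal(frame_len)

# Power spectrum of the same frame under each window.
spec_hamming = np.abs(np.fft.rfft(frame * hamming, n=nfft)) ** 2
spec_hann = np.abs(np.fft.rfft(frame * hann, n=nfft)) ** 2

spectra_match = bool(np.allclose(spec_hamming, spec_hann))
print(hamming[0], hann[0], spectra_match)
```

The spectra already disagree before any mel filtering, logarithm, or DCT is applied, so the final coefficients cannot be expected to match either.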
