【Posted】: 2023-11-04 21:56:02
【Question】:
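I'm no audio expert, but speech recognition is one of the areas I'd really like to explore.
I have been following the TensorFlow tutorial on this topic: https://www.tensorflow.org/tutorials/audio/simple_audio
To build a model, I used personal recordings of myself and some friends. I noticed that the model only accepts audio samples of 31.2 KB (WAV format).
After training the model, I successfully saved it together with its class labels, using the following code: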
# Save the model and its class labels under a shared, dated file name
print("Saving model and label")
stamp = str(test_acc) + "_" + '{:%Y-%m-%d}'.format(datetime.datetime.now())
model.save("models/model_" + stamp + ".h5")
with open("models/labels_" + stamp + ".pickle", "wb") as f:
    pickle.dump(commands, f)
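A note on the 31.2 KB figure: the inference script in this post pads every clip to 16000 samples inside get_spectrogram, so the limit is most likely a clip length rather than a file size. As a rough check, and assuming 16-bit mono PCM at 16 kHz (the recording format is not stated in the post), the sketch below uses Python's standard wave module to measure the size of a WAV file holding exactly 16000 samples. The helper name wav_size_bytes is hypothetical.

```python
import io
import wave


def wav_size_bytes(num_samples, sample_width=2, channels=1, sample_rate=16000):
    """Size in bytes of a PCM WAV file with the given number of samples.

    The defaults (16-bit, mono, 16 kHz) are assumptions, not facts from the post.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        # Write silence: one zero byte per sample byte.
        wav_file.writeframes(bytes(num_samples * sample_width * channels))
    return len(buffer.getvalue())


size = wav_size_bytes(16000)
print(size, "bytes =", round(size / 1024, 2), "KiB")
```

Under those assumptions the file is 32044 bytes (32000 bytes of samples plus a 44-byte header), which is in line with the roughly 31.2 KB clips described here, i.e. about one second of audio.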
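Now I would like to feed in a 5-minute audio file, or an audio stream, and use this model to check whether a given word was spoken. Any ideas on how to do this?
At the moment I'm a bit lost. The code below can currently classify a 31.2 KB clip, but what I really want is to classify larger audio files and make better use of the model.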
import os
import pathlib
import datetime
import pickle
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import tensorflow as tf
from tensorflow.keras.layers.experimental import preprocessing
from tensorflow.keras import layers
from tensorflow.keras import models
from IPython import display
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
def get_waveform_and_label(file_path):
    # Label: the name of the directory that contains the file
    path = tf.strings.split(file_path, os.path.sep)
    label = path[-2]
    # Decode the WAV file into a float32 waveform
    audio_binary = tf.io.read_file(file_path)
    audio, _ = tf.audio.decode_wav(audio_binary)
    waveform = tf.squeeze(audio, axis=-1)
    return waveform, label
def get_spectrogram(waveform):
    # Padding for files with less than 16000 samples.
    # Note: a clip longer than 16000 samples makes this size negative,
    # so tf.zeros raises an error; only clips up to that length work.
    zero_padding = tf.zeros([16000] - tf.shape(waveform), dtype=tf.float32)
    # Concatenate audio with padding so that all audio clips will be of the
    # same length
    waveform = tf.cast(waveform, tf.float32)
    equal_length = tf.concat([waveform, zero_padding], 0)
    spectrogram = tf.signal.stft(
        equal_length, frame_length=255, frame_step=128)
    spectrogram = tf.abs(spectrogram)
    return spectrogram
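def get_spectrogram_and_label_id(audio, label):
    # For inference only the spectrogram is needed; the label is ignored.
    spectrogram = get_spectrogram(audio)
    # Add a trailing channel dimension, as the model expects
    spectrogram = tf.expand_dims(spectrogram, -1)
    return spectrogram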
def preprocess_dataset(files):
    files_ds = tf.data.Dataset.from_tensor_slices(files)
    output_ds = files_ds.map(get_waveform_and_label, num_parallel_calls=AUTOTUNE)
    output_ds = output_ds.map(get_spectrogram_and_label_id, num_parallel_calls=AUTOTUNE)
    print(output_ds)
    return output_ds  # dataset of spectrograms only (no labels)
print("Loading model")
model = tf.keras.models.load_model('models/model_0.8_2021-10-04.h5')
# Load the class labels that were saved alongside the model
with open("models/labels_0.8_2021-10-04.pickle", "rb") as f:
    CLASS_NAMES = pickle.load(f)
data_dir = pathlib.Path('recordings')
if not data_dir.exists():
    print("Unable to load recordings")
AUTOTUNE = tf.data.AUTOTUNE
#sample_file = data_dir/'kekeres/testB.wav'
sample_file = data_dir/'kekeres/kekeres_1633285520639.wav'
sample_ds = preprocess_dataset([str(sample_file)])
for spectrogram in sample_ds.batch(1):
    prediction = model(spectrogram)
    # Convert the raw model outputs (logits) to probabilities
    prediction_data = tf.nn.softmax(prediction, axis=1).numpy()
    # Index of the most likely class, as a plain Python int
    index = int(tf.argmax(prediction, axis=1)[0])
    print("class: " + str(CLASS_NAMES[index]))
    print("confidence: " + str(prediction_data[0][index]))
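One possible way to apply a fixed-length classifier like the one above to a longer recording is to slide a window over the waveform and classify each window separately. The sketch below is not from the tutorial; it only illustrates the windowing and thresholding logic with NumPy. The names split_into_windows, detect_word and predict_fn are hypothetical, and the 16 kHz sample rate, the 0.5 s hop and the 0.8 threshold are assumptions chosen for illustration. The 16000-sample window matches the padding length used in get_spectrogram.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed samples per second of the recordings
WINDOW = 16000       # input length that get_spectrogram pads to
HOP = 8000           # assumed step between windows (0.5 s overlap)


def split_into_windows(waveform, window=WINDOW, hop=HOP):
    """Cut a 1-D waveform into overlapping windows of equal length."""
    waveform = np.asarray(waveform, dtype=np.float32)
    # Pad short input up to one full window.
    if len(waveform) < window:
        waveform = np.pad(waveform, (0, window - len(waveform)))
    # Pad the tail with zeros so the last partial window is not dropped.
    remainder = (len(waveform) - window) % hop
    if remainder:
        waveform = np.pad(waveform, (0, hop - remainder))
    starts = np.arange(0, len(waveform) - window + 1, hop)
    return np.stack([waveform[s:s + window] for s in starts])


def detect_word(waveform, predict_fn, target_index, threshold=0.8,
                window=WINDOW, hop=HOP, sample_rate=SAMPLE_RATE):
    """Return start times (seconds) of windows that score the target class
    at or above the threshold.

    predict_fn is a hypothetical callable that maps an array of shape
    (num_windows, window) to probabilities of shape (num_windows, num_classes).
    """
    windows = split_into_windows(waveform, window, hop)
    probabilities = np.asarray(predict_fn(windows))
    hits = np.flatnonzero(probabilities[:, target_index] >= threshold)
    return [float(i * hop / sample_rate) for i in hits]
```

With the model from this post, predict_fn would presumably compute the spectrogram of each window the same way get_spectrogram does, add the channel dimension, call the model and apply softmax. For a live stream, the same idea could be applied to a rolling buffer that always holds the most recent 16000 samples. Overlapping windows reduce the chance that a word is cut in half at a window boundary, at the cost of more model calls.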
【Discussion】:
Tags: python tensorflow audio tf.keras voice-recognition