How do I analyze volume, pitch and speed of a .wav file in Java? Answers

【Question title】: How do I analyze volume, pitch and speed of a .wav file in Java?
【Posted】: 2011-12-26 19:21:19
【Question】:

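So, I'm trying to remake Vib Ribbon: http://www.youtube.com/watch?v=ehdymXc0epY The input will be a .wav file, and I don't have the slightest clue how to analyze it and create thresholds for volume and pitch, which would generate different obstacles. I was pointed to Fourier transforms, which I don't understand. Can somebody point me to a waveform analysis class that applies to this situation and let me know how to get started? I've been unable to get source code for things like AudioSurf and music visualizers.

You may ask, why Java? I'm taking an intro Java course, so no other language would work.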

【Comments】:

You'll need a special library. You could try this: blog.datasingularity.com/?p=53
Thanks - very useful link, though I wasn't able to use it at the time.

Tags: java wav analysis waveform

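【Answer 1】:

You could write a Praat script (Praat is available for download here) that produces an output file containing the information you need, and then read that txt file from your Java program.

There may also be external libraries, as @Gareth said.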

【Comments】:

【Answer 2】:

I ended up using the Sound Viewer Tool for this, albeit in another language (Python) and for a different class project. If you add the following to svt.py:

def processWav(filename, channel):
    """
    filename: path to a wav file
    channel: 1 for left, 2 for right
    Returns centroids, frequencies, volumes
    """
    import contextlib
    import wave

    # Open the file with audiolab for the AudioProcessor below
    audio_file = audiolab.sndfile(filename, 'read')
    # Read the frame count and sample rate to get the length in seconds
    with contextlib.closing(wave.open(filename, 'r')) as f:
        frames = f.getnframes()
        rate = f.getframerate()
        duration = frames / float(rate)
    # From here on, duration is a count of data points: 30 per second of audio
    duration *= 30
    duration = int(duration)  # the loop below needs a whole number of data points
    # Audio frames per data point (not really samples per pixel, despite the name)
    samples_per_pixel = audio_file.get_nframes() / float(duration)
    # Nyquist frequency: half the sample rate (not used further in this function)
    nyquist_freq = (audio_file.get_samplerate() / 2) + 0.0
    # FFT size of 2048 samples, Hanning window
    processor = AudioProcessor(audio_file, 2048, channel, numpy.hanning)

    centroids = []
    frequencies = []
    volumes = []

    for x in range(duration):
        seek_point = int(x * samples_per_pixel)
        next_seek_point = int((x + 1) * samples_per_pixel)
        (spectral_centroid, db_spectrum) = processor.spectral_centroid(seek_point)
        peaks = processor.peaks(seek_point, next_seek_point)      
        centroids.append(spectral_centroid)
        frequencies.append(db_spectrum)
        volumes.append(peaks)

    #print "Centroids:" + str(centroids)
    #print "Frequencies:" + str(frequencies)
    #print "Volumes:" + str(volumes)

    # Collapse each pair of peaks into one volume value (sum of absolute values)
    for i in range(len(volumes)):
        volumes[i] = abs(volumes[i][0]) + abs(volumes[i][1])
    # Round the spectrum values to 4 decimal places to save resources
    for i in range(len(frequencies)):
        for j in range(len(frequencies[i])):
            frequencies[i][j] = round(frequencies[i][j], 4)
    return centroids, frequencies, volumes

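With that in place, analyzing a wav file is easy. The centroids represent the timbre of the music: each one is a weighted average of the frequencies present, so they indicate the overall brightness at any point in time.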
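To make "weighted average of the frequencies" concrete, here is a minimal sketch of a spectral centroid and an RMS volume for one frame of samples. It is not the Sound Viewer Tool's code: the function names are made up for illustration, and it uses a naive O(n^2) DFT from the standard library only so that it runs without dependencies. In practice numpy.fft.rfft would be the usual choice.

```python
import cmath
import math


def hann(n):
    """Hann window of length n (same formula as numpy.hanning)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..n/2, computed with a naive O(n^2) DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hann(n))]
    mags = magnitude_spectrum(windowed)
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    # Bin k corresponds to the frequency k * sample_rate / n
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def rms_volume(frame):
    """Root-mean-square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


# Example: a synthetic 1000 Hz tone sampled at 8000 Hz (hypothetical test data)
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 1000 * i / rate) for i in range(256)]
print(rms_volume(tone), spectral_centroid(tone, rate))
```

For a pure tone the centroid should land close to the tone's own frequency; for music, a higher centroid means more energy in the high frequencies, which is what "brighter" refers to. Thresholds on these two numbers per frame are one possible way to trigger obstacles.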

The first answer here helped me a lot in understanding FFT, signal processing and digital sound representation.
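On the digital sound representation side, a wav file is essentially a header followed by raw sample values. The sketch below, which uses only Python's standard wave and struct modules, reads 16-bit mono PCM samples and takes the peak level in each of 30 windows per second, mirroring the 30 data points per second used above. The helper names and the generated test tone are assumptions for illustration, and other sample widths or channel counts are deliberately rejected.

```python
import math
import os
import struct
import tempfile
import wave


def read_mono_16bit(path):
    """Return (sample_rate, samples), with samples scaled to [-1.0, 1.0)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    # Each sample is a little-endian signed 16-bit integer
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return rate, [v / 32768.0 for v in ints]


def peak_per_window(samples, rate, windows_per_second=30):
    """Largest absolute sample value in each consecutive window."""
    size = max(1, rate // windows_per_second)
    return [max(abs(s) for s in samples[i:i + size])
            for i in range(0, len(samples), size)]


def write_test_tone(path, rate=8000, seconds=1.0, freq=440.0, amp=0.5):
    """Write a sine tone as 16-bit mono PCM (only used to demo the reader)."""
    n = int(rate * seconds)
    ints = [int(amp * 32767 * math.sin(2 * math.pi * freq * i / rate))
            for i in range(n)]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(struct.pack("<%dh" % n, *ints))


# Example: generate a one-second tone, read it back, and measure its level
path = os.path.join(tempfile.mkdtemp(), "tone.wav")
write_test_tone(path)
rate, samples = read_mono_16bit(path)
volumes = peak_per_window(samples, rate)
print(len(samples), len(volumes))
```

Each window of samples read this way can then be handed to an FFT to get its frequency content.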

【Comments】: