Python: Change Pitch of Audio File (Answers)

【Question title】: Python: Change Pitch of Audio File
【Posted】: 2012-01-20 00:43:47
【Question】:

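This is my first post on Stack. So far this site has been very helpful, but I am a novice and need a clear explanation of my problem, which is about pitch-shifting audio in Python. I have the following modules installed: numpy, scipy, pygame, and the scikits "samplerate" API.

My goal is to take a stereo file and play it back at a different pitch in as few steps as possible. Currently, I load the file into an array using pygame.sndarray, apply a sample-rate conversion using scikits.samplerate.resample, and then convert the output back into a sound object for playback using pygame. The problem is that garbage audio comes out of my speakers. Surely I am missing a few steps (in addition to not knowing anything about math and audio).

Thanks.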

import time, numpy, pygame.mixer, pygame.sndarray
from scikits.samplerate import resample

pygame.mixer.init(44100,-16,2,4096)

# choose a file and make a sound object
sound_file = "tone.wav"
sound = pygame.mixer.Sound(sound_file)

# load the sound into an array
snd_array = pygame.sndarray.array(sound)

# resample. args: (target array, ratio, mode), outputs ratio * target array.
# this outputs a bunch of garbage and I don't know why.
snd_resample = resample(snd_array, 1.5, "sinc_fastest")

# take the resampled array, make it an object and stop playing after 2 seconds.
snd_out = pygame.sndarray.make_sound(snd_resample)
snd_out.play()
time.sleep(2)

【Comments on the question】:

Tags: python scipy pygame audio-processing

【Answer 1】:

Your problem is that pygame works with numpy.int16 arrays, but the call to resample returns a numpy.float32 array:

>>> snd_array.dtype
dtype('int16')
>>> snd_resample.dtype
dtype('float32')

You can convert the resample result to numpy.int16 using astype:

>>> snd_resample = resample(snd_array, 1.5, "sinc_fastest").astype(snd_array.dtype)

With this change, your Python script plays the tone.wav file correctly, at a lower pitch and a lower speed.
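
The dtype round trip described above can be tried without pygame or scikits.samplerate. The following is only a minimal sketch under stated assumptions: resample_linear is a hypothetical stand-in that uses plain linear interpolation (numpy.interp) instead of the sinc converter, and to_int16 is a hypothetical helper that clips before casting, a precaution the answer itself does not take.

```python
import numpy as np

def resample_linear(samples, ratio):
    """Resample (frames, channels) audio by `ratio` with linear interpolation.

    Hypothetical stand-in for scikits.samplerate.resample: lower quality, but
    it likewise returns a float32 array with about ratio * len(samples) frames.
    """
    n_in = samples.shape[0]
    n_out = int(round(n_in * ratio))
    # Positions in the input at which each output frame is read.
    positions = np.linspace(0, n_in - 1, n_out)
    channels = [np.interp(positions, np.arange(n_in), samples[:, c])
                for c in range(samples.shape[1])]
    return np.stack(channels, axis=1).astype(np.float32)

def to_int16(samples):
    """Round and clip to the int16 range, then cast, so values cannot wrap."""
    info = np.iinfo(np.int16)
    return np.clip(np.rint(samples), info.min, info.max).astype(np.int16)

# One second of a 440 Hz stereo tone at 44.1 kHz, stored as int16,
# the dtype the answer reports for snd_array.
rate = 44100
t = np.arange(rate) / rate
mono = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
snd_array = np.stack([mono, mono], axis=1)

snd_resample = resample_linear(snd_array, 1.5)  # float32, 1.5x as many frames
snd_fixed = to_int16(snd_resample)              # int16 again
```

Because snd_fixed has 1.5 times as many frames, playing it at the original 44100 Hz takes longer and sounds lower, which matches the behaviour the answer describes.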

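【Comments】:

Oh my gosh, I don't know how to thank you. No wait, I do: if you are willing to accept my offer, I can send you money through PayPal. I have spent countless hours looking for a solution. This is awesome.
Glad to see you liked my answer :) You don't need to give me anything; your question was interesting and I learned something from it too!
Is it possible to save the modified pygame.mixer.Sound object as a sound file instead of playing it?
I would use the SciPy io.wavfile module: docs.scipy.org/doc/scipy/reference/io.html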
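
For the saving question, scipy.io.wavfile.write(filename, rate, data) is the route the comment points to. As a hedged alternative that needs only numpy and the standard library, the sketch below writes an int16 array with the wave module. The helper name save_wav, the output file name, and the silent test data are all assumptions made for illustration; in the original script the array would come from pygame.sndarray.array(sound) or from the resampled result.

```python
import os
import tempfile
import wave

import numpy as np

def save_wav(path, samples, rate):
    """Write a (frames, channels) int16 array as a 16-bit PCM WAV file."""
    # WAV stores little-endian samples, interleaved frame by frame.
    samples = np.ascontiguousarray(samples, dtype="<i2")
    with wave.open(path, "wb") as wav:
        wav.setnchannels(samples.shape[1])
        wav.setsampwidth(2)       # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(samples.tobytes())

# Hypothetical data: half a second of stereo silence at 44.1 kHz.
rate = 44100
snd_out = np.zeros((rate // 2, 2), dtype=np.int16)
out_path = os.path.join(tempfile.mkdtemp(), "tone_shifted.wav")
save_wav(out_path, snd_out, rate)

# Read the header back to confirm what was written.
with wave.open(out_path, "rb") as wav:
    saved_frames = wav.getnframes()
    saved_channels = wav.getnchannels()
```

The frame rate written to the header decides the playback pitch: writing resampled data with the original rate keeps the pitch shift, while writing it with a scaled rate would cancel it.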

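【Answer 2】:

Your best bet is probably to use Python Audiere.

Here is a link; I used it to do the same thing. It is very easy, just read all the documentation.

http://audiere.sourceforge.net/home.php

【Comments】:

Thanks, Audiere was my first choice, but I could not get through make without errors. I don't know much about this stuff, so I have to work with what I can, given my limited skills.
This looks good; does it work with Python 2.7? It seems to be for Python 2.2.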

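【Answer 3】:

scikits.samplerate.resample most likely "thinks" your audio is in some format other than 16-bit stereo. Check the scikits.samplerate documentation for how to select the correct audio format for your array: if it resamples 16-bit audio as though it were 8-bit, the result is garbage.

【Comments】:

【Answer 4】:

From the scikits.samplerate.resample documentation:

If input has rank 1, then all data are used, and are assumed to be from a mono signal. If rank is 2, the number of columns will be assumed to be the number of channels.

So I think what you need to do is something like this, to pass the stereo data to resample in the format it expects: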

snd_array = snd_array.reshape((-1,2))

snd_resample = resample(snd_array, 1.5, "sinc_fastest")

snd_resample = snd_resample.reshape(-1) # Flatten it out again
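
To see what the two reshape calls above do to the data layout, here is a small numpy-only sketch. The sample values are hypothetical, and the interleaved left/right ordering is an assumption about how a flat stereo buffer would be laid out.

```python
import numpy as np

# Hypothetical interleaved stereo samples: L0, R0, L1, R1, L2, R2
interleaved = np.array([10, -10, 20, -20, 30, -30], dtype=np.int16)

# One row per frame, one column per channel: the rank-2 layout
# that the quoted documentation describes.
frames = interleaved.reshape((-1, 2))
left, right = frames[:, 0], frames[:, 1]

# Flatten back to the interleaved one-dimensional layout.
flat_again = frames.reshape(-1)
```

If the array is already two-dimensional with two columns, the first reshape changes nothing. That may be why the output stayed the same for the asker, though the thread does not confirm it.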

【Comments】:

Thanks. I tried your suggestion, but the output is the same.