【Posted】: 2021-04-20 08:37:21
【Question】:
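I am using Python for NLP. I have converted an audio file to text and found the time offset of each word in the speech, storing the words in wordlist and the corresponding times in timelist.
I have three lists: the first is named strlist, the second wordlist, and the third timelist.
strlist contains a phrase, say:
strlist = ["in", "the", "family"]
wordlist contains a paragraph or, say, a sentence:
wordlist = ["there", "are", "few", "things", "to", "be", "in", "the", "family", "means"]
timelist contains a time value for each word stored in wordlist, say:
timelist = [2, 3, 4, 5, 7, 4, 8, 9, 5, 3]
I want to know whether the phrase in strlist (which consists of several words) appears in wordlist. If it does, I want to look up the time values stored in timelist for those words.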
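One way to read the requirement above is as a contiguous-sublist search: slide a window of len(strlist) over wordlist and compare slices, then take the same slice of timelist. A minimal sketch, assuming the words must match exactly (same case, no stray whitespace); `find_phrase` is a hypothetical helper name, not part of any library:

```python
def find_phrase(words, phrase):
    """Return the start index of every contiguous occurrence of phrase in words."""
    n = len(phrase)
    if n == 0:
        return []
    # Compare each window of n words against the phrase.
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == phrase]


strlist = ["in", "the", "family"]
wordlist = ["there", "are", "few", "things", "to", "be", "in", "the", "family", "means"]
timelist = [2, 3, 4, 5, 7, 4, 8, 9, 5, 3]

for start in find_phrase(wordlist, strlist):
    end = start + len(strlist)
    # The times line up with the words because both lists share indices.
    print(wordlist[start:end], timelist[start:end])
# prints: ['in', 'the', 'family'] [8, 9, 5]
```

If the recognizer returns words with different capitalization or punctuation, both lists would probably need to be normalized (for example with `str.lower()` and `str.strip()`) before comparing.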
from pathlib import Path
import io
from google.oauth2 import service_account
from google.cloud import speech_v1 as speech

credentials = service_account.Credentials.from_service_account_file(
    'proven-mystery-310205-f04fb2ab3d69.json')

# Phrase to look for, split into individual words.
# (Named `phrase` rather than `str` to avoid shadowing the built-in.)
phrase = 'in my family'
strlist = phrase.split(" ")
timelist = []
wordlist = []
strlist.append("")
for i in strlist:
    print(i)

speech_file = Path("C:/Users/Tani/PycharmProjects/pythonProject/t.wav")
print("Start")
print("checking credentials")
client = speech.SpeechClient(credentials=credentials)
print("Checked")
with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()
print("audio file read")
audio = speech.RecognitionAudio(content=content)
print("config start")
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    language_code='en-US',
    audio_channel_count=2,
    enable_separate_recognition_per_channel=True,
    enable_word_time_offsets=True)
print("Recognizing:")
response = client.recognize(config=config, audio=audio)
print("Recognized")

# Collect every recognized word together with its start time (whole seconds).
for result in response.results:
    alternative = result.alternatives[0]
    #print('Transcript: {}'.format(alternative.transcript))
    for word_info in alternative.words:
        word = word_info.word
        start_time = word_info.start_time
        end_time = word_info.end_time
        wordlist.append(word)
        timelist.append(start_time.seconds)

print(phrase)
for a, b in zip(wordlist, timelist):
    print('Word: {}, time: {}'.format(a, b))

print("findout time")
# This looks up each word of the phrase on its own, not the phrase as a whole.
for s in strlist:
    if s in wordlist:
        position = wordlist.index(s)
        time_s = timelist[position]
        print(f"Word: '{s}', Time: {time_s}")
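A side note on the stored times: the code above appends `start_time.seconds`, which keeps only whole seconds. The sketch below assumes the word offsets arrive as `datetime.timedelta` objects, which I believe is the case in recent versions of the client library (older versions may expose a protobuf Duration with `seconds` and `nanos` fields instead, which this sketch does not handle). `to_seconds` is a hypothetical helper name.

```python
from datetime import timedelta


def to_seconds(offset):
    """Convert a timedelta offset to fractional seconds (assumes a timedelta)."""
    return offset.total_seconds()


# Stand-in for word_info.start_time; the value is made up for illustration.
start_time = timedelta(seconds=2, milliseconds=700)

print(start_time.seconds)      # 2   (fractional part dropped)
print(to_seconds(start_time))  # 2.7
```

With fractional offsets in timelist, words spoken within the same second no longer share an identical time value.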
【Comments】:
Tags: python nlp timestamp nltk speech