Python IBM Watson Speech to Text API: Convert Transcript to CSV

[Question title]: Python IBM Watson Speech to Text API Convert Transcript to CSV
[Posted]: 2021-12-05 23:18:38
[Question description]:

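I'm using the IBM Watson Speech to Text API in Python and storing the JSON response as a nested dictionary. I can access a single record with pprint(data_response['results'][0]['alternatives'][0]['transcript']), but I can't print all of the transcripts. I need to dump the entire transcript to a .csv. I tried a generator comprehension in the same format that was suggested to me in another post, print(a["confidence"] for r in data_response["results"] for a in r["alternatives"]), but I must not understand how generator comprehensions work.

Here is what the nested dictionary looks like when pretty-printed: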

{'result_index': 0,
 'results': [{'alternatives': [{'confidence': 0.99, 'transcript': 'hello '}],
              'final': True},
             {'alternatives': [{'confidence': 0.9,
                                'transcript': 'good morning any this is '}],
              'final': True},
             {'alternatives': [{'confidence': 0.59,
                                'transcript': "I'm on a recorded morning "
                                              '%HESITATION today start running '
                                              "yeah it's really good how are "
                                              "you %HESITATION it's one three "
                                              'six thank you so much for '
                                              'asking '}],
              'final': True},
             {'alternatives': [{'confidence': 0.87,
                                'transcript': 'I appreciate this opportunity '
                                              'to get together with you and '
                                              '%HESITATION you know learn more '
                                              'about you your interest in '}],
              'final': True},

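Edit: Here is my final solution for converting a list of .pkl files to .csv files, using the response from @SeaChange, which helped me export only the transcript portion of the nested dictionary. I'm sure there are more efficient ways to convert the files, but it worked well for my application.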

import glob
import os
import pickle

import pandas as pd

# set the input path (os.path.join avoids the invalid "\W" escape sequence)
input_path = os.path.join("00_data", "Watson Responses")

# set the output path
output_path = os.path.join("00_data", "Watson Scripts")

# set the list of all files under the input path with a file ending of pkl
files = glob.glob(os.path.join(input_path, "**", "*.pkl"), recursive=True)

# open each pkl file, convert the list to a dataframe, and export to a csv
for file in files:
    f_name, f_ext = os.path.splitext(os.path.basename(file))
    # the with-statement closes the file even if unpickling fails
    with open(file, 'rb') as pkl_file:
        data_response = pickle.load(pkl_file)
    transcripts = [a["transcript"] for r in data_response["results"] for a in r["alternatives"]]
    dataframe = pd.DataFrame(transcripts)
    dataframe.to_csv(os.path.join(output_path, f'{f_name}.csv'), index=False, header=False)
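
One possible variation, sketched with the standard csv module instead of pandas, writes the confidence next to each transcript. The sample data is shortened from the response above; the function name rows_from_response, the column headers, and the in-memory buffer standing in for a real file are assumptions made for illustration.

```python
import csv
import io


def rows_from_response(data_response):
    """Yield (confidence, transcript) pairs from a Watson-style response dict."""
    for result in data_response["results"]:
        for alt in result["alternatives"]:
            # Watson leaves a trailing space on each transcript; strip it.
            yield alt["confidence"], alt["transcript"].strip()


# Shortened sample of the response shown in the question.
sample = {
    "result_index": 0,
    "results": [
        {"alternatives": [{"confidence": 0.99, "transcript": "hello "}],
         "final": True},
        {"alternatives": [{"confidence": 0.9,
                           "transcript": "good morning any this is "}],
         "final": True},
    ],
}

# StringIO stands in for open(<path>, "w", newline="") so the sketch
# runs without touching the file system.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["confidence", "transcript"])  # hypothetical header names
writer.writerows(rows_from_response(sample))

print(buffer.getvalue())
```

When writing to a real file with csv.writer, opening it with newline="" avoids blank rows appearing between records on Windows.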

[Question discussion]:

Tags: python json dictionary export-to-csv ibm-watson

[Solution 1]:

transcripts = [a["transcript"] for r in data_response["results"] for a in r["alternatives"]]

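This gives you a list of all the transcripts. From there, it just depends on how you want to format the output file. If you want each transcript on a new line, you can use writelines.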
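
A minimal sketch of the writelines approach, assuming a shortened sample of the response from the question and a hypothetical output file name, transcripts.csv. Note that writelines does not add line terminators itself, so each transcript gets an explicit newline here; the trailing space on each transcript is stripped as well.

```python
import os
import tempfile

# Shortened sample of the response shown in the question.
data_response = {
    "result_index": 0,
    "results": [
        {"alternatives": [{"confidence": 0.99, "transcript": "hello "}],
         "final": True},
        {"alternatives": [{"confidence": 0.9,
                           "transcript": "good morning any this is "}],
         "final": True},
    ],
}

transcripts = [a["transcript"]
               for r in data_response["results"]
               for a in r["alternatives"]]

# Hypothetical output location; a temporary directory keeps the sketch
# self-contained. Replace it with your own output path.
out_path = os.path.join(tempfile.mkdtemp(), "transcripts.csv")

with open(out_path, "w", encoding="utf-8") as f:
    # writelines() writes each string as-is, so append the newline ourselves.
    f.writelines(t.strip() + "\n" for t in transcripts)
```

If a transcript could ever contain a comma or a quote character, the csv module handles the escaping that plain writelines does not.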

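
As for why the original print(...) attempt did not show the values: parentheses around a comprehension create a generator expression, and print displays the generator object itself rather than its items. A small sketch with a shortened sample of the response; the variable names gen and values are only for illustration.

```python
# Shortened sample of the response shown in the question.
data_response = {
    "results": [
        {"alternatives": [{"confidence": 0.99, "transcript": "hello "}]},
        {"alternatives": [{"confidence": 0.9,
                           "transcript": "good morning any this is "}]},
    ],
}

# A generator expression is lazy: nothing is computed until it is iterated,
# so print() only shows the object's repr, not the confidences.
gen = (a["confidence"]
       for r in data_response["results"]
       for a in r["alternatives"])
print(gen)

# Square brackets build the list immediately, so print() shows the values.
values = [a["confidence"]
          for r in data_response["results"]
          for a in r["alternatives"]]
print(values)

# Alternatively, unpack a generator into print()'s arguments.
print(*(a["confidence"]
        for r in data_response["results"]
        for a in r["alternatives"]))
```

Swapping the parentheses for square brackets is the whole change between the attempt in the question and the list comprehension in this answer.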

[Discussion]:

This is perfect, thank you so much! I didn't realize such a small change was all it took to make this work.