【Question Title】: Python IBM Watson Speech to Text API Convert Transcript to CSV
【Posted】: 2021-12-05 23:18:38
【Question Description】:

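I'm using the IBM Watson Speech to Text API in Python and storing the JSON response as a nested dictionary. I can access a single transcript with pprint(data_response['results'][0]['alternatives'][0]['transcript']), but I can't print all of the transcripts. I need to dump the entire transcript into a .csv file. I tried a generator comprehension in the same format that was suggested to me in another post, print(a["confidence"] for r in data_response["results"] for a in r["alternatives"]), but I must not understand how generator comprehensions work.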
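Why the print call prints nothing useful: a comprehension in parentheses (or passed as the only argument to a function, as here) is a generator expression. It is evaluated lazily, so print receives the generator object rather than its values. Square brackets build a list instead. The sketch below also switches the key from "confidence" to "transcript", which is what the CSV needs; the data is a trimmed copy of the pretty-printed response in this question.

```python
# A trimmed-down copy of the response structure shown in the question
# (only the first two results kept, for illustration).
data_response = {
    "result_index": 0,
    "results": [
        {"alternatives": [{"confidence": 0.99, "transcript": "hello "}], "final": True},
        {"alternatives": [{"confidence": 0.9, "transcript": "good morning any this is "}],
         "final": True},
    ],
}

# Parentheses create a lazy generator object; printing it shows something
# like <generator object <genexpr> at 0x...>, not the values themselves.
gen = (a["confidence"] for r in data_response["results"] for a in r["alternatives"])
print(gen)

# Square brackets build a list, so the values are actually computed.
confidences = [a["confidence"] for r in data_response["results"] for a in r["alternatives"]]
print(confidences)  # [0.99, 0.9]

# Same pattern, selecting the "transcript" key instead of "confidence".
transcripts = [a["transcript"] for r in data_response["results"] for a in r["alternatives"]]
print("".join(transcripts))
```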

Here's what the nested dictionary looks like when pretty-printed:

{'result_index': 0,
 'results': [{'alternatives': [{'confidence': 0.99, 'transcript': 'hello '}],
              'final': True},
             {'alternatives': [{'confidence': 0.9,
                                'transcript': 'good morning any this is '}],
              'final': True},
             {'alternatives': [{'confidence': 0.59,
                                'transcript': "I'm on a recorded morning "
                                              '%HESITATION today start running '
                                              "yeah it's really good how are "
                                              "you %HESITATION it's one three "
                                              'six thank you so much for '
                                              'asking '}],
              'final': True},
             {'alternatives': [{'confidence': 0.87,
                                'transcript': 'I appreciate this opportunity '
                                              'to get together with you and '
                                              '%HESITATION you know learn more '
                                              'about you your interest in '}],
              'final': True},
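In this structure, each item in results holds an alternatives list, and each alternative holds a transcript. A nested comprehension reads its for clauses left to right, exactly like nested loops, so an equivalent explicit version can be sketched as follows (extract_transcripts is a hypothetical helper name; the sample is trimmed from the output above):

```python
def extract_transcripts(data_response):
    """Collect every transcript string from a response dict shaped like the one above.

    Equivalent to the nested comprehension
    [a["transcript"] for r in data_response["results"] for a in r["alternatives"]].
    """
    transcripts = []
    # Outer loop: one entry per item in "results".
    for result in data_response["results"]:
        # Inner loop: every alternative listed for that result.
        for alternative in result["alternatives"]:
            transcripts.append(alternative["transcript"])
    return transcripts


# Trimmed sample of the pretty-printed response.
sample = {
    "result_index": 0,
    "results": [
        {"alternatives": [{"confidence": 0.99, "transcript": "hello "}], "final": True},
        {"alternatives": [{"confidence": 0.9, "transcript": "good morning any this is "}],
         "final": True},
    ],
}

print(extract_transcripts(sample))
# ['hello ', 'good morning any this is ']
```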

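Edit: Here is my final solution for converting a folder of .pkl files to .csv files, based on the response from @SeaChange, which helped me export only the transcript portion of the nested dictionary. I'm sure there are more efficient ways to convert the files, but it works well for my application.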

import glob
import os
import pickle

import pandas as pd

# set the input path (built with os.path.join so no backslash escapes are needed)
input_path = os.path.join("00_data", "Watson Responses")

# set the output path, creating it if it does not exist yet
output_path = os.path.join("00_data", "Watson Scripts")
os.makedirs(output_path, exist_ok=True)

# list every file under the input path (including subfolders) ending in .pkl
files = glob.glob(os.path.join(input_path, "**", "*.pkl"), recursive=True)

# open each pkl file, convert the list of transcripts to a dataframe, and export to a csv
for file in files:
    f_name, f_ext = os.path.splitext(os.path.basename(file))
    # glob already returns the path of each file, so open it directly
    with open(file, "rb") as pkl_file:
        data_response = pickle.load(pkl_file)
    # collect the transcript of every alternative in every result
    transcripts = [a["transcript"] for r in data_response["results"] for a in r["alternatives"]]
    dataframe = pd.DataFrame(transcripts)
    dataframe.to_csv(os.path.join(output_path, f"{f_name}.csv"), index=False, header=False)
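If you'd rather not depend on pandas, or want to keep each confidence score next to its transcript, the same per-file export can be sketched with the standard-library csv module. write_transcripts_csv is a hypothetical helper name, and the two-column layout (transcript, confidence) is an assumption rather than something the question required; the sample dict is trimmed from the question's output.

```python
import csv
import os
import tempfile


def write_transcripts_csv(data_response, csv_path):
    """Write one CSV row per alternative: its transcript and its confidence.

    Hypothetical helper; assumes the "results" -> "alternatives" layout
    shown in the question.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["transcript", "confidence"])  # header row
        for result in data_response["results"]:
            for alternative in result["alternatives"]:
                # .get() leaves the cell empty if an alternative has no confidence
                writer.writerow([alternative["transcript"].strip(),
                                 alternative.get("confidence")])


# Trimmed sample of the response shown in the question.
sample = {
    "result_index": 0,
    "results": [
        {"alternatives": [{"confidence": 0.99, "transcript": "hello "}], "final": True},
        {"alternatives": [{"confidence": 0.9, "transcript": "good morning any this is "}],
         "final": True},
    ],
}

# Write to a temporary directory, then read the file back to check it.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "sample.csv")
    write_transcripts_csv(sample, out_path)
    with open(out_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

print(rows)
# [['transcript', 'confidence'], ['hello', '0.99'], ['good morning any this is', '0.9']]
```

Stripping the trailing space from each transcript is a presentational choice; drop .strip() to keep the text exactly as Watson returned it.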

【Question Discussion】:

    Tags: python json dictionary export-to-csv ibm-watson


    【Solution 1】:
    transcripts = [a["transcript"] for r in data_response["results"] for a in r["alternatives"]]
    

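    This gives you a list of all the transcripts. From there, it just depends on how you want to format the output file. If you want each transcript on its own line, you can use writelines; note that writelines does not add line separators, so append "\n" to each transcript.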
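A minimal sketch of the writelines route, writing one transcript per line to a plain text file. The file name and the temporary directory are only for the demonstration, and strip() removes the trailing space seen on each transcript in the question's output.

```python
import os
import tempfile

# Trimmed sample of the list produced by the comprehension above.
transcripts = ["hello ", "good morning any this is "]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "transcripts.txt")
    with open(path, "w", encoding="utf-8") as f:
        # writelines() writes each string as-is, so add the newline ourselves
        f.writelines(t.strip() + "\n" for t in transcripts)
    # Read the file back to show what was written.
    with open(path, encoding="utf-8") as f:
        contents = f.read()

print(repr(contents))  # 'hello\ngood morning any this is\n'
```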


    【Comments】:

    • This is perfect, thank you so much! I didn't realize such a small change was all it took to make this work.