AzureML pipeline: how to access step1's data in step2

【Question title】: AzureMl pipeline: How to access data of step1 into step2
【Posted】: 2021-02-15 11:54:13
【Question】:

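I am following Microsoft's article to create an Azure ML pipeline with two steps, and I want step2 to use the data written by step1. According to the article, the code below should pass the path of the data written by step1 as an argument to step2's script: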

datastore = workspace.datastores['my_adlsgen2']
step1_output_data = OutputFileDatasetConfig(name="processed_data", destination=(datastore, "mypath/{run-id}/{output-name}")).as_upload()

step1 = PythonScriptStep(
    name="generate_data",
    script_name="step1.py",
    runconfig = aml_run_config,
    arguments = ["--output_path", step1_output_data]
)

step2 = PythonScriptStep(
    name="read_pipeline_data",
    script_name="step2.py",
    compute_target=compute,
    runconfig = aml_run_config,
    arguments = ["--pd", step1_output_data.as_input]

)

pipeline = Pipeline(workspace=ws, steps=[step1, step2])

But when I read the pd argument in step2.py, it gives

">"

Any idea how to pass the blob storage location that step1 writes its data to on to step2?

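【Question discussion】:

You should try following the notebook below, which describes the steps; you will also find the underlying Python scripts it uses, especially the train.py script. github.com/Azure/MachineLearningNotebooks/blob/master/…

Tags: azureml azureml-python-sdk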

【Solution 1】:

You may find what you need here: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-move-data-in-out-of-pipelines. Pay particular attention to the section "Read OutputFileDatasetConfig as inputs to non-initial steps":

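from azureml.core import Workspace
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# get the ADLS Gen2 datastore already registered with the workspace
datastore = ws.datastores['my_adlsgen2']

# step1 writes into this output; as_upload() uploads the files to the datastore destination
step1_output_data = OutputFileDatasetConfig(
    name="processed_data",
    destination=(datastore, "mypath/{run-id}/{output-name}")).as_upload()

# aml_run_config and compute are assumed to be defined earlier, as in the linked article
step1 = PythonScriptStep(
    name="generate_data",
    script_name="step1.py",
    runconfig=aml_run_config,
    arguments=["--output_path", step1_output_data]
)

step2 = PythonScriptStep(
    name="read_pipeline_data",
    script_name="step2.py",
    compute_target=compute,
    runconfig=aml_run_config,
    # note the parentheses: as_input() must be called, not just referenced
    arguments=["--pd", step1_output_data.as_input()]
)

pipeline = Pipeline(workspace=ws, steps=[step1, step2])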
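For completeness, here is a minimal sketch of what step1.py and step2.py might do with these arguments. It assumes that at run time each argument reaches the script as a plain directory path string; the file name processed.csv and the helper functions are hypothetical, chosen only for illustration. A temporary directory stands in for the path Azure ML would provide, so the sketch runs locally without Azure ML.

```python
import argparse
import os
import tempfile


def write_step1_output(output_path: str) -> str:
    """step1.py side: write results into the directory received as --output_path."""
    os.makedirs(output_path, exist_ok=True)
    # "processed.csv" is a hypothetical file name chosen for illustration
    out_file = os.path.join(output_path, "processed.csv")
    with open(out_file, "w") as f:
        f.write("id,value\n1,42\n")
    return out_file


def parse_step2_args(argv=None) -> argparse.Namespace:
    """step2.py side: --pd should hold a plain path string, not a method object."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--pd", type=str, required=True)
    return parser.parse_args(argv)


def read_step2_input(input_path: str) -> list:
    """step2.py side: read every file step1 left in the --pd directory."""
    rows = []
    for name in sorted(os.listdir(input_path)):
        with open(os.path.join(input_path, name)) as f:
            rows.extend(line.rstrip("\n") for line in f)
    return rows


if __name__ == "__main__":
    # Local simulation: a temp dir stands in for the Azure ML-provided path
    with tempfile.TemporaryDirectory() as tmp:
        write_step1_output(tmp)
        args = parse_step2_args(["--pd", tmp])
        print(read_step2_input(args.pd))
```

In the real pipeline, each script would call only its own half, parsing sys.argv instead of a hard-coded list.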

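Your error is probably that as_input() on OutputFileDatasetConfig is a method, not an attribute. Writing step1_output_data.as_input without parentheses passes the bound method object itself to step2 instead of calling it.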
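The difference can be reproduced in plain Python without Azure ML. FakeOutputConfig below is a hypothetical stand-in, not the real class: referencing cfg.as_input yields a bound method object whose string form looks like "<bound method ... >", which would be consistent with the truncated ">" output in the question, while cfg.as_input() actually calls the method.

```python
class FakeOutputConfig:
    """Hypothetical stand-in for OutputFileDatasetConfig; not the real Azure ML class."""

    def as_input(self):
        # The real method returns an input config object; a string keeps the demo simple
        return "input-config"


cfg = FakeOutputConfig()

without_call = str(cfg.as_input)   # attribute access only: a bound method object
with_call = str(cfg.as_input())    # the method is actually invoked

print(without_call)  # starts with "<bound method FakeOutputConfig.as_input" and ends with ">"
print(with_call)     # input-config
```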

【Discussion】: