Continue stopped run in MLflow

【Question title】: Continue stopped run in MLflow
【Posted】: 2021-09-17 21:21:05
【Question】:

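We run our experiments on AWS Spot instances. Sometimes an experiment is interrupted, and we would prefer to continue logging to the same run. How can I set the run_id of the active run?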

Something like this pseudocode (which does not work):

if new:
    mlflow.start_run(experiment_id=1, run_name=x)
else:
    mlflow.set_run(run_id)

【Comments】:

Tags: python mlflow

【Answer 1】:

You can pass the run_id directly to start_run:

mlflow.start_run(experiment_id=1,
                 run_name=x,
                 run_id=run_id  # run_id of the interrupted run; pass None to start a new run
                 )

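Of course, you have to store the run_id somewhere for this to work. You can get it from run.info.run_id, where run is the object returned by start_run.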
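
One way to store the run_id is to write it to a small file and read it back on restart. The following is only a minimal sketch: the file name mlflow_run_id.txt and the helpers load_run_id, save_run_id and start_or_resume_run are hypothetical names chosen for illustration, not part of MLflow. It also assumes the file lives on storage that survives a Spot interruption.

```python
from pathlib import Path
from typing import Optional

# Hypothetical location; it must be on storage that outlives the instance
# (for example a mounted volume), not on ephemeral disk.
RUN_ID_FILE = Path("mlflow_run_id.txt")


def load_run_id(path: Path = RUN_ID_FILE) -> Optional[str]:
    """Return the saved run_id, or None if no run has been recorded yet."""
    if path.exists():
        return path.read_text().strip() or None
    return None


def save_run_id(run_id: str, path: Path = RUN_ID_FILE) -> None:
    """Persist the run_id so that a restarted process can find it."""
    path.write_text(run_id)


def start_or_resume_run(path: Path = RUN_ID_FILE):
    """Resume the saved run if there is one, otherwise start a new run."""
    import mlflow  # imported lazily so the file helpers work without mlflow

    # run_id=None makes start_run create a new run.
    run = mlflow.start_run(run_id=load_run_id(path))
    save_run_id(run.info.run_id, path)
    return run
```

The returned object can be used as a context manager, e.g. `with start_or_resume_run(): ...`, and logging calls inside the block then go to the resumed run. The sketch leaves out experiment_id and run_name from the answer above; they can be added to the start_run call if new runs need them.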

【Discussion】: