In Kedro, how to pick up an intermediate dataset in a pipeline?

【Question title】: In Kedro, how to pick up intermediate dataset in a pipeline?
【Posted】: 2020-08-27 14:47:39
【Question】:

I'm working on my pipeline and testing it manually in a Jupyter notebook.

Here is my situation.

I want to extract example_train and example_valid from it, so I wrote this:

context.pipeline.to_outputs("example_train", "example_valid")

I passed the resulting pipeline to SequentialRunner and got both datasets back.

I also wanted total_steps, so I changed the line like this:

context.pipeline.to_outputs("example_train", "example_valid", "total_steps")

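However, the result no longer contains example_train. I understand why: example_train is not a final output of this modified pipeline, so it isn't returned.

Is there a way to pick up intermediate datasets in a case like this?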

Tags: kedro

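【Solution 1】:

You can register these datasets in the Data Catalog (catalog.yml) and specify where they are stored. A dataset registered there is saved to that location when the pipeline runs, so you can load it afterwards even though the runner does not return it.

For example: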

example_train:
  type: pandas.CSVDataSet
  filepath: data/02_intermediate/example_train.csv
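
To see why registering the dataset helps, here is a toy model in plain Python. It is not Kedro's implementation; the `run` function, the node functions, and the sample data are all made up for illustration. It only mimics the idea that a run hands back in-memory outputs nobody else consumed, while anything registered in the catalog is saved and stays available.

```python
# Toy model only: NOT Kedro's code, just the idea behind the behaviour.

def run(nodes, catalog, persisted=()):
    """Run nodes in order and return the unconsumed in-memory outputs.

    nodes:     list of (func, input_names, output_name) tuples
    catalog:   dict standing in for datasets stored on disk
    persisted: names that are "registered in catalog.yml"
    """
    memory = {}
    # Every dataset name that some node takes as an input.
    consumed = {name for _, inputs, _ in nodes for name in inputs}
    for func, inputs, output in nodes:
        args = [catalog[n] if n in catalog else memory[n] for n in inputs]
        result = func(*args)
        if output in persisted:
            catalog[output] = result   # saved, can be loaded later
        else:
            memory[output] = result    # lives only for this run
    # Intermediate (consumed) in-memory datasets are not returned.
    return {k: v for k, v in memory.items() if k not in consumed}


# Hypothetical nodes: total_steps is derived from example_train,
# which makes example_train an intermediate dataset.
nodes = [
    (lambda raw: raw[:4], ["raw"], "example_train"),
    (lambda raw: raw[4:], ["raw"], "example_valid"),
    (lambda train: len(train) * 10, ["example_train"], "total_steps"),
]

# Without registration, example_train is lost after the run.
catalog_plain = {"raw": [1, 2, 3, 4, 5]}
outputs_plain = run(nodes, catalog_plain)

# With example_train registered, it is stored in the catalog.
catalog_saved = {"raw": [1, 2, 3, 4, 5]}
outputs_saved = run(nodes, catalog_saved, persisted={"example_train"})
```

In a real project the equivalent step after the run would be something like `context.catalog.load("example_train")`, assuming `context` is the same object used in the question and the entry above is in catalog.yml.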
