Import a custom Python module in an Azure ML deployment environment: answers

[Question title]: import custom python module in azure ml deployment environment
[Posted]: 2020-03-29 06:27:17
[Question]:

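I have an sklearn k-means model. I train the model and save it to a pickle file so that I can deploy it later with the Azure ML library. The model I am training uses a custom feature encoder called MultiColumnLabelEncoder. The pipeline model is defined as follows: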

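import joblib
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline

# MultiColumnLabelEncoder is the custom encoder class, defined in a separate script
# Pipeline
kmeans = KMeans(n_clusters=3, random_state=0)
pipe = Pipeline([
    ("encoder", MultiColumnLabelEncoder()),
    ("k-means", kmeans),
])
# Training the pipeline
model = pipe.fit(visitors_df)
prediction = model.predict(visitors_df)
# Save the model in pickle/joblib format
filename = "k_means_model.pkl"
joblib.dump(model, filename)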

Saving the model works fine. The deployment steps are the same as the ones in this notebook:

https://notebooks.azure.com/azureml/projects/azureml-getting-started/html/how-to-use-azureml/deploy-to-cloud/model-register-and-deploy.ipynb

But the deployment always fails with this error:

  File "/var/azureml-server/create_app.py", line 3, in <module>
    from app import main
  File "/var/azureml-server/app.py", line 27, in <module>
    import main as user_main
  File "/var/azureml-app/main.py", line 19, in <module>
    driver_module_spec.loader.exec_module(driver_module)
  File "/structure/azureml-app/score.py", line 22, in <module>
    importlib.import_module("multilabelencoder")
  File "/azureml-envs/azureml_b707e8c15a41fd316cf6c660941cf3d5/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named 'multilabelencoder'

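I understand that pickle/joblib has some problems unpickling the custom class MultiColumnLabelEncoder. That is why I defined this class in a separate Python script (which I also executed). I use this custom class in the training script, the deployment script, and the scoring file (score.py). The import in score.py does not succeed. So my question is: how do I import a custom Python module into the Azure ML deployment environment?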

Thanks in advance.
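The reason the module has to exist in the deployment environment at all is that pickle (and joblib, which builds on it) stores an instance of a custom class by reference: it records the module path and class name, not the class's code. The sketch below shows this with the standard library only; the module name `my_custom_encoder` and the `Encoder` class are hypothetical stand-ins for the question's encoder module, built at runtime so the example is self-contained.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a custom module such as the question's encoder module.
mod = types.ModuleType("my_custom_encoder")
exec(
    "class Encoder:\n"
    "    def __init__(self, columns=None):\n"
    "        self.columns = columns\n",
    mod.__dict__,
)
sys.modules["my_custom_encoder"] = mod  # importable, as in the training environment

payload = pickle.dumps(mod.Encoder(columns=["a", "b"]))

# The pickle records the class by module path and name, not by its source code.
stored_by_reference = b"my_custom_encoder" in payload
print(stored_by_reference)  # True

# Loading works while the module can be imported...
print(pickle.loads(payload).columns)  # ['a', 'b']

# ...and fails once it cannot, as in a scoring container without the module.
del sys.modules["my_custom_encoder"]
try:
    pickle.loads(payload)
    error_message = None
except ModuleNotFoundError as exc:
    error_message = str(exc)
print(error_message)  # No module named 'my_custom_encoder'
```

So whatever module path the class had at training time must also be importable, under the same name, wherever the model is loaded.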

Edit: here is my .yml file:

name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.6.2

- pip:
  - multilabelencoder==1.0.4
  - scikit-learn
  - azureml-defaults==1.0.74.*
  - pandas
channels:
- conda-forge

[Comments on the question]:

Could you share the environment file (.yml) so we can check it?

Tags: python pickle azure-machine-learning-studio azure-machine-learning-service

[Answer 1]:

I faced the same problem when trying to deploy a model that depends on some of my own scripts, and got this error message:

 ModuleNotFoundError: No module named 'my-own-module-name'

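I found the "private wheel file" solution in the MS documentation, and it works. The difference from the pip-package solution in the other answer is that I don't need to publish my scripts to pip. I think many people may be in the same situation, where for some reason you can't or don't want to publish your scripts. Instead, your own wheel file is stored in your own Blob storage.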

Following the documentation, I took the steps below and they worked for me. I can now deploy a model that depends on my own scripts.

Package the scripts your model depends on into a wheel file and save the wheel file locally:

"your_path/your-wheel-file-name.whl"
Follow the instructions for the "private wheel file" solution in the MS documentation. Below is the code that worked for me.

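from azureml.core import Workspace
from azureml.core.environment import Environment
from azureml.core.conda_dependencies import CondaDependencies

ws = Workspace.from_config()  # assumes a config.json for your workspace

# Upload the local wheel to the workspace's Blob storage and get its URL
whl_url = Environment.add_private_pip_wheel(workspace=ws, file_path="your_path/your-wheel-file-name.whl")

myenv = CondaDependencies()
myenv.add_pip_package("scikit-learn==0.22.1")
myenv.add_pip_package("azureml-defaults")
myenv.add_pip_package(whl_url)  # the private wheel is installed like any pip package

# Write the environment out as a conda .yml file
with open("myenv.yml", "w") as f:
    f.write(myenv.serialize_to_string())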

My environment file now looks like this:

name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.6.2

- pip:
  - scikit-learn==0.22.1
  - azureml-defaults
  - https://myworkspaceid.blob.core/azureml/Environment/azureml-private-packages/my-wheel-file-name.whl
channels:
- conda-forge
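If the deployment still reports ModuleNotFoundError, one thing worth checking is that the wheel really contains the module under the name score.py (or the pickle) imports. A wheel is a zip archive, so the standard library can list its top-level packages. The helper below is a hypothetical sketch for pure-Python wheels; compiled extension modules at the archive root are not detected.

```python
import zipfile


def top_level_modules(wheel_path):
    """Return the top-level importable names packaged in a pure-Python wheel.

    Package directories and single-file modules sit at the archive root,
    next to the *.dist-info (and optional *.data) metadata folders.
    """
    names = set()
    with zipfile.ZipFile(wheel_path) as whl:
        for entry in whl.namelist():
            first = entry.split("/", 1)[0]
            if first.endswith((".dist-info", ".data")):
                continue  # metadata, not importable code
            if "/" in entry:
                names.add(first)  # a package directory
            elif first.endswith(".py"):
                names.add(first[:-3])  # a single-file module
    return sorted(names)
```

For example, `top_level_modules("your_path/your-wheel-file-name.whl")` should include the module name your scoring code imports; if it does not, the wheel was built without it.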

I'm new to Azure ML and learning by doing and by talking with the community. This solution works well for me; I hope it helps.


[Answer 2]:

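In the end, the solution was to make my custom class MultiColumnLabelEncoder importable as a pip package (available via pip install multilabelencoder==1.0.5). I then passed the pip package to the .yml file, or to the InferenceConfig of the Azure ML environment. In score.py, I imported the class as follows: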

import os

import joblib
from multilabelencoder import multilabelencoder


def init():
    global model

    # Instantiate the custom encoder (the import above is what lets joblib resolve the class when unpickling)
    encoder = multilabelencoder.MultiColumnLabelEncoder()
    # Get the path where the deployed model can be found.
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'k_means_model_45.pkl')
    model = joblib.load(model_path)

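After that, the deployment succeeded. One more important point: I had to use the same pip package (multilabelencoder) in the training pipeline, as shown here: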

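from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from multilabelencoder import multilabelencoder

# columns is your list of columns to encode; df is your training DataFrame
kmeans = KMeans(n_clusters=3, random_state=0)
pipe = Pipeline([
    ("encoder", multilabelencoder.MultiColumnLabelEncoder(columns)),
    ('k-means', kmeans),
])
# Training the pipeline
trainedModel = pipe.fit(df)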
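The InferenceConfig route mentioned above can be sketched with the Azure ML SDK v1 (azureml-core): the pip package goes into an Environment, and that Environment is passed to the InferenceConfig. This is a minimal sketch, not the answerer's exact code; the environment, model and service names, the ACI sizing, and pinning the 1.0.5 release (assumed to be the one used for training) are all assumptions.

```python
def scoring_pip_packages(encoder_spec="multilabelencoder==1.0.5"):
    """Pip requirements for the scoring image (assumed list, mirroring the .yml).

    The encoder package should match the one used when the model was trained,
    because the pickle refers to the class by that package's module path.
    """
    return ["scikit-learn", "azureml-defaults", encoder_spec]


def deploy_kmeans(ws, model_name="k_means_model", service_name="kmeans-service"):
    """Deploy a registered model to ACI; all names here are hypothetical."""
    # Imported inside the function so the helper above works without the SDK.
    from azureml.core import Environment
    from azureml.core.conda_dependencies import CondaDependencies
    from azureml.core.model import InferenceConfig, Model
    from azureml.core.webservice import AciWebservice

    env = Environment(name="kmeans-env")
    env.python.conda_dependencies = CondaDependencies.create(
        pip_packages=scoring_pip_packages()
    )
    inference_config = InferenceConfig(entry_script="score.py", environment=env)
    deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

    model = Model(ws, name=model_name)
    service = Model.deploy(
        workspace=ws,
        name=service_name,
        models=[model],
        inference_config=inference_config,
        deployment_config=deployment_config,
    )
    service.wait_for_deployment(show_output=True)
    return service
```

Keeping the package list in one function makes it easy to use the same pinned version in the training environment and the scoring environment.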
