【Question title】: Updating pandas via conda on Sagemaker EXTREMELY slow
【Posted】: 2021-01-15 21:43:38
【Question】:

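The default Python environment on the SageMaker instance hosted in my work environment is out of date, so I have to update its conda environment. However, this is very slow (15-30 minutes), and I would like to find a faster way to get a working environment.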

I update it as follows:

!conda update pandas fsspec --yes

This produces the output below. The key problem seems to be that the starting environment is inconsistent (how did that happen?), so the first solve ends with "failed with repodata from current_repodata.json, will retry with next repodata source." and conda then retries against the full repodata.json.

Output:

Collecting package metadata (current_repodata.json): done
Solving environment: / 
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - defaults/linux-64::pandas==1.0.1=py36h0573a6f_0
  - defaults/noarch::jupyterlab==1.2.6=pyhf63ae98_0
  - defaults/linux-64::scikit-learn==0.22.1=py36hd81dba3_0
  - defaults/linux-64::python-language-server==0.31.7=py36_0
  - defaults/linux-64::bkcharts==0.2=py36_0
  - defaults/linux-64::nb_conda==2.2.1=py36_0
  - defaults/noarch::numpydoc==0.9.2=py_0
  - defaults/linux-64::pytest-arraydiff==0.3=py36h39e3cac_0
  - defaults/linux-64::bottleneck==1.3.2=py36heb32a55_0
  - defaults/linux-64::pywavelets==1.1.1=py36h7b6447c_0
  - defaults/noarch::pytest-astropy==0.8.0=py_0
  - defaults/linux-64::numexpr==2.7.1=py36h423224d_0
  - defaults/noarch::anaconda-project==0.8.4=py_0
  - defaults/noarch::boto3==1.9.162=py_0
  - defaults/linux-64::s3transfer==0.2.1=py36_0
  - defaults/linux-64::nbconvert==5.6.1=py36_0
  - defaults/linux-64::h5py==2.10.0=py36h7918eee_0
  - defaults/linux-64::bokeh==1.4.0=py36_0
  - defaults/noarch::jupyterlab_server==1.0.6=py_0
  - defaults/linux-64::numpy-base==1.18.1=py36hde5b4d6_1
  - defaults/noarch::botocore==1.12.189=py_0
  - defaults/linux-64::jupyter==1.0.0=py36_7
  - defaults/linux-64::astropy==4.0=py36h7b6447c_0
  - defaults/linux-64::patsy==0.5.1=py36_0
  - defaults/linux-64::scikit-image==0.16.2=py36h0573a6f_0
  - defaults/linux-64::matplotlib-base==3.1.3=py36hef1b27d_0
  - defaults/linux-64::imageio==2.6.1=py36_0
  - defaults/linux-64::pytables==3.6.1=py36h71ec239_0
  - defaults/linux-64::nb_conda_kernels==2.2.4=py36_0
  - defaults/linux-64::mkl_fft==1.0.15=py36ha843d7b_0
  - defaults/linux-64::statsmodels==0.11.0=py36h7b6447c_0
  - defaults/linux-64::spyder==4.0.1=py36_0
  - defaults/noarch::seaborn==0.10.0=py_0
  - defaults/linux-64::requests==2.22.0=py36_1
  - defaults/linux-64::numba==0.48.0=py36h0573a6f_0
  - defaults/linux-64::scipy==1.4.1=py36h0b6359f_0
  - defaults/noarch::pytest-doctestplus==0.5.0=py_0
  - defaults/linux-64::mkl_random==1.1.0=py36hd6b4f25_0
  - defaults/noarch::dask==2.11.0=py_0
  - defaults/noarch::ipywidgets==7.5.1=py_0
  - defaults/linux-64::widgetsnbextension==3.5.1=py36_0
  - defaults/noarch::s3fs==0.4.2=py_0
  - defaults/linux-64::notebook==6.0.3=py36_0
  - defaults/linux-64::matplotlib==3.1.3=py36_0
  - defaults/linux-64::anaconda-client==1.7.2=py36_0
  - defaults/linux-64::numpy==1.18.1=py36h4f9e942_0
failed with repodata from current_repodata.json, will retry with next repodata source.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: | 
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - defaults/noarch::jupyterlab==1.2.6=pyhf63ae98_0
  - defaults/linux-64::python-language-server==0.31.7=py36_0
  - defaults/linux-64::nb_conda==2.2.1=py36_0
  - defaults/noarch::numpydoc==0.9.2=py_0
  - defaults/noarch::anaconda-project==0.8.4=py_0
  - defaults/noarch::boto3==1.9.162=py_0
  - defaults/linux-64::s3transfer==0.2.1=py36_0
  - defaults/linux-64::nbconvert==5.6.1=py36_0
  - defaults/linux-64::bokeh==1.4.0=py36_0
  - defaults/noarch::jupyterlab_server==1.0.6=py_0
  - defaults/noarch::botocore==1.12.189=py_0
  - defaults/linux-64::jupyter==1.0.0=py36_7
  - defaults/linux-64::scikit-image==0.16.2=py36h0573a6f_0
  - defaults/linux-64::imageio==2.6.1=py36_0
  - defaults/linux-64::nb_conda_kernels==2.2.4=py36_0
  - defaults/linux-64::spyder==4.0.1=py36_0
  - defaults/linux-64::requests==2.22.0=py36_1
  - defaults/noarch::dask==2.11.0=py_0
  - defaults/noarch::ipywidgets==7.5.1=py_0
  - defaults/linux-64::widgetsnbextension==3.5.1=py36_0
  - defaults/noarch::s3fs==0.4.2=py_0
  - defaults/linux-64::notebook==6.0.3=py36_0
  - defaults/linux-64::anaconda-client==1.7.2=py36_0
done


==> WARNING: A newer version of conda exists. <==
  current version: 4.8.4
  latest version: 4.9.2

Please update conda by running

    $ conda update -n base conda



## Package Plan ##

  environment location: /home/ec2-user/anaconda3/envs/python3

  added / updated specs:
    - fsspec
    - pandas
    - s3fs


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    astroid-2.4.2              |   py36h9f0ad1d_1         297 KB  conda-forge
    certifi-2020.12.5          |   py36h5fab9bb_1         143 KB  conda-forge
    docutils-0.16              |   py36h5fab9bb_3         738 KB  conda-forge
    pandas-1.1.4               |   py36hd87012b_0        10.5 MB  conda-forge
    pillow-7.1.2               |   py36hb39fc2d_0         604 KB
    pylint-2.6.0               |   py36h9f0ad1d_1         446 KB  conda-forge
    sphinx-3.4.3               |     pyhd8ed1ab_0         1.5 MB  conda-forge
    toml-0.10.2                |     pyhd8ed1ab_0          18 KB  conda-forge
    urllib3-1.25.11            |             py_0          93 KB  conda-forge
    ------------------------------------------------------------
                                           Total:        14.3 MB

The following NEW packages will be INSTALLED:

  astroid            conda-forge/linux-64::astroid-2.4.2-py36h9f0ad1d_1
  bleach             conda-forge/noarch::bleach-3.2.1-pyh9f0ad1d_0
  brotlipy           conda-forge/linux-64::brotlipy-0.7.0-py36he6145b8_1001
  docutils           conda-forge/linux-64::docutils-0.16-py36h5fab9bb_3
  pillow             pkgs/main/linux-64::pillow-7.1.2-py36hb39fc2d_0
  pylint             conda-forge/linux-64::pylint-2.6.0-py36h9f0ad1d_1
  sphinx             conda-forge/noarch::sphinx-3.4.3-pyhd8ed1ab_0
  toml               conda-forge/noarch::toml-0.10.2-pyhd8ed1ab_0
  urllib3            conda-forge/noarch::urllib3-1.25.11-py_0

The following packages will be UPDATED:

  ca-certificates                      2020.11.8-ha878542_0 --> 2020.12.5-ha878542_0
  certifi                          2020.11.8-py36h5fab9bb_0 --> 2020.12.5-py36h5fab9bb_1
  fsspec                       pkgs/main::fsspec-0.6.2-py_0 --> conda-forge::fsspec-0.8.5-pyhd8ed1ab_0
  pandas             pkgs/main::pandas-1.0.1-py36h0573a6f_0 --> conda-forge::pandas-1.1.4-py36hd87012b_0



Downloading and Extracting Packages
pillow-7.1.2         | 604 KB    | ##################################### | 100% 
astroid-2.4.2        | 297 KB    | ##################################### | 100% 
pylint-2.6.0         | 446 KB    | ##################################### | 100% 
sphinx-3.4.3         | 1.5 MB    | ##################################### | 100% 
pandas-1.1.4         | 10.5 MB   | ##################################### | 100% 
docutils-0.16        | 738 KB    | ##################################### | 100% 
urllib3-1.25.11      | 93 KB     | ##################################### | 100% 
certifi-2020.12.5    | 143 KB    | ##################################### | 100% 
toml-0.10.2          | 18 KB     | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

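I would be glad to hear any suggestions on how to get a Python notebook with modern packages up and running in SageMaker as quickly as possible.

Other solutions I tried:

  • A quick pip install -U does not work because of dependency problems: the notebook's local environment then points pandas at the outdated fsspec, and it crashes.
  • Adding my conda commands to the startup script, as the AWS documentation suggests, does not work because the startup script times out (after 10 minutes, I think?), so a conda update that takes 15+ minutes just guarantees that the SageMaker instance cannot start.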

【Discussion】:

    Tags: python pandas conda amazon-sagemaker


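    【Solution 1】:

    The cause of this problem is conda's dependency check. conda tries to find package versions that are compatible with every package in the environment, whereas pip simply installs the requested package and its dependencies, which can leave the environment inconsistent. [1]

    There are two workarounds for this problem:

    1. Create a custom environment with the packages you need, and create a kernel from it to use in the SageMaker notebook.
    2. Install with pip using the --no-deps option: pip install pandas==<version> --no-deps. You may also need the -U option.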
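
The first workaround could be sketched roughly as below. This is only an illustration under stated assumptions: the environment name `pandas-env` and the Python version 3.8 are hypothetical, the pandas and fsspec pins are the versions that appear in the package plan in the question, and no channel is specified. The helper only builds and prints the commands so they can be reviewed before being run in a terminal or a script; a fresh environment avoids solving against the inconsistent default one, but how much time that saves has not been measured here.

```python
import shlex


def custom_env_commands(env_name, packages, python_version="3.8"):
    """Build (but do not run) the commands for a custom conda env plus a Jupyter kernel.

    env_name and python_version are illustrative assumptions, not values from the answer.
    """
    # Create a fresh environment; ipykernel is included so it can back a notebook kernel.
    create = [
        "conda", "create", "--yes", "--name", env_name,
        f"python={python_version}", "ipykernel", *packages,
    ]
    # Register the new environment's Python as a kernel that Jupyter can list.
    register = [
        "conda", "run", "--name", env_name,
        "python", "-m", "ipykernel", "install", "--user",
        "--name", env_name, "--display-name", f"Python ({env_name})",
    ]
    return [create, register]


def as_shell(cmd):
    """Render an argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(part) for part in cmd)


# Versions taken from the package plan shown in the question.
commands = custom_env_commands("pandas-env", ["pandas=1.1.4", "fsspec=0.8.5", "s3fs"])
for cmd in commands:
    print(as_shell(cmd))
```

Once the two printed commands have been run, the new kernel should be selectable from the notebook's kernel menu, assuming Jupyter picks up user-level kernel specs on the instance.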

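    To recap, I suggest either creating a custom environment or using pip with the --no-deps option to install the package together with all of the dependencies it needs. You may want to try both approaches while the notebook is running first, and then move the one that works into the lifecycle configuration script.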
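
To try the --no-deps route from a running notebook before committing it to a lifecycle configuration script, something along these lines may help. It is a sketch, not a tested recipe for SageMaker: the pins are again the versions from the question's package plan, the helper names are made up for illustration, and because --no-deps skips dependency resolution, a `pip check` afterwards is used here as a rough consistency test.

```python
import subprocess
import sys


def pip_no_deps_command(specs, upgrade=False):
    """Build a pip command that installs exactly the given specs and nothing else."""
    cmd = [sys.executable, "-m", "pip", "install", "--no-deps"]
    if upgrade:
        cmd.append("--upgrade")  # same as -U
    return cmd + list(specs)


def install_and_check(specs, upgrade=False):
    """Run the install, then report whether `pip check` finds no broken requirements."""
    subprocess.check_call(pip_no_deps_command(specs, upgrade=upgrade))
    # `pip check` exits non-zero when installed packages have unmet or conflicting requirements.
    return subprocess.call([sys.executable, "-m", "pip", "check"]) == 0


# Build the command only; call install_and_check(...) to actually modify the environment.
cmd = pip_no_deps_command(["pandas==1.1.4", "fsspec==0.8.5"], upgrade=True)
print(" ".join(cmd[1:]))
```

Listing pandas and fsspec together matters because pip will not pull in or upgrade anything that is not named explicitly when --no-deps is set.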
