【发布时间】:2019-07-20 06:44:41
【问题描述】:
我很难真的尝试在docker 中安装稳定的数据科学包配置。使用这种主流的相关工具应该会更容易。
以下是使用工作的Dockerfile,有点破解,从包核心中删除pandas并单独安装,指定@987654327 @,因为据称更高版本与 numpy 冲突。
FROM alpine:3.6
ENV PACKAGES="\
dumb-init \
musl \
libc6-compat \
linux-headers \
build-base \
bash \
git \
ca-certificates \
freetype \
libgfortran \
libgcc \
libstdc++ \
openblas \
tcl \
tk \
libssl1.0 \
"
ENV PYTHON_PACKAGES="\
numpy \
matplotlib \
scipy \
scikit-learn \
nltk \
"
RUN apk add --no-cache --virtual build-dependencies python3 \
&& apk add --virtual build-runtime \
build-base python3-dev openblas-dev freetype-dev pkgconfig gfortran \
&& ln -s /usr/include/locale.h /usr/include/xlocale.h \
&& python3 -m ensurepip \
&& rm -r /usr/lib/python*/ensurepip \
&& pip3 install --upgrade pip setuptools \
&& ln -sf /usr/bin/python3 /usr/bin/python \
&& ln -sf pip3 /usr/bin/pip \
&& rm -r /root/.cache \
&& pip install --no-cache-dir $PYTHON_PACKAGES \
&& pip3 install 'pandas<0.21.0' \ #<---------- PANDAS
&& apk del build-runtime \
&& apk add --no-cache --virtual build-dependencies $PACKAGES \
&& rm -rf /var/cache/apk/*
# set working directory
WORKDIR /usr/src/app
# add and install requirements
COPY ./requirements.txt /usr/src/app/requirements.txt # other than data science packages go here
RUN pip install -r requirements.txt
# add entrypoint.sh
COPY ./entrypoint.sh /usr/src/app/entrypoint.sh
RUN chmod +x /usr/src/app/entrypoint.sh
# add app
COPY . /usr/src/app
# run server
CMD ["/usr/src/app/entrypoint.sh"]
上面的配置曾经可以工作。 现在会发生什么,构建确实通过了,但是pandas 在导入时失败并出现以下错误:
ImportError: Missing required dependencies ['numpy']
自从安装了numpy 1.16.1,我不知道numpy pandas 正在尝试找到哪个...
有谁知道如何为此获得稳定的解决方案?
注意:一个解决方案也非常受欢迎,该解决方案包括从数据科学的交钥匙docker 图像中提取至少上述软件包到上述Dockerfile。 p>
编辑 1:
如果我按照 cmets 中的建议将数据包的安装移动到 requirements.txt,如下所示:
requirements.txt
(...)
numpy==1.16.1 # or numpy==1.16.0
scikit-learn==0.20.2
scipy==1.2.1
nltk==3.4
pandas==0.24.1 # or pandas== 0.23.4
matplotlib==3.0.2
(...)
和Dockerfile:
# add and install requirements
COPY ./requirements.txt /usr/src/app/requirements.txt
RUN pip install -r requirements.txt
它在pandas 再次中断,抱怨numpy。
Collecting numpy==1.16.1 (from -r requirements.txt (line 61))
Downloading https://files.pythonhosted.org/packages/2b/26/07472b0de91851b6656cbc86e2f0d5d3a3128e7580f23295ef58b6862d6c/numpy-1.16.1.zip (5.1MB)
Collecting scikit-learn==0.20.2 (from -r requirements.txt (line 62))
Downloading https://files.pythonhosted.org/packages/49/0e/8312ac2d7f38537361b943c8cde4b16dadcc9389760bb855323b67bac091/scikit-learn-0.20.2.tar.gz (10.3MB)
Collecting scipy==1.2.1 (from -r requirements.txt (line 63))
Downloading https://files.pythonhosted.org/packages/a9/b4/5598a706697d1e2929eaf7fe68898ef4bea76e4950b9efbe1ef396b8813a/scipy-1.2.1.tar.gz (23.1MB)
Collecting nltk==3.4 (from -r requirements.txt (line 64))
Downloading https://files.pythonhosted.org/packages/6f/ed/9c755d357d33bc1931e157f537721efb5b88d2c583fe593cc09603076cc3/nltk-3.4.zip (1.4MB)
Collecting pandas==0.24.1 (from -r requirements.txt (line 65))
Downloading https://files.pythonhosted.org/packages/81/fd/b1f17f7dc914047cd1df9d6813b944ee446973baafe8106e4458bfb68884/pandas-0.24.1.tar.gz (11.8MB)
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 359, in get_provider
module = sys.modules[moduleOrReq]
KeyError: 'numpy'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-_e5z6o6_/pandas/setup.py", line 732, in <module>
ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
File "/tmp/pip-install-_e5z6o6_/pandas/setup.py", line 475, in maybe_cythonize
numpy_incl = pkg_resources.resource_filename('numpy', 'core/include')
File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1144, in resource_filename
return get_provider(package_or_requirement).get_resource_filename(
File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 361, in get_provider
__import__(moduleOrReq)
ModuleNotFoundError: No module named 'numpy'
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-_e5z6o6_/pandas/
编辑 2:
这似乎是一个公开的pandas 问题。更多详情请参考:
“很遗憾,这意味着 requirements.txt 文件不足以设置安装了 pandas 的新环境(例如在 docker 容器中)”。
**ImportError**:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the multiarray numpy extension module failed. Most
likely you are trying to import a failed build of numpy.
Here is how to proceed:
- If you're working with a numpy git repository, try `git clean -xdf`
(removes all files not under version control) and rebuild numpy.
- If you are simply trying to use the numpy version that you have installed:
your installation is broken - please reinstall numpy.
- If you have already reinstalled and that did not fix the problem, then:
1. Check that you are using the Python you expect (you're using /usr/local/bin/python),
and that you have no directories in your PATH or PYTHONPATH that can
interfere with the Python and numpy versions you're trying to use.
2. If (1) looks fine, you can open a new issue at
https://github.com/numpy/numpy/issues. Please include details on:
- how you installed Python
- how you installed numpy
- your operating system
- whether or not you have multiple versions of Python installed
- if you built from source, your compiler versions and ideally a build log
编辑 3
requirements.txt ---> https://pastebin.com/0icnx0iu
编辑 4
截至 2020 年 1 月 12 日,已接受的解决方案开始不再起作用。
现在,构建中断不是在pandas,而是在scipy,而是在numpy 之后,同时构建scipy's 轮子。这是日志:
----------------------------------------
ERROR: Failed building wheel for scipy
Running setup.py clean for scipy
ERROR: Command errored out with exit status 1:
command: /usr/bin/python3.6 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-s6nahssd/scipy/setup.py'"'"'; __file__='"'"'/tmp/pip-install-s6nahssd/scipy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' clean --all
cwd: /tmp/pip-install-s6nahssd/scipy
Complete output (9 lines):
`setup.py clean` is not supported, use one of the following instead:
- `git clean -xdf` (cleans all files)
- `git clean -Xdf` (cleans all versioned files, doesn't touch
files that aren't checked into the git repo)
Add `--force` to your command to use it anyway if you must (unsupported).
----------------------------------------
ERROR: Failed cleaning build dir for scipy
Successfully built numpy
Failed to build scipy
ERROR: Could not build wheels for scipy which use PEP 517 and cannot be installed directly
从错误看来,构建过程使用的是python3.6,而我使用的是FROM alpine:3.7。
完整日志在这里 -> https://pastebin.com/Tw4ubxSA
这是当前的 Dockerfile:
【问题讨论】:
-
您提到“指定
pandas<0.21.0,因为据称更高版本与numpy冲突”,您是否真的遇到过pandas 0.24.1和numpy之间的问题?自发布以来,我每天都在使用此版本,并且我没有遇到与numpy的任何冲突问题。 -
在上面的上下文中,如果我指向
Collecting pandas==0.24.1,我会得到错误:File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 346, in get_provider module = sys.modules[moduleOrReq] KeyError: 'numpy' -
嗯.. 你有没有试过把你的库放在
requirements.txt文件中,COPY文件到你的容器和RUN pip install -r requirements。这就是我通常在我的 docker 项目中安装 python 库的方式 -
试过了,没用。请参考我的编辑。
-
为什么要自己构建?你可以在
Dockerhub上找到大量已经在工作的用于数据科学应用程序的容器,例如 Anaconda 容器就足够了。我认为即使nltk默认情况下也在那里,所以你可以使用这样的“交钥匙”容器。
标签: python pandas numpy docker alpine