【Title】: Pyarrow fs.HadoopFileSystem reports it is unable to load libhdfs.so
【Posted】: 2022-04-19 03:49:59
【Question】:

I am trying to use the pyarrow filesystem interface with HDFS. Calling the fs.HadoopFileSystem constructor fails with a "libhdfs.so not found" error, even though libhdfs.so is clearly present at the reported location.

from pyarrow import fs
hfs = fs.HadoopFileSystem(host="10.10.0.167", port=9870)

OSError: Unable to load libhdfs: /hadoop-3.3.1/lib/native/libhdfs.so: cannot open shared object file: No such file or directory
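pyarrow looks for libhdfs.so in `$ARROW_LIBHDFS_DIR` first and then in `$HADOOP_HOME/lib/native`. The sketch below (a simplified approximation of that lookup, not pyarrow's exact code) shows which path would be tried, which helps separate "file is not there" from "file is there but cannot be loaded":

```python
import os

def locate_libhdfs():
    """Approximate pyarrow's libhdfs lookup order:
    $ARROW_LIBHDFS_DIR/libhdfs.so first, then
    $HADOOP_HOME/lib/native/libhdfs.so. Returns the first
    existing candidate path, or None."""
    candidates = []
    if os.environ.get("ARROW_LIBHDFS_DIR"):
        candidates.append(
            os.path.join(os.environ["ARROW_LIBHDFS_DIR"], "libhdfs.so"))
    if os.environ.get("HADOOP_HOME"):
        candidates.append(
            os.path.join(os.environ["HADOOP_HOME"], "lib", "native", "libhdfs.so"))
    for path in candidates:
        if os.path.exists(path):
            return path
    return None

print(locate_libhdfs())
```

If this prints a real path yet the loader still reports "cannot open shared object file", the .so itself is usually unloadable, e.g. built for a different CPU architecture or missing one of its own dependencies such as libjvm.so; `ldd` on the file shows which.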

I have tried different Python and pyarrow versions and have set ARROW_LIBHDFS_DIR. For reproduction I am using the following Dockerfile on Linux Mint.

FROM openjdk:11

RUN apt-get update &&\
  apt-get install wget -y

RUN wget -nv https://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1-aarch64.tar.gz &&\
  tar -xf hadoop-3.3.1-aarch64.tar.gz

ENV PATH=/miniconda/bin:${PATH}
RUN wget -nv https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh &&\
  bash miniconda.sh -b -p /miniconda &&\
  conda init 

RUN conda install -c conda-forge python=3.9.6
RUN conda install -c conda-forge pyarrow=4.0.1

ENV JAVA_HOME=/usr/local/openjdk-11
ENV HADOOP_HOME=/hadoop-3.3.1  

RUN  printf 'from pyarrow import fs\nhfs = fs.HadoopFileSystem(host="10.10.0.167", port=9870)\n' > test_arrow.py

# 'python test_arrow.py' fails with ... 
# OSError: Unable to load libhdfs: /hadoop-3.3.1/lib/native/libhdfs.so: cannot open shared object file: No such file or directory
RUN python test_arrow.py || true

CMD ["/bin/bash"]
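One thing worth checking in the Dockerfile above: it downloads the `hadoop-3.3.1-aarch64.tar.gz` archive but installs the x86_64 Miniconda, so the bundled libhdfs.so may be an ARM64 binary that an x86_64 Python cannot load. A small sketch (assuming a little-endian ELF file, which covers x86-64 and AArch64 Linux builds) reads the ELF header to reveal the library's target architecture without extra tools:

```python
import struct

def elf_machine(path):
    """Return the e_machine field of an ELF file header.
    Per the ELF spec, 0x3e means x86-64 and 0xb7 means AArch64.
    Assumes a little-endian ELF, which holds for both of those targets."""
    with open(path, "rb") as f:
        ident = f.read(16)          # e_ident: magic, class, data, version, padding
        if ident[:4] != b"\x7fELF":
            raise ValueError("not an ELF file")
        f.read(2)                   # skip e_type (2 bytes)
        (machine,) = struct.unpack("<H", f.read(2))  # e_machine, little-endian
    return machine

# e.g. elf_machine("/hadoop-3.3.1/lib/native/libhdfs.so")
# 0x3e -> x86-64 build, 0xb7 -> AArch64 build
```

If the value does not match the interpreter's own architecture, swapping the download for the matching Hadoop tarball should resolve the loader error.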

【Comments】:

Tags: hdfs pyarrow libhdfs


    【Solution 1】:

    I created a Dockerfile for a pyarrow fs.HadoopFileSystem client. A Hadoop installation is required so that the libhdfs.so file (and the Java runtime it depends on) is available.

        # Assumes a Debian-based image with WORKDIR /app and wget available
        RUN mkdir -p /data/hadoop
        RUN apt-get -q update
        RUN apt-get install software-properties-common -y
        RUN add-apt-repository "deb http://deb.debian.org/debian/ sid main"
        RUN apt-get -q update
        RUN apt-get install openjdk-8-jdk -y
        RUN apt-get clean
        RUN rm -rf /var/lib/apt/lists/*
        RUN wget "https://dlcdn.apache.org/hadoop/common/hadoop-3.3.2/hadoop-3.3.2.tar.gz" -O hadoop-3.3.2.tar.gz
        RUN tar xzf hadoop-3.3.2.tar.gz
        ENV HADOOP_HOME=/app/hadoop-3.3.2
        ENV HADOOP_INSTALL=$HADOOP_HOME
        ENV HADOOP_MAPRED_HOME=$HADOOP_HOME
        ENV HADOOP_COMMON_HOME=$HADOOP_HOME
        ENV HADOOP_HDFS_HOME=$HADOOP_HOME
        ENV YARN_HOME=$HADOOP_HOME
        ENV HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
        ENV PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
        ENV HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
        ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
        # Note: ENV stores the value as a literal string and does not run the
        # command; expand the real classpath at container start, e.g.
        # export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath --glob)
        ENV CLASSPATH="$HADOOP_HOME/bin/hadoop classpath --glob"
        ENV ARROW_LIBHDFS_DIR=$HADOOP_HOME/lib/native
        ADD pyarrow-app.py /app/
        CMD ["python3", "-u", "/app/pyarrow-app.py"]
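    Since a Dockerfile `ENV` line cannot execute `hadoop classpath --glob`, the expansion can instead be done inside the application before constructing the filesystem. A hedged sketch (the host/port values are placeholders; note that pyarrow reads the jar list from `$CLASSPATH`):

```python
import os
import subprocess

def expand_hadoop_classpath(hadoop_home):
    """Run `$HADOOP_HOME/bin/hadoop classpath --glob` and return its
    output, the fully expanded jar list that libhdfs expects to find
    in the CLASSPATH environment variable."""
    hadoop = os.path.join(hadoop_home, "bin", "hadoop")
    return subprocess.check_output(
        [hadoop, "classpath", "--glob"], text=True).strip()

# At application start, before creating the filesystem:
# os.environ["CLASSPATH"] = expand_hadoop_classpath(os.environ["HADOOP_HOME"])
# from pyarrow import fs
# hfs = fs.HadoopFileSystem(host="10.10.0.167", port=8020)
```

    One more detail worth noting: port 9870 in the question is the Hadoop 3 NameNode web UI port; the HadoopFileSystem client should point at the NameNode RPC port (8020 or 9000 in typical configurations).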
    

    【Discussion】:
