【问题标题】:Spark Streaming and Phoenix Kerberos issueSpark Streaming 和 Phoenix Kerberos 问题
【发布时间】:2017-04-20 01:25:24
【问题描述】:

很抱歉这个问题,因为这个问题已经在 SO 上提出了很多次,但是在浏览了每个相关帖子后,我仍然无法找到解决我的问题的方法。 我在 HDP 2.4.2 上的 Kerberos env 上使用 Spark Streaming (1.6.1) 和 Phoenix (4.4) 在尝试从 HBase 读取或写入时低于异常。即使从 spark-submit 跳过 key-ph.conf 文件,我也会遇到同样的问题。

我查看了下面的帖子,该帖子与我的问题相同,但我仍然无法找到解决问题的方法:

https://community.hortonworks.com/questions/56848/spark-cant-connect-to-secure-phoenix.html

Spark can't connect to secure phoenix

下面是我的 Spark 提交命令。

spark-submit \
--verbose \
--master yarn-cluster \
--num-executors 2  \
--executor-memory 8g \
--executor-cores 4 \
--conf spark.driver.memory=1024m  \
--files key-ph.conf#key-ph.conf,user.headless.keytab#user.headless.keytab,/etc/hbase/2.4.2.0-258/0/hbase-site.xml \
--jars /usr/hdp/2.4.2.0-258/hbase/lib/hbase-common-1.1.2.2.4.2.0-258.jar,/usr/hdp/2.4.2.0-258/hbase/lib/hbase-client-1.1.2.2.4.2.0-258.jar,/usr/hdp/2.4.2.0-258/hbase/lib/hbase-server-1.1.2.2.4.2.0-258.jar,/usr/hdp/2.4.2.0-258/hbase/lib/hbase-protocol-1.1.2.2.4.2.0-258.jar,/usr/hdp/2.4.2.0-258/hbase/lib/htrace-core-3.1.0-incubating.jar,/usr/hdp/2.4.2.0-258/hbase/lib/guava-12.0.1.jar,/usr/hdp/2.4.2.0-258/phoenix/lib/phoenix-core-4.4.0.2.4.2.0-258.jar,/usr/hdp/2.4.2.0-258/phoenix/phoenix-4.4.0.2.4.2.0-258-client-spark.jar \
--driver-java-options "-Djava.security.auth.login.config=./key-ph.conf -Dhttp.proxyHost=proxy-host -Dhttp.proxyPort=8080 -Dhttps.proxyHost=proxy-host -Dhttps.proxyPort=8080 -Dlog4j.configuration=file:/home/user/spark-log4j/log4j-phoenix-driver.properties" \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./key-ph.conf -Dlog4j.configuration=file:/home/user/spark-log4j/log4j-phoenix-executor.properties" \
--class com.spark.demo.SampleInsert /home/user/test-ph.jar tableName ZK_IP:2181:/hbase-secure:user@CLIENT.LAN:/home/user/user.headless.keytab

火花代码:

 demoArrDataFrame.write
        .format("org.apache.phoenix.spark")
        .options(Map("table" -> tableName.toUpperCase,
          "zkUrl" -> "ZK_IP:2181:/hbase-secure:user@FORSYS.LAN:/home/user/user.headless.keytab"))
        .mode(SaveMode.Overwrite)
        .save

16/12/05 16:11:36 WARN AbstractRpcClient: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
16/12/05 16:11:36 ERROR AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
    at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:611)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:156)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:737)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:734)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:734)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:887)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:856)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1200)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:58152)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(ConnectionManager.java:1571)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1509)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1531)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1560)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1711)
    at org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
    at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4083)
    at org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:528)
    at org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:550)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.ensureTableCreated(ConnectionQueryServicesImpl.java:810)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1174)
    at org.apache.phoenix.query.DelegateConnectionQueryServices.createTable(DelegateConnectionQueryServices.java:112)
    at org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:1974)
    at org.apache.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:770)
    at org.apache.phoenix.compile.CreateTableCompiler$2.execute(CreateTableCompiler.java:186)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:305)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:297)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:295)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1244)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1850)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1819)
    at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1819)
    at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:180)
    at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:132)
    at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:151)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:99)
    at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:82)
    at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:70)
    at org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil.getUpsertColumnMetadataList(PhoenixConfigurationUtil.java:232)
    at org.apache.phoenix.spark.DataFrameFunctions$$anonfun$2.apply(DataFrameFunctions.scala:45)
    at org.apache.phoenix.spark.DataFrameFunctions$$anonfun$2.apply(DataFrameFunctions.scala:41)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:717)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:717)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
    ... 62 more

【问题讨论】:

    标签: hadoop hbase spark-streaming kerberos phoenix


    【解决方案1】:

    我可以通过执行以下步骤来解决此问题:

    1) 将所需的 hbase、phoenix jar 传递到 spark 额外的类路径选项中:

    --conf "spark.executor.extraClassPath=/usr/hdp/2.4.2.0-258/hbase/lib/hbase-common-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-client-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-server-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-protocol-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.4.2.0-258/hbase/lib/guava-12.0.1.jar:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar" \
    
    --conf "spark.driver.extraClassPath=/usr/hdp/2.4.2.0-258/hbase/lib/hbase-common-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-client-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-server-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-protocol-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.4.2.0-258/hbase/lib/guava-12.0.1.jar:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar" \
    

    2) spark 额外 java 选项中的有效 keytab 和 jaas conf:

    --conf "spark.driver.extraJavaOptions=-XX:+UseG1GC -Djava.security.auth.login.config=./kafka_jaas.conf -Dhttp.proxyHost=PROXY.IP -Dhttp.proxyPort=8080 -Dhttps.proxyHost=PROXY.IP2 -Dhttps.proxyPort=8080" \
    
    --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -Djava.security.auth.login.config=./kafka_jaas.conf" \
    

    以kafka-jaas.conf文件为例:

    KafkaServer {
       com.sun.security.auth.module.Krb5LoginModule required
       useKeyTab=true
       keyTab="/etc/security/keytabs/kafka.service.keytab"
       storeKey=true
       useTicketCache=false
       serviceName="kafka"
       principal="kafka/IP@REALM.LAN";
    };
    KafkaClient {
       com.sun.security.auth.module.Krb5LoginModule required
       useTicketCache=true
       renewTicket=true
       serviceName="kafka";
    };
    Client {
       com.sun.security.auth.module.Krb5LoginModule required
       useKeyTab=true
       keyTab="/etc/security/keytabs/kafka.service.keytab"
       storeKey=true
       useTicketCache=false
       serviceName="zookeeper"
       principal="kafka/IP@REALM.LAN";
    };
    

    3) 配置 --files spark 选项

    --files kafka_jaas.conf#kafka_jaas.conf,user.headless.keytab#user.headless.keytab,/etc/hbase/conf/hbase-site.xml#hbase-site.xml \
    

    如果您仍然无法连接到安全凤凰,请按照以下步骤操作。请在 SPARK_CLASSPATH 下设置,然后运行/执行 spark-submit 命令。

    export SPARK_CLASSPATH=/usr/hdp/2.4.2.0-258/hbase/lib/hbase-common-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-client-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-server-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-protocol-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.4.2.0-258/hbase/lib/guava-12.0.1.jar:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar
    

    【讨论】:

      【解决方案2】:

      我很确定您的问题已经解决,但我相信它可以帮助仍然面临同样问题的人。

      Spark documentation 说:

      长时间运行的应用程序 如果长时间运行的应用程序的运行时间超过最大委托令牌生命周期,它们可能会遇到问题 在需要访问的服务中配置。

      Spark 支持为这些自动创建新令牌 在 YARN 模式下运行时的应用程序。 Kerberos 凭据需要 通过 spark-submit 命令提供给 Spark 应用程序, 使用 --principal 和 --keytab 参数

      提供的 keytab 将被复制到运行 Application Master 通过 Hadoop 分布式缓存。为此原因, 强烈建议 YARN 和 HDFS 都使用 至少是加密。

      Kerberos 登录将使用提供的定期更新 凭据,并将创建受支持的新委托令牌。

      以下是一个有效的 spark-submit 示例:

          spark-submit -v \
          --principal ${keytabPrincipal} \
          --keytab ${keytabPath} \
          --files ${configDir}/log4j-bdfsap.properties#log4j-bdfsap.properties,${configDir}/jaas.conf#jaas.conf,hdfs:///apps/spark/lib/spark-streaming-kafka-assembly_2.10-1.5.2.2.3.4.7-4.jar#spark-streaming-kafka-assembly_2.10-1.5.2.2.3.4.7-4.jar,hdfs:///apps/hive/conf/hive-site.xml#hive-site.xml,hdfs:///apps/spark/lib/elasticsearch-spark_2.10-2.4.0.jar#elasticsearch-spark_2.10-2.4.0.jar  \
          --driver-java-options "-Djava.security.auth.login.config=jaas.conf -Dlog4j.configuration=log4j-bdfsap.properties" \
       --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf -Dlog4j.configuration=log4j.properties" \
          --conf "spark.yarn.maxAppAttempts=4" \
          --conf "spark.yarn.am.attemptFailuresValidityInterval=1h" \
          --conf "spark.yarn.max.executor.failures=8" \
          --conf "spark.yarn.executor.failuresValidityInterval=1h" \
          --conf "spark.task.maxFailures=8" \
          --conf "spark.hadoop.fs.hdfs.impl.disable.cache=true" \
          --jars hdfs:///apps/spark/lib/spark-streaming-kafka-assembly_2.10-1.5.2.2.3.4.7-4.jar,hdfs:///apps/spark/lib/elasticsearch-spark_2.10-2.4.0.jar \
          --name ${APP_NAME} \
          --class ${APP_CLASS} \
          --master yarn-cluster \
          --driver-cores ${DRIVER_CORES} \
          --driver-memory ${DRIVER_MEMORY} \
          --num-executors ${NUM_EXECUTORS} \
          --executor-memory ${EXECUTOR_MEMORY} \
          --executor-cores ${EXECUTOR_CORES} \
          --queue ${queueNameSparkStreaming} \
          ${APP_LIB_HDFS_PATH} ${APP_CONF_HDFS_PATH}/${LOAD_CONF_FILE}
      

      jaas.conf:

      KafkaClient {
        com.sun.security.auth.module.Krb5LoginModule required
        useKeyTab=true
        storeKey=true
        keyTab="./user.keytab"
        useTicketCache=false
        serviceName="kafka"
        principal="user@BRUBLES";
      };
      

      【讨论】:

        猜你喜欢
        • 2016-11-08
        • 2018-10-11
        • 2017-07-02
        • 1970-01-01
        • 1970-01-01
        • 2017-06-22
        • 2015-03-10
        • 1970-01-01
        相关资源
        最近更新 更多