【Question Title】: Spark Scala not able to push data into Hive table
【Posted】: 2018-04-11 14:58:23
【Problem Description】:

I am trying to push data into an existing Hive table. I have already created an ORC table in Hive, but I am unable to insert data into it. The code works if I copy-paste it into the spark console, but it fails when run through spark-submit.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object TestCode {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("first example").setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._
    for (i <- 0 to 100 - 1) {
      // Sample values; in the real job the loop body is the business logic
      // that produces the rows to push into the table.
      val fstring = "fstring" + i
      val cmd = "cmd" + i
      val idpath = "idpath" + i
      val sDF = Seq((fstring, cmd, idpath)).toDF("t_als_s_path", "t_als_s_cmd", "t_als_s_pd")
      sDF.write.insertInto("l_sequence")
      //sDF.write.format("orc").saveAsTable("l_sequence")
      println("write data ==> " + i)
    }
  }
}

It gives this error:

 Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: l_sequence;
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:449)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:455)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:453)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:60)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:453)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:443)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
        at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
        at scala.collection.immutable.List.foldLeft(List.scala:84)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
        at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:65)
        at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:63)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:51)
        at org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:69)
        at org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:68)
        at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:74)
        at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:74)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:78)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:76)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:83)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:83)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
        at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:259)
        at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:239)
        at com.hq.bds.Helloword$$anonfun$main$1.apply$mcVI$sp(Helloword.scala:16)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
        at com.hq.bds.Helloword$.main(Helloword.scala:10)
        at com.hq.bds.Helloword.main(Helloword.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

【Discussion】:

  • Are you running on a Cloudera VM or on a cluster? What is the size of the cluster you are working with? Can you add your spark-submit command?
  • If you are working in client mode in a local environment, you usually need to ship the hive-site.xml file to spark-submit via the --files option. For example, in a Docker installation I pass: --files '/etc/spark/conf.dist/hive-site.xml' (see the command sketch after this list).
  • Thanks for the quick reply. I have a 5-machine Hortonworks 2.4 cluster. The spark-submit command is /usr/hdp/current/spark2-client/bin/spark-submit --class com.qh.bds.Helloword /root/users/pushdata_poc.jar
  • Can you guide me on how to do this?
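
A sketch of the resulting invocation, combining the spark-submit command from the comment above with the suggested --files option; the /etc/hive/conf/hive-site.xml path is typical for an HDP install and may differ on your cluster:

    # Ship hive-site.xml with the job so the driver and executors can
    # reach the Hive metastore; adjust paths for your installation.
    /usr/hdp/current/spark2-client/bin/spark-submit \
      --class com.qh.bds.Helloword \
      --files /etc/hive/conf/hive-site.xml \
      /root/users/pushdata_poc.jar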

Tags: scala apache-spark hive


【Solution 1】:

You need to link hive-site.xml into the Spark conf, or copy hive-site.xml into the Spark conf directory. Spark is not able to find your Hive metastore (by default it falls back to a local Derby database), so we have to link the Hive conf into the Spark conf directory.

Concretely, to connect Spark SQL to an existing Hive installation, you must copy the hive-site.xml file into Spark's configuration directory ($SPARK_HOME/conf). If you do not have an existing Hive installation, Spark SQL will still run.

With root privileges (e.g. via sudo), copy hive-site.xml into the Spark conf directory:

sudo cp /etc/hive/conf/hive-site.xml /etc/spark/conf/
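
Beyond copying the file, the driver itself must ask for Hive support. A minimal sketch of the question's write loop for Spark 2.x (the stack trace shows spark2-client), assuming the bds_db database mentioned in the discussion below holds the l_sequence table; adjust names to your environment:

    import org.apache.spark.sql.SparkSession

    object TestCode {
      def main(args: Array[String]): Unit = {
        // enableHiveSupport() points the session at the Hive metastore
        // described by hive-site.xml instead of a local Derby catalog.
        val spark = SparkSession.builder()
          .appName("first example")
          .enableHiveSupport()
          .getOrCreate()
        import spark.implicits._

        for (i <- 0 to 99) {
          val sDF = Seq(("fstring" + i, "cmd" + i, "idpath" + i))
            .toDF("t_als_s_path", "t_als_s_cmd", "t_als_s_pd")
          // Qualifying the table with its database avoids relying on the
          // session's current database.
          sDF.write.insertInto("bds_db.l_sequence")
          println("write data ==> " + i)
        }
        spark.stop()
      }
    }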

【Discussion】:

  • Try providing {DB_NAME}.{TABLE_NAME} in your SQL query. Hopefully that works.
  • Are you sure the table exists in your Hive database? Can you run val df = sqlContext.table("db_name.table_name") and see if it works fine? (A quick check sketch follows this list.)
  • No. Could you run the above command and check? I want to verify Spark's connection to your Hive metastore.
  • I have updated the code and re-run it; it gives me this error: Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'bds_db' not found;
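
A quick metastore sanity check along the lines of the suggestion above, sketched for spark-shell on Spark 2.x; it assumes only the standard Catalog API and the names from this thread:

    // Run in spark-shell: verify Spark actually sees the Hive metastore.
    spark.sql("SHOW DATABASES").show()         // 'bds_db' should be listed
    spark.catalog.listTables("bds_db").show()  // 'l_sequence' should be listed
    val df = spark.table("bds_db.l_sequence")  // resolves only if the metastore is wired up
    df.printSchema()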