【发布时间】:2018-10-10 02:35:25
【问题描述】:
我正在尝试从 Spark Sql 将数据插入 Hive 外部表。 我通过以下命令创建了 hive 外部表
CREATE EXTERNAL TABLE tab1 ( col1 type,col2 type ,col3 type) CLUSTERED BY (col1,col2) SORTED BY (col1) INTO 8 BUCKETS STORED AS PARQUET
在我的 spark 工作中,我编写了以下代码 数据集 df = session.read().option("header","true").csv(csvInput);
df.repartition(numBuckets, somecol)
.write()
.format("parquet")
.bucketBy(numBuckets,col1,col2)
.sortBy(col1)
.saveAsTable(hiveTableName);
每次运行此代码时,我都会收到以下异常
org.apache.spark.sql.AnalysisException: Table `tab1` already exists.;
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:408)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:393)
at somepackage.Parquet_Read_WriteNew.writeToParquetHiveMetastore(Parquet_Read_WriteNew.java:100)
【问题讨论】:
标签: apache-spark hive