Unable to read schema from given path: hdfs://...avsc

【Question title】: Unable to read schema from given path: hdfs://...avsc
【Posted】: 2016-07-10 10:20:22
【Question】:

I am trying to create a Hive table with the following steps:

Load the data into HDFS with Sqoop (done).
Sqoop also created an .avsc file, which I uploaded to HDFS.
In Hive, I want to create a table with the following statement:

Command:

CREATE EXTERNAL TABLE kontoauszug
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
STORED AS 
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:///user/tki/KONTOAUSZUG'
TBLPROPERTIES ('avro.schema.url'='hdfs://m1.hdp2/user/tki/KONTOAUSZUG.avsc');

I get the following error:

FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask. 
java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException 
Encountered AvroSerdeException determining schema. 
Returning signal schema to indicate problem: Unable to read schema from given path: hdfs://m1.hdp2/user/tki/KONTOAUSZUG.avsc)

Does this mean that KONTOAUSZUG.avsc was not found? I double-checked that it is available.

Its content is:

[hadoop@m1 hive]$ cat KONTOAUSZUG.avsc 
{
  "type" : "record",
  "name" : "KONTOAUSZUG",
  "doc" : "Sqoop import of KONTOAUSZUG",
  "fields" : [ {
    "name" : "FK_PROCESS_ID_INS",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "FK_PROCESS_ID_INS",
    "sqlType" : "2"
  }, {
    "name" : "FK_SOURCE_ID",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "FK_SOURCE_ID",
    "sqlType" : "2"
  }, {
    "name" : "SRC_STM_ID",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "SRC_STM_ID",
    "sqlType" : "2"
  }, {
    "name" : "FK_PROCESS_ID_UPD",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "FK_PROCESS_ID_UPD",
    "sqlType" : "2"
  }, {
    "name" : "BUCHUNGSDATUM",
    "type" : [ "null", "long" ],
    "default" : null,
    "columnName" : "BUCHUNGSDATUM",
    "sqlType" : "93"
  }, {
    "name" : "BUCHUNGSTEXT",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "BUCHUNGSTEXT",
    "sqlType" : "12"
  }, {
    "name" : "SOLL",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "SOLL",
    "sqlType" : "2"
  }, {
    "name" : "HABEN",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "HABEN",
    "sqlType" : "2"
  }, {
    "name" : "FK_KONTO_ID",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "FK_KONTO_ID",
    "sqlType" : "2"
  }, {
    "name" : "EINGABE_MANUELL_F",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "EINGABE_MANUELL_F",
    "sqlType" : "2"
  } ],
  "tableName" : "KONTOAUSZUG"
}

【Question comments】:

Your CREATE statement looks fine to me. Are you sure you can list the .avsc file? Try: hadoop fs -ls /user/tki/KONTOAUSZUG.avsc

Tags: hadoop hive hdfs sqoop

【Solution 1】:

In the CREATE TABLE statement, you use three slashes (hdfs:///) instead of two (hdfs://):

(...)'hdfs:///user/tki/KONTOAUSZUG' TBLPROPERTIES (...)

Also, if you are fetching the schema from the NameNode server, I think you should write the port after the host in the URL:

'avro.schema.url'='hdfs://m1.hdp2:<port>/user/tki/KONTOAUSZUG.avsc'

See this for details, and also this answer about HDFS ports.
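
To make the difference between the two URI forms concrete, here is a minimal sketch using only the Python standard library. It shows that hdfs:/// is a syntactically valid URI with an empty authority (no host, no port); as far as I know, Hadoop then falls back to the cluster's configured default filesystem (fs.defaultFS), so the three-slash form is not necessarily an error in itself. The function name describe_hdfs_uri is made up for illustration, and the port 8020 in the last call is only a hypothetical value, not something taken from the question.

```python
from urllib.parse import urlsplit


def describe_hdfs_uri(uri):
    """Split an hdfs:// URI into host, port, and path.

    host and port are None when the URI has no authority part,
    which is the case for the three-slash form hdfs:///...
    """
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an hdfs URI: {uri!r}")
    return {"host": parts.hostname, "port": parts.port, "path": parts.path}


# The two URIs from the question, plus one with a hypothetical port.
for uri in (
    "hdfs:///user/tki/KONTOAUSZUG",
    "hdfs://m1.hdp2/user/tki/KONTOAUSZUG.avsc",
    "hdfs://m1.hdp2:8020/user/tki/KONTOAUSZUG.avsc",  # 8020 is an assumption
):
    print(uri, "->", describe_hdfs_uri(uri))
```

If the host or port reported for the schema URL does not match what the cluster's NameNode actually listens on, that would be one thing to rule out.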

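【Comments】:

Neither reducing the slashes nor specifying the port solved the problem. I should add that I did run the same statement successfully on another (preconfigured Cloudera) VM. Any other ideas?
Try using avro.schema.literal instead of the URL; for an example, see cloudera.com/documentation/archive/cdh/4-x/4-3-0/… Alternatively (this should make no difference to the error, but you may notice some different behavior): CREATE EXTERNAL TABLE kontoauszug STORED AS AVRO LOCATION 'hdfs:///user/tki/KONTOAUSZUG' TBLPROPERTIES ('avro.schema.url'='hdfs://m1.hdp2/user/tki/KONTOAUSZUG.avsc');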
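
The avro.schema.literal suggestion can be sketched as follows: instead of pointing Hive at a file, the schema JSON is inlined into the table properties, which takes the HDFS read of the .avsc out of the picture. This is only a rough sketch. The helper name create_table_with_literal is made up, the quote and backslash escaping is a simplified assumption about Hive string literals, and STORED AS AVRO assumes a Hive version that supports that shorthand. The sample schema is cut down to a single field from the question for brevity.

```python
import json


def create_table_with_literal(table, location, schema_text):
    """Build a Hive DDL string that inlines an Avro schema.

    json.loads raises ValueError if schema_text is not valid JSON,
    so a malformed .avsc is caught before Hive ever sees it.
    """
    schema = json.loads(schema_text)
    # Compact the schema to one line so it fits in a single string literal.
    literal = json.dumps(schema, separators=(",", ":"))
    # Simplified escaping (an assumption): protect backslashes and single quotes.
    literal = literal.replace("\\", "\\\\").replace("'", "\\'")
    return (
        f"CREATE EXTERNAL TABLE {table}\n"
        "STORED AS AVRO\n"
        f"LOCATION '{location}'\n"
        f"TBLPROPERTIES ('avro.schema.literal'='{literal}');"
    )


# Shortened sample schema: only the SOLL field from the question's .avsc.
SAMPLE_SCHEMA = """
{
  "type" : "record",
  "name" : "KONTOAUSZUG",
  "fields" : [ {
    "name" : "SOLL",
    "type" : [ "null", "string" ],
    "default" : null
  } ]
}
"""

print(create_table_with_literal(
    "kontoauszug", "hdfs:///user/tki/KONTOAUSZUG", SAMPLE_SCHEMA))
```

If the table can be created this way but not with avro.schema.url, the problem is most likely in reading the schema file (path, host, or permissions) rather than in the schema itself.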

【Solution 2】:

This is an access error. Check that the directory containing the .avsc file has the correct permissions, then retry with hdfs://m1.hdp2/user/tki/KONTOAUSZUG.avsc
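
As a rough illustration of the permission check, the sketch below reads the mode, owner, and group columns of a single hadoop fs -ls output line and decides whether a given user would be allowed to read the file. The sample line is made up (it is not output from the asker's cluster), the helper name can_read is hypothetical, and the user name "hive" is only an assumption about which account the Hive service runs as. HDFS ACLs, superuser rights, and the execute bit on parent directories are deliberately ignored here.

```python
def can_read(ls_line, user, groups=()):
    """Return True if `user` may read the entry in one `hadoop fs -ls` line.

    Expected columns: permissions, replication, owner, group, size,
    date, time, path. Only the plain owner/group/other bits are checked.
    """
    fields = ls_line.split()
    mode, owner, group = fields[0], fields[2], fields[3]
    if user == owner:
        return mode[1] == "r"   # owner read bit
    if group in groups:
        return mode[4] == "r"   # group read bit
    return mode[7] == "r"       # other read bit


# Made-up sample line, for illustration only.
SAMPLE_LS_LINE = (
    "-rw-r-----   3 hadoop hdfs   1536 2016-07-10 10:00 "
    "/user/tki/KONTOAUSZUG.avsc"
)

print(can_read(SAMPLE_LS_LINE, "hadoop"))                 # file owner
print(can_read(SAMPLE_LS_LINE, "hive"))                   # neither owner nor group
print(can_read(SAMPLE_LS_LINE, "hive", groups=("hdfs",))) # member of the file's group
```

With a mode like the one in the sample, a service account that is neither the owner nor in the file's group could not read the schema, which would fit the behavior described in this answer.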

【Comments】: