【Posted】: 2012-12-11 06:51:51
【Question】:
I have a table that I loaded data into as follows:
create table xyzlogTable (dateC string, hours string, minutes string, seconds string, TimeTaken string, Method string, UriQuery string, ProtocolStatus string)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties (
  "input.regex" = "(\\S+)\\t(\\d+):(\\d+):(\\d+)\\t(\\S+)\\t(\\S+)\\t(\\S+)\\t(\\S+)",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s")
stored as textfile;
load data local inpath '/home/hadoop/hive/xyxlogData/' into table xyzlogTable;
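To see what the input.regex above captures, here is a small Python sketch that applies the same pattern to one log line. The sample line and the parse_line helper are made up for illustration. The HiveQL string literal unescapes to the regex used below, and Python's re treats these particular tokens (\S, \d, \t) the same way Java does. As I understand it, RegexSerDe requires the whole row to match, which re.fullmatch() approximates here.

```python
import re

# The HiveQL literal "(\\S+)\\t(\\d+):..." unescapes to this regex.
LOG_REGEX = re.compile(r"(\S+)\t(\d+):(\d+):(\d+)\t(\S+)\t(\S+)\t(\S+)\t(\S+)")

# Column names in the same order as the CREATE TABLE statement.
COLUMNS = ["dateC", "hours", "minutes", "seconds",
           "TimeTaken", "Method", "UriQuery", "ProtocolStatus"]


def parse_line(line):
    """Return a dict of the 8 columns, or None if the whole line doesn't match.

    parse_line is a hypothetical helper, not part of Hive.
    """
    m = LOG_REGEX.fullmatch(line.rstrip("\n"))
    if m is None:
        return None
    return dict(zip(COLUMNS, m.groups()))


# Hypothetical tab-separated log line.
sample = "2012-12-10\t06:51:51\t120\tGET\t/index.html?id=7\t200"
print(parse_line(sample))
```

A line whose fields contain spaces (for example a URI like "/a b") returns None here, because \S+ cannot cross a space and there is no tab to split on. Rows like that would not be split into the eight columns by RegexSerDe either.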
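The table turned out to have more than 3 million rows. Some queries work fine, while others seem to run indefinitely and never finish.
After seeing select and group by queries take a very long time, and sometimes never return any results at all, I decided to partition the table.
However, both of the following statements failed: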
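Partitioning by dateC means Hive stores each date's rows in a separate subdirectory named dateC=<value> under the table's location, and the partition value lives in the directory name rather than in the data files. A query that filters on dateC then only has to read the matching directories. The Python sketch below mimics that layout and the pruning step; TABLE_DIR, partition_rows and prune are hypothetical names, and the real warehouse path (and directory-name casing) depends on the Hive setup.

```python
from collections import defaultdict

# Hypothetical table location; the real path depends on the Hive warehouse config.
TABLE_DIR = "/user/hive/warehouse/xyzlogtable"


def partition_rows(rows, partition_col="dateC"):
    """Group rows into per-partition directories named <col>=<value>.

    The partition value is removed from each row and kept only in the path,
    mirroring how Hive stores partition values in directory names.
    """
    parts = defaultdict(list)
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not modified
        value = row.pop(partition_col)
        parts[f"{TABLE_DIR}/{partition_col}={value}"].append(row)
    return dict(parts)


def prune(parts, partition_col, value):
    """Model of WHERE <col> = value: only the one matching directory is read."""
    return parts.get(f"{TABLE_DIR}/{partition_col}={value}", [])


rows = [
    {"dateC": "2012-12-10", "Method": "GET"},
    {"dateC": "2012-12-11", "Method": "POST"},
    {"dateC": "2012-12-10", "Method": "POST"},
]
parts = partition_rows(rows)
print(sorted(parts))
print(prune(parts, "dateC", "2012-12-10"))
```

Here a filter on 2012-12-10 touches one directory and two rows instead of scanning every row in the table.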
create table xyzlogTable (datenonQuery string , hours string, minutes string, seconds string, TimeTaken string, Method string, UriQuery string, ProtocolStatus string) partitioned by (dateC string);
FAILED: Error in metadata: AlreadyExistsException(message:Table xyzlogTable already exists)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Alter table xyzlogTable (datenonQuery string , hours string, minutes string, seconds string, TimeTaken string, Method string, UriQuery string, ProtocolStatus string) partitioned by (dateC string);
FAILED: Parse Error: line 1:12 cannot identify input 'xyzlogTable' in alter table statement
Any idea what the problem is?
【Discussion】: