【Posted】: 2016-02-13 11:36:30
【Question】:
I am trying to parse XML with Pig (version 0.12), but I get the following error:
Failed to parse: Pig script failed to parse: Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.pig.piggybank.evaluation.xml.XPath using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
My XML file is as follows:
<CATALOG>
<BOOK>
<TITLE>Hadoop Defnitive Guide</TITLE>
<AUTHOR>Tom White</AUTHOR>
<COUNTRY>US</COUNTRY>
<COMPANY>CLOUDERA</COMPANY>
<PRICE>24.90</PRICE>
<YEAR>2012</YEAR>
</BOOK>
<BOOK>
<TITLE>Programming Pig</TITLE>
<AUTHOR>Alan Gates</AUTHOR>
<COUNTRY>USA</COUNTRY>
<COMPANY>Horton Works</COMPANY>
<PRICE>30.90</PRICE>
<YEAR>2013</YEAR>
</BOOK>
</CATALOG>
The exercise is from: http://hadoopgeek.com/apache-pig-xml-parsing-xpath/
Here is the script:
REGISTER piggybank.jar
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();
A = LOAD '/hadoop_books.xml' using org.apache.pig.piggybank.storage.XMLLoader('BOOK') as (x:chararray);
B = FOREACH A GENERATE XPath(x, 'BOOK/AUTHOR'), XPath(x, 'BOOK/PRICE');
dump B;
Please help.
I have placed the .xml file in the Hadoop root directory.
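For reference, here is a minimal Python sketch of what the two XPath expressions in the script are meant to pull out of each BOOK record. It uses the standard-library `xml.etree.ElementTree` instead of Pig, so it only approximates the piggybank UDF; the function name `extract_author_price` is hypothetical. ElementTree paths are relative to the BOOK element itself, so `AUTHOR` here plays the role of `BOOK/AUTHOR` in the script.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the XML file in the question.
CATALOG_XML = """<CATALOG>
<BOOK>
<TITLE>Hadoop Defnitive Guide</TITLE>
<AUTHOR>Tom White</AUTHOR>
<COUNTRY>US</COUNTRY>
<COMPANY>CLOUDERA</COMPANY>
<PRICE>24.90</PRICE>
<YEAR>2012</YEAR>
</BOOK>
<BOOK>
<TITLE>Programming Pig</TITLE>
<AUTHOR>Alan Gates</AUTHOR>
<COUNTRY>USA</COUNTRY>
<COMPANY>Horton Works</COMPANY>
<PRICE>30.90</PRICE>
<YEAR>2013</YEAR>
</BOOK>
</CATALOG>"""


def extract_author_price(catalog_xml):
    """Return one (author, price) tuple per BOOK element (hypothetical helper)."""
    root = ET.fromstring(catalog_xml)
    rows = []
    for book in root.iter("BOOK"):
        # findtext returns the text of the first matching child, or None.
        rows.append((book.findtext("AUTHOR"), book.findtext("PRICE")))
    return rows


if __name__ == "__main__":
    for row in extract_author_price(CATALOG_XML):
        print(row)
    # ('Tom White', '24.90')
    # ('Alan Gates', '30.90')
```

Once the UDF resolves, the Pig script would be expected to dump the same two author/price pairs.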
【Comments】:
- You need to create a directory named xmls, put the 'hadoop_books.xml' file in it, and then try running the script again.
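The ERROR 1070 text names a class that could not be resolved, not a missing input file, so it may be worth checking whether the registered jar actually contains that class. The sketch below is one way to do that in Python, treating the jar as an ordinary zip archive; the helper name `jar_contains_class` is hypothetical, and the jar path passed to it is an assumption about where piggybank.jar lives.

```python
import zipfile

# Class named in the ERROR 1070 message, written as an entry path inside the jar.
XPATH_CLASS = "org/apache/pig/piggybank/evaluation/xml/XPath.class"


def jar_contains_class(jar_path, class_entry=XPATH_CLASS):
    """Return True if the jar (a zip archive) has an entry for the class.

    Hypothetical helper; jar_path is wherever your piggybank.jar is stored.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return class_entry in jar.namelist()
```

Calling `jar_contains_class("piggybank.jar")` from the directory that holds the jar returns `False` when the class is absent. In that case one possible explanation is that this piggybank.jar build does not ship the XPath UDF, and a jar from a different Pig release might be needed.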
Tags: xml apache-pig