[Posted]: 2015-02-27 02:08:54
[Question]:
I'm using the following file as input: https://svn.apache.org/repos/asf/pig/trunk/tutorial/data/excite-small.log
My current code is:
-- FileName: excite-small.log
log = LOAD 'excite-small.log' AS (user, timestamp, query);
grpd = GROUP log BY user;
cntd = FOREACH grpd GENERATE group, COUNT(log);
STORE cntd INTO 'output';
I'm running this job on EMR by following the steps described at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-pig-launch.html
**I set the following parameters:**
1. For Script Location: s3://mybucket/test.pig
2. For Input Location: s3://mybucket/excite-small.log
3. For Output Location: s3://mybucket/
4. Arguments: Blank
When I run this job, I get the error `Input path does not exist`. I think it has something to do with REGISTER, but I'm not sure. Can someone tell me what I'm doing wrong?
[Discussion]:
Tags: hadoop apache-pig emr amazon-emr
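For comparison, here is a hedged sketch of the same script written against the EMR step parameters instead of hard-coded relative paths. It assumes, as in AWS's sample Pig scripts, that the console forwards the Input Location and Output Location fields to the script as the Pig parameters `INPUT` and `OUTPUT` (i.e. `-p INPUT=... -p OUTPUT=...`). It also assumes the log is tab-separated and types the three fields as `chararray`. It has not been verified against this particular cluster.

```pig
-- Sketch only: assumes EMR passes the Input/Output Location fields
-- as the parameters INPUT and OUTPUT (-p INPUT=... -p OUTPUT=...).
log = LOAD '$INPUT' USING PigStorage('\t')
      AS (user:chararray, timestamp:chararray, query:chararray);

-- One group (bag of log tuples) per distinct user
grpd = GROUP log BY user;

-- COUNT ignores nulls; COUNT_STAR would count every tuple in the bag
cntd = FOREACH grpd GENERATE group AS user, COUNT(log) AS num_queries;

STORE cntd INTO '$OUTPUT' USING PigStorage('\t');
```

One related point: Pig/Hadoop refuses to write into an output path that already exists. Pointing Output Location at the bucket root (`s3://mybucket/`) may therefore fail as well, and a fresh prefix such as `s3://mybucket/output/` is the usual choice.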