【Posted】: 2014-12-18 14:45:44
【Question】:
Each row has 5 columns, separated by commas:
1st column is name
2nd column is date_of_purchase
3rd column is product
4th column is mode of payment
5th column is total_amount
Here is a sample, to give you an idea of what the data contains:
surender,2014-03-09,TV,OFFLINE,20000
surender,2014-01-01,Mobile,ONLINE,18000
Raja,2014-09-21,Laptop,ONLINE,30000
Surender,2014-10-12,Laptop,ONLINE,40000
Raja,2014-FEB-11,MusicSystem,ONLINE,2000
Kumar,2014-07-09,Ipod,OFFLINE,4000
Kumar,2014-06-08,TV,ONLINE,20000
Raja,2014-11-07,SPeakers,OFFLINE,8000
Kumar,2014-10-18,Laptop,ONLINE,30000
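What I need is to see how much each person spent through ONLINE mode and through OFFLINE mode.
Basically, the reducer output should look like this: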
surender OFFLINE 20000
surender ONLINE 58000
Raja OFFLINE 8000
Raja ONLINE 32000
Kumar OFFLINE 4000
Kumar ONLINE 50000
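To make the intended aggregation concrete, here is a minimal Python sketch of the grouping step, not a Hive, Pig, or MapReduce job. It assumes names should be matched case-insensitively (the sample has both "surender" and "Surender", and the 58000 total only works if they are the same person) and that the first spelling seen is the one to print. The function name `spend_by_name_and_mode` and the `SAMPLE` constant are made up for illustration.

```python
from collections import defaultdict

# The sample rows from the question, verbatim.
SAMPLE = """\
surender,2014-03-09,TV,OFFLINE,20000
surender,2014-01-01,Mobile,ONLINE,18000
Raja,2014-09-21,Laptop,ONLINE,30000
Surender,2014-10-12,Laptop,ONLINE,40000
Raja,2014-FEB-11,MusicSystem,ONLINE,2000
Kumar,2014-07-09,Ipod,OFFLINE,4000
Kumar,2014-06-08,TV,ONLINE,20000
Raja,2014-11-07,SPeakers,OFFLINE,8000
Kumar,2014-10-18,Laptop,ONLINE,30000
"""


def spend_by_name_and_mode(lines):
    """Sum total_amount per (name, mode); names are matched case-insensitively."""
    display = {}               # lower-cased name -> first spelling seen
    totals = defaultdict(int)  # (lower-cased name, mode) -> summed amount
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Only name, mode and amount are needed; date and product are ignored.
        name, _date, _product, mode, amount = line.split(",")
        key = name.lower()
        display.setdefault(key, name)
        totals[(key, mode)] += int(amount)
    order = list(display)  # names in first-seen order
    # Sort by first appearance of the name, then by mode (OFFLINE before ONLINE).
    rows = sorted(totals.items(),
                  key=lambda kv: (order.index(kv[0][0]), kv[0][1]))
    return [(display[key], mode, total) for (key, mode), total in rows]


if __name__ == "__main__":
    for name, mode, total in spend_by_name_and_mode(SAMPLE.splitlines()):
        print(name, mode, total)
```

In a real MapReduce job the same idea would map each row to a (name, mode) key with the amount as the value and let the reducer sum the values per key.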
The final output should look like this:
surender 20000 58000
Raja 8000 32000
Kumar 4000 50000
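The final layout is a pivot of the per-mode totals into one row per person. A rough Python sketch of that step follows, assuming the only modes are OFFLINE and ONLINE and that the columns are ordered name, offline total, online total (which is what the numbers above imply). `pivot_modes` and `ROWS` are illustrative names, not part of any Hadoop API.

```python
def pivot_modes(rows):
    """Turn (name, mode, total) rows into (name, offline_total, online_total)."""
    table = {}  # name -> {"OFFLINE": total, "ONLINE": total}, in first-seen order
    for name, mode, total in rows:
        # A person with no purchases in one mode keeps 0 for that column.
        table.setdefault(name, {"OFFLINE": 0, "ONLINE": 0})[mode] += total
    return [(name, t["OFFLINE"], t["ONLINE"]) for name, t in table.items()]


# Per-person, per-mode totals as listed in the question.
ROWS = [
    ("surender", "OFFLINE", 20000),
    ("surender", "ONLINE", 58000),
    ("Raja", "OFFLINE", 8000),
    ("Raja", "ONLINE", 32000),
    ("Kumar", "OFFLINE", 4000),
    ("Kumar", "ONLINE", 50000),
]

if __name__ == "__main__":
    for name, offline, online in pivot_modes(ROWS):
        print(name, offline, online)
```

In a SQL-style engine the same pivot is usually written as conditional aggregation: a sum over the amount when the mode is OFFLINE and another when it is ONLINE, grouped by name.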
Could you give me a Hive or Pig query, or a MapReduce program, that does this?
【Discussion】:
Tags: hadoop mapreduce hive apache-pig hadoop-streaming