VM cloudera - user cloudera and permissions? Answers

【Question title】: VM cloudera - user cloudera and permissions?
【Posted】: 2014-02-06 14:43:27
【Question description】:

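I downloaded and installed the Cloudera VM 4.4 to use Hadoop. I have already set up a cluster on a platform for my work, so I have some idea of how Hadoop works. I therefore think my problem comes from my poor understanding of Linux and its users and groups.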

With Hive:

I tried to create a Hive table from the shell, and it works. I have a table in /user/hive/warehouse/test, which belongs to user cloudera in group cloudera.

I have some data files (.txt) in HDFS under /user/cloudera (user: cloudera, group: hive), and I load them into my Hive table:

LOAD DATA INPATH '/user/cloudera/*.txt' INTO TABLE test;

Here is what I get:

hive> LOAD DATA INPATH '/user/cloudera/jeuDeTest/*.txt' INTO TABLE test;
Loading data to table default.test
chgrp: changing ownership of '/user/hive/warehouse/test/_log24310.txt': User does not belong to hive
chgrp: changing ownership of '/user/hive/warehouse/test/_log24311.txt': User does not belong to hive
Table default.test stats: [num_partitions: 0, num_files: 2, num_rows: 0, total_size: 10161843, raw_data_size: 0]
OK
Time taken: 2.472 seconds

I had never seen this kind of error message before, but the files were moved. If I then try SELECT *, I get no results.

With HBase:

I also have some difficulties with HBase. I can create a table, but when I use ImportTsv:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv 
-Dimporttsv.columns=HBASE_ROW_KEY,cf:nl,ch:nt,cf:ti,cf:ip,cf:cr,cf:am,cf:op,cf:mr,cf:ct 
'-Dimporttsv.separator=|' testhbase -Dimporttsv.skip.bad.lines=false  
/user/cloudera/jeuDeTest/*.txt

I get this error:

ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) 
cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: 
hdfs://localhost.localdomain:8020/user/cloudera/jeuDeTest/_logGeneral_C_24310_SO.txt
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist:     
hdfs://localhost.localdomain:8020/user/cloudera/jeuDeTest/_logGeneral_C_24310_SO.txt

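I think the problem is caused by permissions, but I don't know how to get the rights to run these requests, or what the best way to do this is. (On the platform at work I am root and have none of these difficulties, but I don't understand how it works.)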

Thanks for reading.

Angel

I tried adding my cloudera user to the hive group. I no longer get errors during the load, but SELECT still returns no results.

hive> LOAD DATA INPATH '/user/cloudera/jeuDeTest/*.txt' INTO TABLE test;                     
Loading data to table default.test
Table default.test stats: [num_partitions: 0, num_files: 10, num_rows: 0, total_size: 10161843,   raw_data_size: 0]
OK
Time taken: 0.486 seconds
hive> select * from test limit 20;
OK
Time taken: 0.303 seconds

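【Question discussion】:

The Cloudera VM has various problems and is not well supported, but I am trying to make the best of it too. I also see the "User does not belong to hive" error.

Tags: hadoop permissions hive cloudera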

【Solution 1】:

I had the same permission problem -> chgrp: changing ownership of '/user/hive/warehouse/test/_log24310.txt': User does not belong to hive.

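1. Added the existing user cloudera to the existing group hive with the command: usermod -a -G hive cloudera
2. Rebooted the system.
3. Ran the load command, then select * from table_name -> no data was shown.
4. Ran select count(*) from table_name, which started a MapReduce job.
5. Ran select * from table_name again, and the results were now returned correctly.
6. Opened an Impala shell with the impala-shell command.
7. Ran select * from table_name there, and no results were returned.
8. Ran the command invalidate metadata in impala-shell.
9. Ran the command refresh table_name.
10. Ran the command show tables.
11. Ran select * from table_name, and the results now show up in both impala-shell and the Hive shell.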
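
The steps above can be sketched as a small shell script. This is only an illustration, not the exact commands I typed: the helper names (run, add_user_to_group, check_hive_table, sync_impala), the DRY_RUN switch, the use of sudo, and the table name test (taken from the question) are assumptions. By default it only prints the commands, without the shell quoting, so it is safe to try outside the VM.

```shell
#!/usr/bin/env bash
# Sketch of the steps above. DRY_RUN=1 (the default here) only prints each
# command; set DRY_RUN=0 on the VM to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Add an existing user ($1) to an existing group ($2).
# sudo is an assumption; the answer rebooted afterwards.
add_user_to_group() {
  run sudo usermod -a -G "$2" "$1"
}

# Run a count (which starts a MapReduce job) and then read rows back in Hive.
check_hive_table() {
  run hive -e "SELECT COUNT(*) FROM $1; SELECT * FROM $1 LIMIT 20;"
}

# Make Impala pick up the table and files that were created through Hive.
sync_impala() {
  run impala-shell -q "INVALIDATE METADATA"
  run impala-shell -q "REFRESH $1"
  run impala-shell -q "SHOW TABLES"
  run impala-shell -q "SELECT * FROM $1 LIMIT 20"
}

add_user_to_group cloudera hive
check_hive_table test
sync_impala test
```

Running it with DRY_RUN=0 on the VM would execute the same commands for real; the reboot between the first and second step still has to be done by hand.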

【Discussion】: