【Title】: Use the output of the ls command as input for a shell script
【Posted】: 2024-07-24 11:55:02
【Description】:

My files are in the Hadoop file system, and I need to run a Phoenix bulk import on each of them. My shell script currently looks like this:

test.sh:

HADOOP_CLASSPATH=/usr/lib/hbase/lib/hbase-protocol-1.1.2.jar:/etc/hbase/conf hadoop jar  /usr/lib/phoenix/lib/phoenix/phoenix-1.2.0-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table NETWORK_HEALTH --input $1

The output of hdfs dfs -ls /tmp/hbase-temp/tmp is:

-rw-r--r--   2 root hadoop  405003334 2016-04-06 15:28 /tmp/hbase-temp/tmp/nodeHealth20160210-20160211.txt
-rw-r--r--   2 root hadoop 1373330318 2016-04-06 15:28 /tmp/hbase-temp/tmp/nodeHealth20160211-20160212.txt
-rw-r--r--   2 root hadoop 1303613420 2016-04-06 15:28 /tmp/hbase-temp/tmp/nodeHealth20160212-20160213.txt
-rw-r--r--   2 root hadoop 1239413840 2016-04-06 15:28 /tmp/hbase-temp/tmp/nodeHealth20160214-20160215.txt
-rw-r--r--   2 root hadoop 1342998954 2016-04-06 15:28 /tmp/hbase-temp/tmp/nodeHealth20160215-20160216.txt
-rw-r--r--   2 root hadoop 1248737317 2016-04-06 15:29 /tmp/hbase-temp/tmp/nodeHealth20160216-20160217.txt
-rw-r--r--   2 root hadoop 1146305115 2016-04-06 15:29 /tmp/hbase-temp/tmp/nodeHealth20160217-20160218.txt
-rw-r--r--   2 root hadoop 1357281689 2016-04-06 15:29 /tmp/hbase-temp/tmp/nodeHealth20160218-20160219.txt
-rw-r--r--   2 root hadoop 1113842508 2016-04-06 15:29 /tmp/hbase-temp/tmp/nodeHealth20160219-20160220.txt
-rw-r--r--   2 root hadoop 1193977572 2016-04-06 15:29 /tmp/hbase-temp/tmp/nodeHealth20160220-20160221.txt
-rw-r--r--   2 root hadoop 1005786711 2016-04-06 15:30 /tmp/hbase-temp/tmp/nodeHealth20160221-20160222.txt
-rw-r--r--   2 root hadoop 1159168545 2016-04-06 15:30 /tmp/hbase-temp/tmp/nodeHealth20160222-20160223.txt
-rw-r--r--   2 root hadoop 1163804889 2016-04-06 15:30 /tmp/hbase-temp/tmp/nodeHealth20160223-20160224.txt
-rw-r--r--   2 root hadoop 1048950098 2016-04-06 15:30 /tmp/hbase-temp/tmp/nodeHealth20160224-20160225.txt
-rw-r--r--   2 root hadoop 1251527803 2016-04-06 15:30 /tmp/hbase-temp/tmp/nodeHealth20160225-20160226.txt
-rw-r--r--   2 root hadoop 1288661897 2016-04-06 15:31 /tmp/hbase-temp/tmp/nodeHealth20160226-20160227.txt
-rw-r--r--   2 root hadoop 1226833581 2016-04-06 15:31 /tmp/hbase-temp/tmp/nodeHealth20160227-20160228.txt
-rw-r--r--   2 root hadoop 1245110612 2016-04-06 15:31 /tmp/hbase-temp/tmp/nodeHealth20160228-20160229.txt
-rw-r--r--   2 root hadoop 1321007542 2016-04-06 15:31 /tmp/hbase-temp/tmp/nodeHealth20160229-20160230.txt
-rw-r--r--   2 root hadoop 1301010760 2016-04-06 15:31 /tmp/hbase-temp/tmp/nodeHealth20160301-20160302.txt
-rw-r--r--   2 root hadoop 1121192190 2016-04-06 15:32 /tmp/hbase-temp/tmp/nodeHealth20160302-20160303.txt
-rw-r--r--   2 root hadoop 1296388727 2016-04-06 15:32 /tmp/hbase-temp/tmp/nodeHealth20160303-20160304.txt
-rw-r--r--   2 root hadoop 1280975648 2016-04-06 15:32 /tmp/hbase-temp/tmp/nodeHealth20160304-20160305.txt
-rw-r--r--   2 root hadoop 1264795738 2016-04-06 15:32 /tmp/hbase-temp/tmp/nodeHealth20160305-20160306.txt
-rw-r--r--   2 root hadoop 1248570281 2016-04-06 15:32 /tmp/hbase-temp/tmp/nodeHealth20160306-20160307.txt
-rw-r--r--   2 root hadoop 1335704328 2016-04-06 15:33 /tmp/hbase-temp/tmp/nodeHealth20160307-20160308.txt
-rw-r--r--   2 root hadoop 1246153114 2016-04-06 15:33 /tmp/hbase-temp/tmp/nodeHealth20160308-20160309.txt
-rw-r--r--   2 root hadoop 1251409839 2016-04-06 15:33 /tmp/hbase-temp/tmp/nodeHealth20160309-20160310.txt
-rw-r--r--   2 root hadoop 1120439077 2016-04-06 15:33 /tmp/hbase-temp/tmp/nodeHealth20160310-20160311.txt
-rw-r--r--   2 root hadoop 1151595336 2016-04-06 15:33 /tmp/hbase-temp/tmp/nodeHealth20160311-20160312.txt
-rw-r--r--   2 root hadoop 1304537932 2016-04-06 15:34 /tmp/hbase-temp/tmp/nodeHealth20160312-20160313.txt
-rw-r--r--   2 root hadoop 1065020972 2016-04-06 15:34 /tmp/hbase-temp/tmp/nodeHealth20160313-20160314.txt
-rw-r--r--   2 root hadoop 1237905144 2016-04-06 15:34 /tmp/hbase-temp/tmp/nodeHealth20160314-20160315.txt
-rw-r--r--   2 root hadoop 1038185956 2016-04-06 15:34 /tmp/hbase-temp/tmp/nodeHealth20160315-20160316.txt
-rw-r--r--   2 root hadoop 1216670016 2016-04-06 15:35 /tmp/hbase-temp/tmp/nodeHealth20160316-20160317.txt
-rw-r--r--   2 root hadoop 1139180542 2016-04-06 15:35 /tmp/hbase-temp/tmp/nodeHealth20160317-20160318.txt
-rw-r--r--   2 root hadoop 1052672363 2016-04-06 15:35 /tmp/hbase-temp/tmp/nodeHealth20160318-20160319.txt
-rw-r--r--   2 root hadoop  892045686 2016-04-06 15:35 /tmp/hbase-temp/tmp/nodeHealth20160319-20160320.txt

When I run the command below, it only works for the first line:

hdfs dfs -ls /tmp/hbase-temp/tmp | awk '{print $8}' | xargs sh test.sh

How can I fix this so that test.sh runs for every file in the ls output?

【Comments】:

Tags: linux bash shell hadoop awk


【Solution 1】:

You can use process substitution:

while read -r _ _ _ _ _ _ _ var8 _; do
   bash ./test.sh "$var8"
done < <(hdfs dfs -ls /tmp/hbase-temp/tmp)
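As a quick sanity check, the same read-into-placeholder-fields pattern can be exercised against a mock listing (the listing function and the two sample lines below are made up for illustration; in the real case the input comes from hdfs dfs -ls):

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "hdfs dfs -ls /tmp/hbase-temp/tmp".
listing() {
  printf -- '-rw-r--r--   2 root hadoop  405003334 2016-04-06 15:28 /tmp/a.txt\n'
  printf -- '-rw-r--r--   2 root hadoop 1373330318 2016-04-06 15:28 /tmp/b.txt\n'
}

# Fields 1-7 (permissions, replication, owner, group, size, date, time)
# are read into throwaway variables; field 8 is the file path.
while read -r _ _ _ _ _ _ _ path _; do
  echo "would run: test.sh $path"
done < <(listing)
```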

If you have to use xargs, use the -I option:

hdfs dfs -ls /tmp/hbase-temp/tmp | awk '{print $8}' | xargs -I {} sh test.sh '{}'
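With a plain echo standing in for test.sh, you can see -I substituting each input line into its own invocation (the paths here are illustrative):

```shell
# xargs -I {} runs the command once per input line, replacing {} with the line.
printf '%s\n' /tmp/a.txt /tmp/b.txt | xargs -I {} echo "processing {}"
# prints:
#   processing /tmp/a.txt
#   processing /tmp/b.txt
```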

【Discussion】:

  • It throws an error for me on line 3: syntax error near unexpected token `
  • Make sure you execute it with bash rather than sh; process substitution is a bash feature
【Solution 2】:

Add -n 1 to your command, so xargs invokes test.sh once per argument instead of passing all the paths to a single invocation (your test.sh only reads $1, which is why only the first file was processed):

hdfs dfs -ls /tmp/hbase-temp/tmp | awk '{print $8}' | xargs -n 1 sh test.sh

Here is the man page documentation:

 -n number
         Set the maximum number of arguments taken from standard input for each invocation of utility.  An invocation of utility will use less than number standard input arguments if the number of bytes accumulated (see the -s option) exceeds the specified size or there are fewer than number arguments remaining for the last invocation of utility.  The current default value for number is 5000.
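The difference is easy to see with echo in place of test.sh: without -n, xargs packs all arguments into a single invocation, while -n 1 forces one invocation per argument:

```shell
# All three arguments go to one echo invocation:
printf '%s\n' a b c | xargs echo
# prints: a b c

# With -n 1, echo is invoked three times, once per argument:
printf '%s\n' a b c | xargs -n 1 echo
# prints:
#   a
#   b
#   c
```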

My test.sh contains echo $1, and input.txt contains 3 sample lines. The test result is:

$ awk '{print $8}' input.txt | xargs -n1 sh test.sh
/tmp/hbase-temp/tmp/nodeHealth20160210-20160211.txt
/tmp/hbase-temp/tmp/nodeHealth20160211-20160212.txt
/tmp/hbase-temp/tmp/nodeHealth20160212-20160213.txt

【Discussion】:

  • Your solution works too, but I accepted @anubhava's because he answered first.