Why is calling the same function many times with different arguments slower? (Answers)

[Question title]: Why calling the same function many times with different arguments is slower
[Posted]: 2017-01-19 04:47:43
[Question description]:

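I have built a simple bash script that generates a random 4-word passphrase from a list of several thousand words. I'm not sure whether it is really secure or efficient for my personal use, so if you can think of any improvements, let me know. But that's not the point. Take a look ->

When I run it on my laptop, the input and output look like this: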

$ time sh genpass
astrology cringe tingling massager

real    0m0.319s
user    0m0.267s
sys     0m0.077s

A second time:

$ time sh genpass
prankish askew siren fritter

real    0m0.318s
user    0m0.266s
sys     0m0.077s

Sometimes the result is pretty funny.

Anyway, here is the script:

# EDITABLES ###########################################
target="/path/to/my/wordList.txt" 
# END EDITABLES #######################################

getWordList() {
  case $1 in
    "verb")  mawk '/ing$|ed$|en$/ {print $2}' $target ;;
    "adjective")  mawk '/y$|ish$/ {print $2}' $target ;;
    "noun")  mawk '!/ing$|ed$|en$|y$|ish$/ {print $2}' $target ;; 
    *) printf "%s" "'${1}' is an invalid argument." && echo && exit 1
  esac
}

pickRandomLineNumber() {
  # Get the list in an array
  declare -a list_a=("${!1}")
  # How many items in the list
  local length="${#list_a[@]}"
  # Generate a random number between 1 and the number of items in the list
  local number=$RANDOM 
  let "number %= $length"
  # Print the word at random line
  printf "%s\n" ${list_a[@]} | mawk -v line=$number 'NR==line {print}' 
}

read -ra verbList <<< $( getWordList verb )
verb=$(pickRandomLineNumber verbList[@])

read -ra adjectiveList <<< $( getWordList adjective )
adjective=$(pickRandomLineNumber adjectiveList[@])

read -ra nounList <<< $( getWordList noun )
noun1=$(pickRandomLineNumber nounList[@])
noun2=$(pickRandomLineNumber nounList[@])

printf "%s %s %s %s\n" "${adjective}" "${noun1}" "${verb}" "${noun2}"

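See where I have to create an array for each type of word? 3 types, 3 arrays. So I thought about moving that code into a function, so that I would only need to call the function 4 times, once for each of my 4 words, with a different argument. I really thought it would be faster.

Here is the changed code: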

# EDITABLES ###########################################
target="/path/to/my/wordList.txt"  
# END EDITABLES #######################################

getWordList() {
  case $1 in
    "verb")  mawk '/ing$|ed$|en$/ {print $2}' $target ;;
    "adjective")  mawk '/y$|ish$/ {print $2}' $target ;;
    "noun")  mawk '!/ing$|ed$|en$|y$|ish$/ {print $2}' $target ;; 
    *) printf "%s" "'${1}' is an invalid argument." && echo && exit 1
  esac
}

pickRandomLineNumber() {
  # Get the list in an array
  declare -a list_a=("${!1}")
  # How many items in the list
  local length="${#list_a[@]}"
  # Generate a random number between 1 and the number of items in the list
  local number=$RANDOM 
  let "number %= $length"
  # Print the word at random line
  printf "%s\n" ${list_a[@]} | mawk -v line=$number 'NR==line {print}' 
}

#### CHANGE ####
getWord() {
  read -ra list <<< $( getWordList $1)
  local word=$(pickRandomLineNumber list[@])
  printf "%s" "${word}"
}

printf "%s %s %s %s\n" $(getWord adjective) $(getWord noun) $(getWord verb) $(getWord noun)

Now the input/output:

$ time sh genpass
overstay clench napping palace

real    0m0.403s
user    0m0.304s
sys     0m0.090s

And again:

$ time sh genpass
gainfully cameo extended nutshell

real    0m0.369s
user    0m0.304s
sys     0m0.090s

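The difference in time is not a big deal, but overall I was sure it would be faster.

So do you know why the second script is slower than the first?

[Question discussion]: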

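Surely "astrology" and "overstay" are not adjectives. Your rules need some tweaking.
That doesn't really matter; the randomness is what matters. I could actually skip the whole "verb - adjective - noun" thing and have the script output 4 completely random words, but I thought this would be sweet. Either I tweak the rules or I get rid of them. Getting rid of them would surely be faster. But not as sweet...
1. If you need to call awk multiple times, just write the whole script in awk. 2. In any case, if you parse the same file multiple times, you are surely doing something wrong. 3. You must not use awk to retrieve a random element from an array. That's really silly. You can access any element of an array directly. 4. If you feel you need "references" (which is what you are doing with declare -a list_a=("${!1}")), then either your design is wrong or you are simply using the wrong language for the job: shell scripts shouldn't use such features.
If you want to optimize this properly, inline the arrays and get rid of all the Awk code. Printing the entire array to get the n-th element is particularly wasteful when you can simply use printf '%s\n' "${array[number]}".
As for localizing the performance loss in the second version, my guess at the culprit is the (needless!) copying of the input array via variable indirection, ${!1}.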
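
The suggestions in the comments above (read the file once, index the array directly, no awk for the lookup) can be sketched in plain bash. This is only an illustration under stated assumptions: the word list has one word per line, the suffix rules are the ones from the question, and the function names load_words and genpass are hypothetical. It also sidesteps a cost visible in the second script, where getWord noun runs twice, so the noun list is extracted from the file twice, and every $(getWord ...) adds another subshell; that likely contributes to the slowdown, though only measurement can confirm by how much.

```shell
#!/usr/bin/env bash
# Sketch only: classify words in a single pass, then index arrays directly.
# Assumes one word per line; load_words and genpass are hypothetical names.

verbs=() adjectives=() nouns=()

# load_words FILE: read FILE once and sort each word into one of three
# arrays, using the same suffix rules as the question's awk patterns.
load_words() {
    local word
    while IFS= read -r word || [[ -n $word ]]; do
        case $word in
            '')            continue ;;
            *ing|*ed|*en)  verbs+=("$word") ;;
            *y|*ish)       adjectives+=("$word") ;;
            *)             nouns+=("$word") ;;
        esac
    done < "$1"
}

# genpass: print "adjective noun verb noun" without calling awk.
# RANDOM % N yields an index in 0..N-1, matching bash's zero-based arrays.
genpass() {
    # Refuse to run if any category is empty (avoids division by zero).
    (( ${#adjectives[@]} && ${#nouns[@]} && ${#verbs[@]} )) || return 1
    printf '%s %s %s %s\n' \
        "${adjectives[RANDOM % ${#adjectives[@]}]}" \
        "${nouns[RANDOM % ${#nouns[@]}]}" \
        "${verbs[RANDOM % ${#verbs[@]}]}" \
        "${nouns[RANDOM % ${#nouns[@]}]}"
}
```

Typical use would be to call load_words with the path to the word list and then genpass. Note that this needs bash rather than a plain sh, and that $RANDOM only spans 0 to 32767, so it is not a strong source of randomness for passphrases.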

Tags: bash random awk passphrase mawk

[Solution 1]:

You have more code doing more things, all of which are unnecessary. Here is how to do what you are trying to do:

$ cat tst.awk
function grw(arr) {     # Get Random Word
    return arr[int(rand() * length(arr)) + 1]
}

{
    if ( /(ing|ed|en)$/ ) verbs[++numVerbs] = $0
    else if ( /(y|ish)$/ ) adjectives[++numAdjectives] = $0
    else nouns[++numNouns] = $0
}

END {
    srand()
    printf "%s %s %s %s\n", grw(adjectives), grw(nouns), grw(verbs), grw(nouns)
}

$ awk -f tst.awk words
overstay clench siren clench
$ awk -f tst.awk words
prankish nutshell tingling cameo
$ awk -f tst.awk words
astrology clench tingling palace

The above was run against this "words" file, created from the sample output you provided in your question:

$ cat words
askew
astrology
cameo
clench
cringe
extended
fritter
gainfully
massager
napping
nutshell
overstay
palace
prankish
siren
tingling

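[Discussion]:

This certainly works, thanks. And it is much faster. I guess I need to learn more about awk. Can you explain what srand() is for? Does it just mean that you generate a seed before calling those grw functions three times? Does that mean we use the same seed 3 times?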
time awk -f genpass.awk ~/wordList.txt -> 闷爷爷无福跳伞 real 0m0.045s user 0m0.038s sys 0m0.005s
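It generates a seed for the first call to rand() based on the current time. The second call to rand() uses the output of the first call to rand() as its seed, and so on. Without srand(), the first rand() would start from the same seed value every time awk is invoked, so every invocation of awk would generate the same sequence of "random" numbers. Read the book Effective Awk Programming, 4th Edition, by Arnold Robbins to learn awk, and understand right away that a shell is an environment from which to call tools, with a language to sequence those calls. It is not a tool for manipulating text; that is what awk is for.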
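
The seeding behaviour described above can be checked from the shell. A small sketch follows, using a hypothetical helper named seeded_sequence that passes an explicit seed so the result is reproducible. The case where srand() is never called is deliberately not asserted, because implementations differ there: mawk, for instance, documents that it seeds the generator from the clock at startup.

```shell
# seeded_sequence SEED: print three rand() values from awk after srand(SEED).
# (Hypothetical helper name, used only for this illustration.)
seeded_sequence() {
    awk -v seed="$1" 'BEGIN { srand(seed); print rand(), rand(), rand() }'
}

# Same seed, same sequence: these two calls print identical numbers.
seeded_sequence 42
seeded_sequence 42

# srand() with no argument seeds from the time of day instead, so the output
# normally changes between runs (runs within the same second may coincide).
awk 'BEGIN { srand(); print rand() }'
```

So the seed is set once, and each later rand() call continues the same sequence rather than restarting it; the four grw() calls in the answer therefore draw four different values from one seeded generator.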