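In this day and age (2020 is, after all, practically the third decade of the 21st century),
I think the right question is: how do I find all non-UTF-8 files? UTF-8 is the modern equivalent of the plain text file.
Encoding text that contains non-ASCII code points as UTF-8 introduces non-ASCII bytes (that is, bytes with the most significant bit set). However, not every sequence of such bytes forms a valid UTF-8 sequence.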
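To make that last point concrete, here is a minimal sketch that feeds two byte strings through a validity check. It uses `iconv` rather than `isutf8`, and the helper name `is_valid_utf8` is made up for illustration. It assumes an `iconv` that exits non-zero on malformed input, as glibc's does.

```shell
# Hypothetical helper: succeeds when standard input is valid UTF-8.
# Assumption: iconv rejects malformed input with a non-zero exit status.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# U+00E9 encoded as UTF-8: bytes 0xC3 0xA9 (octal 303 251).
if printf 'caf\303\251\n' | is_valid_utf8; then echo "valid"; else echo "invalid"; fi

# The same character in Latin-1: a lone 0xE9 byte (octal 351).
# 0xE9 announces a three-byte sequence, but no continuation bytes follow.
if printf 'caf\351\n' | is_valid_utf8; then echo "valid"; else echo "invalid"; fi
```

Both inputs contain a byte with the high bit set, but only the first should pass the check.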
isutf8, from the moreutils package, is what you need.
$ isutf8 -l /bin/*
/bin/[
/bin/acyclic
/bin/addr2line
/bin/animate
/bin/applydeltarpm
/bin/apropos
⋮
A quick check:
$ file $(isutf8 -l /bin/*)
/bin/[: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4d70c2142fc672d8a69d033ecb6693ec15b1e6fb, for GNU/Linux 3.2.0, stripped
/bin/acyclic: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=d428ea52eb0e8aaf7faf30914710d8fbabe6ca28, for GNU/Linux 3.2.0, stripped
/bin/addr2line: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=797f42bc4f8fb754a49b816b82d6b40804626567, for GNU/Linux 3.2.0, stripped
/bin/animate: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=36ab46e69c1bfea433382ffc9bbd9708365dac2b, for GNU/Linux 3.2.0, stripped
/bin/applydeltarpm: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=a1fddcbeec9266e698782596f2dfd1b4f3e0b974, for GNU/Linux 3.2.0, stripped
/bin/apropos: symbolic link to whatis
⋮
You may want to invert the test and get all the text files instead.
Use -i:
$ isutf8 -il /bin/*
/bin/alias
/bin/bashbug
/bin/bashbug-64
/bin/bg
⋮
$ file -L $(isutf8 -il /bin/*)
/bin/alias: a /usr/bin/sh script, ASCII text executable
/bin/bashbug: a /usr/bin/sh - script, ASCII text executable, with very long lines
/bin/bashbug-64: a /usr/bin/sh - script, ASCII text executable, with very long lines
/bin/bg: a /usr/bin/sh script, ASCII text executable
⋮
Yes, it reads the whole file, but it is quite fast, and if you want accuracy…
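Where moreutils is not installed, the listing above can be roughly approximated with `iconv`. This is only a sketch under assumptions, not how isutf8 itself works: the function name `list_utf8_files` is hypothetical, and it relies on `iconv` exiting non-zero on malformed input. Like isutf8, it reads every file in full.

```shell
# Hypothetical stand-in for `isutf8 -il`: print each regular file
# whose entire content decodes as UTF-8.
# Assumption: iconv exits non-zero when it meets malformed input.
list_utf8_files() {
    for f in "$@"; do
        # Skip directories and other non-regular files (symlinks are followed).
        [ -f "$f" ] || continue
        # Read the whole file; discard the converted output and any diagnostics.
        if iconv -f UTF-8 -t UTF-8 < "$f" >/dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}
```

It would be called as `list_utf8_files /bin/*`. Inverting the `if` condition would list the non-UTF-8 files instead.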