【Posted】: 2020-07-01 12:43:02
【Question】:
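I have a set of scraped raw files that contain both text and source code. The file types are listed below. I want to delete every file identified as C source, Python script, HTML document, or empty, and keep only the ASCII and Unicode text files.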
file *
1dW6WJMN.txt: Python script, ASCII text executable
9dJbZ3Vv.txt: ASCII text, with CRLF line terminators
9dQsmVU4.txt: Python script, UTF-8 Unicode text executable, with CRLF line terminators
A5hENB7D.txt: C source, ASCII text, with CRLF line terminators
cidREdJG.txt: UTF-8 Unicode text, with very long lines, with CRLF line terminators
exhjw1gK.txt: UTF-8 Unicode text, with CRLF line terminators
iu7LPrqz.txt: ASCII text, with very long lines, with CRLF line terminators
LsDHarjD.txt: ASCII text
nLABt1a6.txt: C source, ASCII text, with CRLF line terminators
nqMDtVuz.txt: ASCII text, with CRLF line terminators
nqPuYb23.txt: UTF-8 Unicode text, with CRLF line terminators
nQtzxhfQ.txt: ASCII text, with CRLF line terminators
NQuLWwpt.txt: ASCII text, with CRLF line terminators
nQXeJeED.txt: ASCII text, with CRLF line terminators
nqXGv6ws.txt: UTF-8 Unicode text, with CRLF line terminators
nQxr4Hwi.txt: ASCII text, with CRLF line terminators
nQxr4Hwii.txt: empty
VQjrxevh.txt: HTML document, UTF-8 Unicode text, with very long lines, with CRLF line terminators
yfDEfn4L.txt: C source, ASCII text, with CRLF line terminators
yydAEDRn.txt: HTML document, ASCII text, with very long lines, with CRLF line terminators
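I tried a simple grep for ASCII, but the descriptions of all the source code files also contain the term ASCII. Is there another way to filter out these source code files? Sometimes there are also PHP and JavaScript files that I want to get rid of. I am very new to Linux, so any help would be greatly appreciated. Thanks in advance.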
【Comments】:
-
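Short answer: grep alone cannot do this, because the file type has to be determined from the file contents. Once you have a tool that determines the file type from the contents, all you need to do is write a script that iterates over the files and their type descriptions and filters them.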
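The filtering step described here can be sketched as a small shell function that reads `file` output and keeps only the entries whose description begins with a plain-text type. This is a minimal sketch, not a definitive solution: the function name `keep_plain_text` is hypothetical, the pattern is based only on the description strings shown in the question (other versions of `file` may word them differently), and it assumes file names contain no colons or newlines.

```shell
# keep_plain_text (hypothetical helper): reads `file` output on stdin and
# prints the names of files whose description STARTS with a plain-text type.
# Source files are described as e.g. "Python script, ASCII text executable",
# so anchoring the match right after the "name:" prefix excludes them even
# though the word ASCII appears later in the line.
keep_plain_text() {
    # ^[^:]+:        the file name, up to the first colon
    # [[:space:]]+   the padding `file` inserts after the colon
    # (ASCII|UTF-8 Unicode) text   the leading type, as worded in the question
    grep -E '^[^:]+:[[:space:]]+(ASCII|UTF-8 Unicode) text' | cut -d: -f1
}
```

Running `file * | keep_plain_text` in the directory would then list only the files to keep. Because the match is a whitelist of leading types, PHP, JavaScript, or any other kind of file is excluded without having to be named. It is probably safest to review that list before deleting anything else.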