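【Posted】: 2015-09-10 13:25:41
【Question】:
Is there a more efficient way to solve the following problem with awk/grep/sed?
I want to parse one column of my input file (column 1 in this example) and use awk/grep/any other tool to keep only the rows whose value in that column matches my query. For example, given the file below: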
chr1 3009844 3009908 DXX 42 -
chr2 3000386 3000450 DXX 15 -
chr3 3000386 3000450 DXX 15 -
chr4 3000386 3000450 DXX 15 -
chr5 3000386 3000450 DXX 15 -
chr6 3000386 3000450 DXX 15 -
chr7 3000386 3000450 DXX 15 -
chr8 3000386 3000450 DXX 15 -
chr9 3000386 3000450 DXX 15 -
chr10 3000386 3000450 DXX 15 -
chr11 3000386 3000450 DXX 15 -
chr12 3000386 3000450 DXX 15 -
chr13 3000386 3000450 DXX 15 -
chr14 3000386 3000450 DXX 15 -
chr15 3000386 3000450 DXX 15 -
chr16 3000386 3000450 DXX 15 -
chr17 3000386 3000450 DXX 15 -
chr18 3000386 3000450 DXX 15 -
chr19 3000386 3000450 DXX 15 -
chrX 3000386 3000450 DXX 15 -
chrY 3000386 3000450 DXX 15 -
chr1_GL456210_random 3000386 3000450 DXX 15 -
chr1_GL456211_random 3000386 3000450 DXX 15 -
chr1_GL456212_random 3000386 3000450 DXX 15 -
chr1_GL456221_random 3000386 3000450 DXX 15 -
chr4_GL456216_random 3000386 3000450 DXX 15 -
chr4_JH584292_random 3000386 3000450 DXX 15 -
chr4_JH584295_random 3000386 3000450 DXX 15 -
chr5_GL456354_random 3000386 3000450 DXX 15 -
chr5_JH584296_random 3000386 3000450 DXX 15 -
chr5_JH584297_random 3000386 3000450 DXX 15 -
chr5_JH584299_random 3000386 3000450 DXX 15 -
chrX_GL456233_random 3000386 3000450 DXX 15 -
I would like the output to contain only the rows whose first column is, for example, chr1-chr22, chrX, or chrY:
chr1 3009844 3009908 DXX 42 -
chr2 3000386 3000450 DXX 15 -
chr3 3000386 3000450 DXX 15 -
chr4 3000386 3000450 DXX 15 -
chr5 3000386 3000450 DXX 15 -
chr6 3000386 3000450 DXX 15 -
chr7 3000386 3000450 DXX 15 -
chr8 3000386 3000450 DXX 15 -
chr9 3000386 3000450 DXX 15 -
chr10 3000386 3000450 DXX 15 -
chr11 3000386 3000450 DXX 15 -
chr12 3000386 3000450 DXX 15 -
chr13 3000386 3000450 DXX 15 -
chr14 3000386 3000450 DXX 15 -
chr15 3000386 3000450 DXX 15 -
chr16 3000386 3000450 DXX 15 -
chr17 3000386 3000450 DXX 15 -
chr18 3000386 3000450 DXX 15 -
chr19 3000386 3000450 DXX 15 -
chrX 3000386 3000450 DXX 15 -
chrY 3000386 3000450 DXX 15 -
I managed to get this result with the following command:
awk '$1 == "chr1" || $1 == "chr2" || $1 == "chr3" || $1 == "chr4" || $1 == "chr5" || $1 == "chr6" || $1 == "chr7" || $1 == "chr8" || $1 == "chr9" || $1 == "chr10" || $1 == "chr11" || $1 == "chr12" || $1 == "chr13" || $1 == "chr14" || $1 == "chr15" || $1 == "chr16" || $1 == "chr17" || $1 == "chr18" || $1 == "chr19" || $1 == "chr20" || $1 == "chrX" || $1 == "chrY"' in_file > out_file
It works fine, but is there a more elegant way to solve this? Pointers to resources for learning awk/grep on Linux would also be much appreciated!
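For illustration, the long chain of `==` comparisons could probably be collapsed into a single anchored regular expression on the first field. The sketch below is only one possible form. It assumes whitespace-separated fields and that the wanted names are exactly chr1 through chr22, chrX, and chrY, which is the range stated in the goal (the command above actually lists chr1 through chr20). The sample rows are a small subset of the file shown earlier, and `out_file_grep` is a hypothetical name used only to compare the two variants.

```shell
# Build a small sample input from rows shown in the question.
printf '%s\n' \
  'chr1 3009844 3009908 DXX 42 -' \
  'chr10 3000386 3000450 DXX 15 -' \
  'chrX 3000386 3000450 DXX 15 -' \
  'chr1_GL456210_random 3000386 3000450 DXX 15 -' \
  'chrX_GL456233_random 3000386 3000450 DXX 15 -' > in_file

# awk: keep rows whose first field is exactly chr1..chr22, chrX, or chrY.
# The ^ and $ anchors stop names such as chr1_GL456210_random from matching.
awk '$1 ~ /^chr([1-9]|1[0-9]|2[0-2]|X|Y)$/' in_file > out_file

# grep: same idea, requiring whitespace right after the chromosome name.
# (out_file_grep is an illustrative name, not from the question.)
grep -E '^chr([1-9]|1[0-9]|2[0-2]|X|Y)[[:space:]]' in_file > out_file_grep

cat out_file
```

If any numbered chromosome should be accepted rather than only 1 to 22, the alternation could be loosened to `^chr([0-9]+|X|Y)$`.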
【Comments】:
Tags: bash unix awk grep pattern-matching