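【Posted】: 2014-11-05 10:15:09
【Question】:
I first filtered my text file so that it contains only the lines with a given pattern (in this case "TCTGTACTATATG"). Now, in the resulting file, I want to remove this pattern, together with every character upstream of it, from each line that contains it. What is the best way to do this with AWK?
Here is my input: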
@DGTKZQN1:384:C364AACXX:1:1109:19757:66886 2:N:0:GTGAAA
AACAGTTTCTGTACTATATTGACTCATAAGAGTGGTTTAATACGAAGGGAGGAGAAGTTTCCTGGAAATAATCGATTTCCTAGCTTTTAGTTGCAATAAT
+
CCCFFFFFHHHHDIIJJJJJJJJJIIJEIJHHCFGFFGHIIIIJGGIJGG@GHIGEEFDGGIGIJJIEHGIEHHHEDFFFDEEEDDEDDCCDBDDDCDDD
@DGTKZQN1:384:C364AACXX:1:1109:20360:66756 2:N:0:GTGAAA
TTTCTGTACTATATTGGGTGTGAGAAGTAATGGTGCACTCCACAGACCTCCAGTGGCTGCTTGTTCGCCAGAACAGCAAATTTCTGCAGAAGCGCAAAAG
+
@@CFFFFFHHHGHIIIJI;GCGGIIIJFHIIJGEDGGIJIICBDFIIIIJHIIGHIDHGEEHGHHIIJHGD?DDFEECEDDDDCDCCDDDCDDDDDDBC>
@DGTKZQN1:384:C364AACXX:1:1109:21207:66784 2:N:0:GTGAAA
AACAGTTTCTGTACTATATTGTACGTTGTGGATTATTAAAGGGAATAAAAGTGGTAGATTGTGCAGTTGAGGCAGGCTCTCAACTGTGAAACAGCGGTGG
+
@@CFFBDDFHBDCGG<?:CEEAFEEF@A3<?<3C>FEGHGG@DB?8BF@G>?0909??DF>HE@C=)8CEH9DHCB:AED>?C@6>C;6>C3?3=@B8B=
@DGTKZQN1:384:C364AACXX:1:1109:21026:66836 2:N:0:GTGAAA
AGAACAGTTTCTGTACTATATTGTTATACTTCTGTTGTGGGTGTAGAGTTTTCTCCGGCGTTGGCTTCAATGGAATAAGGCACGAGATGAATCCGTGGAG
+
@@@FFFFDHHHDHHIIJJEHHJGJJIGIIEIIIIEHEGHIJDF?DGEE4??DG@FGEG:FHHHHF@D@CEACEEEDDDCCCDDBDDDDDDDACDB??>BD
The output should look like this:
@DGTKZQN1:384:C364AACXX:1:1109:19757:66886 2:N:0:GTGAAA
ACTCATAAGAGTGGTTTAATACGAAGGGAGGAGAAGTTTCCTGGAAATAATCGATTTCCTAGCTTTTAGTTGCAATAAT
+
CCCFFFFFHHHHDIIJJJJJJJJJIIJEIJHHCFGFFGHIIIIJGGIJGG@GHIGEEFDGGIGIJJIEHGIEHHHEDFFFDEEEDDEDDCCDBDDDCDDD
@DGTKZQN1:384:C364AACXX:1:1109:20360:66756 2:N:0:GTGAAA
GGTGTGAGAAGTAATGGTGCACTCCACAGACCTCCAGTGGCTGCTTGTTCGCCAGAACAGCAAATTTCTGCAGAAGCGCAAAAG
+
@@CFFFFFHHHGHIIIJI;GCGGIIIJFHIIJGEDGGIJIICBDFIIIIJHIIGHIDHGEEHGHHIIJHGD?DDFEECEDDDDCDCCDDDCDDDDDDBC>
@DGTKZQN1:384:C364AACXX:1:1109:21207:66784 2:N:0:GTGAAA
TACGTTGTGGATTATTAAAGGGAATAAAAGTGGTAGATTGTGCAGTTGAGGCAGGCTCTCAACTGTGAAACAGCGGTGG
+
@@CFFBDDFHBDCGG<?:CEEAFEEF@A3<?<3C>FEGHGG@DB?8BF@G>?0909??DF>HE@C=)8CEH9DHCB:AED>?C@6>C;6>C3?3=@B8B=
@DGTKZQN1:384:C364AACXX:1:1109:21026:66836 2:N:0:GTGAAA
TTATACTTCTGTTGTGGGTGTAGAGTTTTCTCCGGCGTTGGCTTCAATGGAATAAGGCACGAGATGAATCCGTGGAG
+
@@@FFFFDHHHDHHIIJJEHHJGJJIGIIEIIIIEHEGHIJDF?DGEE4??DG@FGEG:FHHHHF@D@CEACEEEDDDCCCDDBDDDDDDDACDB??>BD
I have tried using awk with the split function, but I am struggling to use a string as the field separator.
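For illustration, here is a minimal sketch of one way to do this without a string field separator: locate the motif with `index()` and keep only what follows it with `substr()`. It rests on assumptions that are not stated in the question. The input is assumed to be strict 4-line FASTQ records, so the sequence is always line 2 of each record. The motif is assumed to be a plain string of letters. The function name `trim_upstream` is hypothetical. Note also that the sample reads contain `TCTGTACTATATTG` (with a double T), so that is the motif used in the usage line.

```shell
# Hypothetical helper: remove the motif and everything upstream of it
# from the sequence line of each 4-line FASTQ record.
# Usage: trim_upstream MOTIF [FILE...]   (reads stdin when no FILE is given)
trim_upstream() {
    pat=$1
    shift
    awk -v pat="$pat" '
        NR % 4 == 2 {                  # assumption: line 2 of every record is the sequence
            i = index($0, pat)         # 1-based position of the first match, 0 if absent
            if (i > 0)
                $0 = substr($0, i + length(pat))   # keep only the downstream part
        }
        { print }                      # header, "+" and quality lines pass through unchanged
    ' "$@"
}
```

It could be called as `trim_upstream TCTGTACTATATTG input.fastq > trimmed.fastq`, where both file names are placeholders. The quality line is left untouched because that is what the desired output above shows. As a result it ends up longer than the trimmed sequence, which some downstream FASTQ tools may reject.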
【Comments】:
- What is your desired result/output?