【Posted】: 2018-02-28 02:50:56
【Question】:
A sample text file looks like this:
ID Z4WTH3_9ACTN Unreviewed; 182 AA.
AC Z4WTH3; A0SD0SDF;
AC Z12SDFG3; ADFFGDF;
DT 11-JUN-2014, integrated into UniProtKB/TrEMBL.
SQ SEQUENCE 182 AA; 20675 MW; B85D18AC3B1F0E75 CRC64;
MNFLEYNKDE KLHFNYKKSC GLWLIVVALI IFAATVIGGK QIINMSVFSF GYVAAFLSIN
//
ID Z4WXU8_9ACTN Unreviewed; 203 AA.
AC Z4WXU8;
AC QWERDFV1;
DT 11-JUN-2014, integrated into UniProtKB/TrEMBL.
SQ SEQUENCE 203 AA; 23224 MW; 35F1AE4342F6B3AC CRC64;
MDCKSIRSEV LWQVVRLREK LMNFLEYNKD EKLCFNYKKS CGLWLIVVAL IIFAATVIGG
//
ID Z9JHX1_9GAMM Unreviewed; 132 AA.
AC Z9JHX1;
SQ SEQUENCE 132 AA; 13880 MW; 0E09988C0F3ED155 CRC64;
MKISVDTNVL ARAVLQDDAN QGRSASTLLK DASLIAVSLP CLCELVWILS RGAKLSKEDV
//
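The actual file is 100 GB. Each record contains exactly one "ID" line, always begins with that "ID" line, and ends with "//".
A record can have multiple "AC" lines. The first element of the first "AC" line must be used as the file name.
The file needs to be split into multiple files at each "//", with each output file named after that text from the record's first "AC" line.
So the output files would look like this: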
Z4WTH3.txt
ID Z4WTH3_9ACTN Unreviewed; 182 AA.
AC Z4WTH3; A0SD0SDF;
AC Z12SDFG3; ADFFGDF;
DT 11-JUN-2014, integrated into UniProtKB/TrEMBL.
SQ SEQUENCE 182 AA; 20675 MW; B85D18AC3B1F0E75 CRC64;
MNFLEYNKDE KLHFNYKKSC GLWLIVVALI IFAATVIGGK QIINMSVFSF GYVAAFLSIN
//
Z4WXU8.txt
ID Z4WXU8_9ACTN Unreviewed; 203 AA.
AC Z4WXU8;
AC QWERDFV1;
DT 11-JUN-2014, integrated into UniProtKB/TrEMBL.
SQ SEQUENCE 203 AA; 23224 MW; 35F1AE4342F6B3AC CRC64;
MDCKSIRSEV LWQVVRLREK LMNFLEYNKD EKLCFNYKKS CGLWLIVVAL IIFAATVIGG
//
Z9JHX1.txt
ID Z9JHX1_9GAMM Unreviewed; 132 AA.
AC Z9JHX1;
SQ SEQUENCE 132 AA; 13880 MW; 0E09988C0F3ED155 CRC64;
MKISVDTNVL ARAVLQDDAN QGRSASTLLK DASLIAVSLP CLCELVWILS RGAKLSKEDV
//
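The splitting rule described above can be sketched as a streaming pass that never holds more than one record's header in memory. This is a minimal sketch in Python rather than awk, and it is not taken from the original post: the function name `split_records`, the output directory argument, and the choice to silently skip a record that has no "AC" line are all assumptions made for illustration.

```python
import os


def split_records(src_path, out_dir):
    """Stream src_path line by line and write one file per record into out_dir.

    Hypothetical helper: a record starts at an "ID" line and ends at "//".
    The output file is named after the first field of the first "AC" line.
    Returns the list of paths written, in input order.
    """
    os.makedirs(out_dir, exist_ok=True)
    buffered = []   # lines seen before the first AC line of the current record
    out = None      # output file, opened once the first AC line is seen
    written = []
    with open(src_path, "r", encoding="utf-8") as src:
        for line in src:
            if out is None:
                buffered.append(line)
                parts = line.split()
                if len(parts) >= 2 and parts[0] == "AC":
                    # First AC line: take its first accession, minus the ";"
                    name = parts[1].rstrip(";")
                    path = os.path.join(out_dir, name + ".txt")
                    out = open(path, "w", encoding="utf-8")
                    out.writelines(buffered)  # flush the ID line(s) seen so far
                    buffered = []
                    written.append(path)
            else:
                out.write(line)
            if line.rstrip("\r\n") == "//":
                # End of record: close the file and reset the state.
                if out is not None:
                    out.close()
                    out = None
                buffered = []  # assumption: a record with no AC line is skipped
    if out is not None:
        out.close()  # input ended without a final "//"
    return written
```

Calling `split_records("input.dat", "out")` (both names are placeholders) on the sample above would create `out/Z4WTH3.txt`, `out/Z4WXU8.txt` and `out/Z9JHX1.txt`. Only one output file is open at a time, so the open-file limit is not a concern, but a 100 GB input will produce a very large number of files in a single directory, which some filesystems handle slowly.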
【Discussion】:
- Please add the code you have tried. This Q&A is close to what you need: stackoverflow.com/questions/48984857/…
Tags: awk