Parsing the headers of sequence file [closed] - Answers

【Question title】: Parsing the headers of sequence file [closed]
【Posted】: 2013-09-25 11:10:29
【Question description】:

I have a multi-sequence file:

>abc|d017961
sequence1......

>cdf|rhtdm9
sequence2......

>ijm|smthr12
sequence3......

>abc|d011wejr
sequence4......

>stg|eethwe77
sequence5......

I want to edit the file so that the resulting file looks like this:

>abc_ABC__d017961
sequence1......

>cdf_CDF__rhtdm9
sequence2......

>ijm_IJM__smthr12
sequence3......

>abc_ABC__d011wejr
sequence4......

>stg_STG__eethwe77
sequence5......

Thanks!

【Question comments】:

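I think this is what your headers look like, sam. If not, feel free to change it.
Cross-posted on stackexchange. Please avoid cross-posting.
This question has been cross-posted and answered at unix.stackexchange.com/questions/92291/…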

Tags: linux perl bash shell unix

【Solution 1】:

perl -pe 's/ (\w+) \| /$1_\U$1\E__/x' file

or

perl -lpe '$_ = ">$1_\U$1\E__$2" if /^> (\w+) \| (\w+)/x' file

【Comments】:

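When you capture and then concatenate the capture variables, you might consider using the s/// operator instead.
Now I need to change the file >xbc_ABC__d017961 word task kite sequence1...... >df_CDF__rhtdm9 word task kite sequence2...... into the result file >abc|d017961 sequence1...... >cdf|rhtdm9 sequence2......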
@sam perl -pe 's/ \w+? ([A-Z]+) __ /\L$1|/x' file
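For anyone without Perl at hand, the reverse substitution in the comment above can probably be approximated with plain awk. This is only a sketch: the function name `restore_headers` is made up for illustration, and it assumes that header lines start with `>` and carry the uppercase tag in the form `_TAG__`. Like the Perl version, it discards whatever precedes the uppercase tag and keeps everything after the double underscore.

```shell
# Hypothetical helper: turn ">xbc_ABC__d017961 ..." back into ">abc|d017961 ...".
# Assumes headers start with ">" and contain "_<UPPERCASE TAG>__".
restore_headers() {
  awk '
    /^>/ && match($0, /_[A-Z]+__/) {
      # RSTART points at the underscore before the tag;
      # RLENGTH covers that underscore, the tag, and the two trailing underscores.
      tag = tolower(substr($0, RSTART + 1, RLENGTH - 3))
      $0 = ">" tag "|" substr($0, RSTART + RLENGTH)
    }
    { print }   # sequence lines and blank lines pass through unchanged
  ' "$@"
}
```

Usage would be something like `restore_headers file > file.new`; with no argument the function reads standard input.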

【Solution 2】:

You can define the input field separator (FS) as | and the output field separator (OFS) as _, then use the toupper() function.

All together:

$ awk 'BEGIN{OFS="_"; FS="|"}{print $1,toupper($1),"",$2}' file
abc_ABC__d017961 sequence1......
cdf_CDF__rhtdm9 sequence2......
ijm_IJM__smthr12 sequence3......
abc_ABC__d011wejr sequence4......
stg_STG__eethwe77 sequence5......
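
Note that the one-liner above rewrites every input line, so it only suits input where each record sits on a single line with no leading `>`. For the layout shown in the question, where `>` headers are followed by separate sequence lines, a variant along these lines might be closer to what is wanted. It is a sketch, not a tested drop-in: the function name `rename_headers` is hypothetical, and it assumes the tag is everything between `>` and the first `|`.

```shell
# Hypothetical helper: rewrite ">abc|d017961" as ">abc_ABC__d017961".
# Only lines that start with ">" and contain a "|" are touched.
rename_headers() {
  awk -F'|' '
    /^>/ && NF >= 2 {
      tag  = substr($1, 2)                 # text between ">" and the first "|"
      rest = substr($0, length($1) + 2)    # everything after the first "|"
      $0 = ">" tag "_" toupper(tag) "__" rest
    }
    { print }   # sequence lines and blank lines pass through unchanged
  ' "$@"
}
```

Since awk does not edit in place, one way to apply it is `rename_headers file > file.new && mv file.new file`.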

【Comments】: