【Question title】: Split files based on matching string
【Posted】: 2020-02-11 02:19:00
【Question description】:

I have a file (input.txt) that looks like this:

# STOCKHOLM 1.0

#=GF AC   RF00001
#=GF ID   5S_rRNA 
ghgjg---jkhkjhkjhk

## STOCKHOLM 1.0

#=GF AC   RF00002
#=GF ID   6S_rRNA

hhhjkjhk---kjhkjhkj


## STOCKHOLM 1.0

#=GF AC   RF00005
#=GF ID   12S_rRNA

hkhjhkjhkjuuwww

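I have to split the file wherever a line equals ## STOCKHOLM 1.0, and name each resulting file after the value on its "#=GF AC" line, for example RF00001_full.txt. So, for the input file above, I should get 3 different files, as follows: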

RF00001_full.txt

# STOCKHOLM 1.0

#=GF AC   RF00001
#=GF ID   5S_rRNA 
ghgjg---jkhkjhkjhk

RF00002_full.txt

## STOCKHOLM 1.0

#=GF AC   RF00002
#=GF ID   6S_rRNA

hhhjkjhk---kjhkjhkj

RF00005_full.txt

## STOCKHOLM 1.0

#=GF AC   RF00005
#=GF ID   12S_rRNA

hkhjhkjhkjuuwww
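
Before splitting anything, it can help to preview which output file names the naming rule above would produce. The following is a minimal sketch, assuming every record carries exactly one "#=GF AC <accession>" line; the function name list_output_names is hypothetical and not part of the original question.

```shell
# Sketch: print the file name each record would be written to.
# Assumption: the accession is the third whitespace-separated field
# of a line whose first two fields are "#=GF" and "AC".
list_output_names() {
  awk '$1 == "#=GF" && $2 == "AC" { print $3 "_full.txt" }' "$1"
}

# Usage: list_output_names input.txt
```

For the sample input.txt shown above, this prints RF00001_full.txt, RF00002_full.txt and RF00005_full.txt, one per line.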

The code I have tried so far is as follows:

while read p;
if [[ $p == ## STOCKHOLM 1.0* ]];
then
#what should I do here to sort the line by OS ? 

done <input.txt

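【Comments】:

  • [[ $p == ## STOCKHOLM 1.0* ]] does not make sense. The # introduces a comment, and even if you did not have that problem, a space would mark the end of an argument. Hence you have to quote the pattern: [[ $p == '## STOCKHOLM 1.0'* ]]

Tags: shell unix awk grep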
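
The quoting advice in the comment above can be turned into a complete loop. This is only a sketch of one way to finish the asker's attempt in pure bash, not the approach of the posted answer. It assumes bash (for [[ ]] and +=), assumes each record has a "#=GF AC <accession>" line after its header, and matches on the text STOCKHOLM 1.0 so that both the "# STOCKHOLM 1.0" and "## STOCKHOLM 1.0" spellings in the sample are treated as headers. The function name split_stockholm is hypothetical.

```shell
# Sketch: split a file at STOCKHOLM header lines using only bash built-ins.
split_stockholm() {
  local input=$1
  local out=""      # current output file; empty until the accession is known
  local buffer=""   # lines read since the header, before the accession is known
  local line acc

  while IFS= read -r line || [[ -n $line ]]; do
    # The pattern text is quoted, so "#" does not start a comment
    # and the spaces do not end the argument.
    if [[ $line == *'STOCKHOLM 1.0'* ]]; then
      out=""
      buffer=""
    fi

    if [[ -z $out ]]; then
      buffer+=$line$'\n'
      if [[ $line == '#=GF AC'* ]]; then
        # Third whitespace-separated field is the accession, e.g. RF00001.
        read -r _ _ acc <<< "$line"
        out="${acc}_full.txt"
        printf '%s' "$buffer" > "$out"   # flush header lines, truncating any old file
        buffer=""
      fi
    else
      printf '%s\n' "$line" >> "$out"
    fi
  done < "$input"
}

# Usage: split_stockholm input.txt
```

A shell loop like this is noticeably slower than awk on large files, so it is mainly useful for seeing how the quoted pattern fits into the original attempt.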


【Solution 1】:

Could you please try the following; it was written and tested with the provided samples.

awk '
/STOCKHOLM/{
  close(file)
  file=count=""
}
(/STOCKHOLM/ || !NF) && !file{
  val=(val?val ORS:"")$0
  count++
  next
}
count==2{
  count=""
  file=$NF"_full.txt"
  if(val){
    print val > (file)
    val=""
  }
  print > (file)
  next
}
file{
  print >> (file)
}
' Input_file

Explanation: a detailed, line-by-line explanation of the above.

awk '                             ##Start of the awk program.
/STOCKHOLM/{                      ##If the line contains STOCKHOLM, a new record starts.
  close(file)                     ##Close the previous output file so open file handles do not pile up.
  file=count=""                   ##Reset the variables file and count.
}
(/STOCKHOLM/ || !NF) && !file{    ##If the line contains STOCKHOLM or is empty, and no output file is set yet, do the following.
  val=(val?val ORS:"")$0          ##Append the current line to val, separated by ORS.
  count++                         ##Increment count by 1.
  next                            ##Skip all remaining rules for this line.
}
count==2{                         ##If count is 2 (header line and one empty line seen), this is the AC line.
  count=""                        ##Reset count.
  file=$NF"_full.txt"             ##Build the output file name from the last field plus _full.txt.
  if(val){                        ##If val is not empty, do the following.
    print val > (file)            ##Write the buffered header lines to the output file.
    val=""                        ##Reset val.
  }
  print > (file)                  ##Write the AC line itself to the output file.
  next                            ##Skip all remaining rules for this line.
}
file{                             ##If an output file is set.
  print >> (file)                 ##Write the current line to the output file.
}
' Input_file                      ##Name of the input file.
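
The walkthrough above relies on the AC line being exactly the second line after the header (count==2). As an alternative sketch, not the answer author's method, the same buffering idea can key on the "#=GF AC" line itself, which tolerates a different number of blank lines after the header. It assumes each record has exactly one such line and that accessions are unique (a repeated accession would overwrite the earlier file). The function name split_by_accession is hypothetical.

```shell
# Sketch: buffer each record's leading lines until its accession is known,
# then write everything to <accession>_full.txt.
split_by_accession() {
  awk '
    /STOCKHOLM 1\.0/ {                 # a new record starts here
      if (out != "") close(out)        # release the previous output file
      out = ""
      buf = ""
    }
    out == "" {                        # accession not known yet: keep buffering
      buf = buf $0 ORS
      if ($1 == "#=GF" && $2 == "AC") {
        out = $3 "_full.txt"
        printf "%s", buf > out         # flush header lines, including the AC line
        buf = ""
      }
      next
    }
    { print > out }                    # remaining lines of the current record
  ' "$1"
}

# Usage: split_by_accession Input_file
```

Using a single redirection operator (>) for each file keeps the behaviour the same across awk implementations: the file is truncated when first opened and later prints continue in the same open stream until close() is called.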
