【Posted】: 2015-05-18 12:07:30
【Question】:
I have an important task: extracting some relevant data from a large CSV log that looks like this:
Frame #,Residue,Internal,van der Waals,Electrostatic,Polar Solvation,Non-Polar Solv.,TOTAL
1,1,119.745,0.356,-132.009,-95.618,1.7886312,-105.7373688
1,2,106.093,-3.835,-182.473,40.582,0.7132608,-38.9197392
1,3,21.228,-1.744,-38.026,-7.707,1.1189664,-25.1300336
1,4,-5.717,-4.721,-30.38,-4.839,0.406512,-45.250488
1,5,70.846,-4.127,-53.317,-2.534,0.7808472,11.6488472
...
2,1,119.745,0.356,-132.009,-95.618,1.7886312,-105.7373688
2,2,106.093,-3.835,-182.473,40.582,0.7132608,-38.9197392
2,3,21.228,-1.744,-38.026,-7.707,1.1189664,-25.1300336
2,4,-5.717,-4.721,-30.38,-4.839,0.406512,-45.250488
2,5,70.846,-4.127,-53.317,-2.534,0.7808472,11.6488472
...
n,1,119.745,0.356,-132.009,-95.618,1.7886312,-105.7373688
n,2,106.093,-3.835,-182.473,40.582,0.7132608,-38.9197392
n,3,21.228,-1.744,-38.026,-7.707,1.1189664,-25.1300336
n,4,-5.717,-4.721,-30.38,-4.839,0.406512,-45.250488
n,5,70.846,-4.127,-53.317,-2.534,0.7808472,11.6488472
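Here I want to pick one specified value from column 2 (Residue) and write out how its last column (TOTAL energy) evolves with column 1 (Frame #), i.e. the total energy as a function of the snapshot number. In other words, I need to: 1) first filter all the data by the second column, i.e. select every line whose second-column value equals the specified number (for example, residue 27):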
#Frame, #Residue
1,27, ... , # value of the last column, which is what I am interested in
2,27, ... , # value of the last column, which is what I am interested in
3,27, ... , # value of the last column, which is what I am interested in
3,27, ... , # value of the last column, which is what I am interested in
2) then extract the corresponding value of the last column, so that the resulting log has only 3 columns:
#Frame, #Residue, # Total energy
1,27, # value of the last column, which is what I am interested in
2,27, # value of the last column, which is what I am interested in
3,27, # value of the last column, which is what I am interested in
3,27, # value of the last column, which is what I am interested in
I would be grateful for any implementation using awk and sed!
Thanks!
Gleb
【Discussion】:
Tags: bash text multiple-columns
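The two steps described in the question (keep only the rows for one residue, then keep only the frame, residue and last column) could be sketched with awk roughly as follows. This is a minimal sketch, not a definitive solution: the function name `extract_residue` is hypothetical, and it assumes the log is plain comma-separated text with a single header line, no quoted fields, and Unix line endings.

```shell
#!/bin/sh
# extract_residue FILE RESIDUE   (hypothetical helper name)
# Prints columns 1, 2 and the last column for the header line and for
# every row whose 2nd column equals RESIDUE.
# Assumptions: comma-separated input, one header line, no quoted fields.
extract_residue() {
    awk -F',' -v OFS=',' -v res="$2" '
        NR == 1   { print $1, $2, $NF; next }   # keep a reduced header
        $2 == res { print $1, $2, $NF }         # frame, residue, TOTAL
    ' "$1"
}
```

It might be called as `extract_residue energies.csv 27 > residue27.csv`, where both file names are placeholders. Because awk reads the file line by line, the rows come out in their original frame order and no separate sorting step is needed. If sed is preferred, something like `sed -n -E 's/^([^,]*),27,.*,([^,]*)$/\1,27,\2/p' energies.csv` should give the same data rows (without the header) under the same assumptions, although the residue number has to be written into the pattern by hand.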