With the samples you've shown, could you please try the following. Written and tested in GNU awk; I believe it should work in any awk.
echo "sentence_1,My \POS{tailor,noun} is \POS{rich,adj}." |
awk '
{
  first=val=finalVal=""
  count=0
  while(match($0,/[a-zA-Z]+ \\POS[{][^,]*/)){
    if(++count==1){
      first=substr($0,1,RSTART-1)
    }
    val=substr($0,RSTART,RLENGTH)
    sub(/\\POS[{]/,"",val)
    finalVal=(finalVal?finalVal OFS:"")val
    $0=substr($0,RSTART+RLENGTH)
  }
  print first finalVal
}'
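Note that, as far as I can tell from tracing it, the loop above keeps only the single word directly in front of each \POS{ after the first one, so for input like "... is very \POS{rich,adj}." the word "is" would be dropped. If that matters for your data, here is a minimal sketch of a portable alternative that replaces each \POS{word,tag} in place and leaves the rest of the line untouched. It assumes the word contains no comma or } and the tag contains no }; the function name strip_pos is only an illustrative name, not something from the question.

```shell
#!/bin/sh
# strip_pos is a hypothetical helper name used only for this sketch.
# It replaces every \POS{word,tag} with just "word" and keeps all other text.
# Assumption: word has no "," or "}", and tag has no "}".
strip_pos() {
  awk '
  {
    # \\POS[{] matches a literal backslash, then POS, then a literal "{".
    # Writing [{] avoids depending on how a given awk treats a bare "{".
    while (match($0, /\\POS[{][^,}]*,[^}]*[}]/)) {
      word = substr($0, RSTART + 5, RLENGTH - 5)   # skip the 5 characters of \POS{
      sub(/,.*/, "", word)                         # keep only the text before the comma
      $0 = substr($0, 1, RSTART - 1) word substr($0, RSTART + RLENGTH)
    }
    print
  }'
}

printf '%s\n' 'sentence_1,My \POS{tailor,noun} is very \POS{rich,adj}.' | strip_pos
```

With that sample it prints sentence_1,My tailor is very rich. and lines without any \POS{...} pass through unchanged.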
OR, in case you have anything after \POS{rich,adj} (like the trailing . in your sample) and want to keep it, try the following:
echo "sentence_1,My \POS{tailor,noun} is \POS{rich,adj}." |
awk '
{
  first=val=finalVal=""
  count=0
  while(match($0,/[a-zA-Z]+ \\POS[{][^,]*/)){
    if(++count==1){
      first=substr($0,1,RSTART-1)
    }
    val=substr($0,RSTART,RLENGTH)
    sub(/\\POS[{]/,"",val)
    finalVal=(finalVal?finalVal OFS:"")val
    $0=substr($0,RSTART+RLENGTH)
  }
  sub(/.*[}]/,"")
  print first finalVal $0
}'
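When the input has more than one line, first, finalVal and count need to be reset at the start of every line, because awk variables keep their values from one record to the next; without the reset, the second line would be printed with the first line's prefix and words still in front of it. Here is a sketch of the trailing-text variant with that reset, wrapped in a hypothetical shell function pos_words and fed two made-up sample lines (the second line is mine, not from the question).

```shell
#!/bin/sh
# pos_words is a hypothetical wrapper name used only for this sketch.
pos_words() {
  awk '
  {
    first = finalVal = ""                  # reset per line: awk variables persist across records
    count = 0
    while (match($0, /[a-zA-Z]+ \\POS[{][^,]*/)) {
      if (++count == 1) {
        first = substr($0, 1, RSTART - 1)  # text before the first tagged word
      }
      val = substr($0, RSTART, RLENGTH)
      sub(/\\POS[{]/, "", val)             # e.g. "My \POS{tailor" becomes "My tailor"
      finalVal = (finalVal ? finalVal OFS : "") val
      $0 = substr($0, RSTART + RLENGTH)    # continue with the unmatched rest of the line
    }
    sub(/.*[}]/, "")                       # keep only what follows the last "}"
    print first finalVal $0
  }'
}

printf '%s\n' \
  'sentence_1,My \POS{tailor,noun} is \POS{rich,adj}.' \
  'sentence_2,Her \POS{dog,noun} is \POS{small,adj}!' | pos_words
```

This prints sentence_1,My tailor is rich. and sentence_2,Her dog is small! on separate lines.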
Explanation: Adding a detailed explanation for the first solution above.
echo "sentence_1,My \POS{tailor,noun} is \POS{rich,adj}." | ##Using echo to print the sample line.
##Sending its output as input to the awk program.
awk ' ##Starting awk program from here.
{
  first=val=finalVal="" ##Nullifying variables for this line.
  count=0 ##Setting count to 0 for this line.
  while(match($0,/[a-zA-Z]+ \\POS[{][^,]*/)){ ##Looping as long as match() finds the regex in the current text.
    ##The regex matches one or more letters, a space, a literal \POS{ and then everything up to (not including) the next comma.
    if(++count==1){ ##If this is the first match on the line then do the following.
      first=substr($0,1,RSTART-1) ##first holds everything before the first match, e.g. --> sentence_1,
    }
    val=substr($0,RSTART,RLENGTH) ##val is the matched text, e.g. --> My \POS{tailor
    sub(/\\POS[{]/,"",val) ##Removing \POS{ from val, e.g. --> My tailor
    finalVal=(finalVal?finalVal OFS:"")val ##Appending val to finalVal, separated by OFS (a space by default).
    $0=substr($0,RSTART+RLENGTH) ##Keeping only the rest of the line after the match, so the next pass searches from there.
  }
  print first finalVal ##Printing first and finalVal, e.g. --> sentence_1,My tailor is rich
}'
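To see what match() actually sets on each pass of the loop explained above, here is a small instrumented sketch (trace_matches is a hypothetical name) that prints RSTART, RLENGTH and the matched text for the sample line, and then whatever is left over after the last match. The left-over part is exactly what the second solution's sub(/.*}/,"") trims down to the trailing ".".

```shell
#!/bin/sh
# trace_matches is a hypothetical helper name used only for this sketch.
trace_matches() {
  awk '
  {
    n = 0
    while (match($0, /[a-zA-Z]+ \\POS[{][^,]*/)) {
      printf "pass %d: RSTART=%d RLENGTH=%d matched=[%s]\n", ++n, RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
      $0 = substr($0, RSTART + RLENGTH)   # drop everything up to the end of this match
    }
    printf "left over: [%s]\n", $0
  }'
}

printf '%s\n' 'sentence_1,My \POS{tailor,noun} is \POS{rich,adj}.' | trace_matches
# Expected output:
# pass 1: RSTART=12 RLENGTH=14 matched=[My \POS{tailor]
# pass 2: RSTART=8 RLENGTH=12 matched=[is \POS{rich]
# left over: [,adj}.]
```

Note that RSTART on the second pass is relative to the shortened $0, not to the original line, because the loop reassigns $0 after every match.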