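Posted: 2015-08-26 15:07:12
Question:
I have a text file of HTML code that I need to manipulate to make it more readable. My problem is that each file name ends up on two identical lines, and I need to tell those two lines apart: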
EDIT
Here is the input, for those who asked for it:
<body>
<tbody>
<tr><td><b>Test Suite</b></td></tr>
<tr><td><a href="HAPPY/3_step_minimal_foundation_no_prefill_HAPPY">3_step_minimal_foundation_no_prefill_HAPPY</a></td></tr>
<tr><td><a href="HAPPY/fullform_no_prefill_HAPPY">fullform_no_prefill_HAPPY</a></td></tr>
<tr><td><a href="HAPPY/fullform_mobile_foundation_no_prefill_HAPPY">fullform_mobile_foundation_no_prefill_HAPPY</a></td></tr>
<tr><td><a href="SAD/3_step_minimal_foundation_SAD">3_step_minimal_foundation_SAD</a></td></tr>
<tr><td><a href="SAD/fullform_SAD">fullform_SAD</a></td></tr>
<tr><td><a href="SAD/fullform_mobile_foundation_SAD">fullform_mobile_foundation_SAD</a></td></tr>
<tr><td><a href="HAPPY_PLUS_OPTIONS/3_step_minimal_foundation_HAPPY_PLUS_OPTIONS">3_step_minimal_foundation_HAPPY_PLUS_OPTIONS</a></td></tr>
<tr><td><a href="HAPPY_PLUS_OPTIONS/fullform_HAPPY_PLUS_OPTIONS">fullform_HAPPY_PLUS_OPTIONS</a></td></tr>
<tr><td><a href="HAPPY_PLUS_OPTIONS/fullform_mobile_foundation_HAPPY_PLUS_OPTIONS">fullform_mobile_foundation_HAPPY_PLUS_OPTIONS</a></td></tr>
<tr><td><a href="SAD_PLUS_OPTIONS/3_step_minimal_foundation_SAD_PLUS_OPTIONS">3_step_minimal_foundation_SAD_PLUS_OPTIONS</a></td></tr>
<tr><td><a href="SAD_PLUS_OPTIONS/fullform_SAD_PLUS_OPTIONS">fullform_SAD_PLUS_OPTIONS</a></td></tr>
<tr><td><a href="SAD_PLUS_OPTIONS/fullform_mobile_foundation_SAD_PLUS_OPTIONS">fullform_mobile_foundation_SAD_PLUS_OPTIONS</a></td></tr>
</tbody></table>
</body>
For example, the two identical lines
3_step_minimal_foundation_no_prefill_HAPPY
and
3_step_minimal_foundation_no_prefill_HAPPY
need to become:
3_step_minimal_foundation_no_prefill
and
3_step_minimal_foundation_no_prefill_HAPPY
Here is the current state of my text file, along with the code that produces it:
$ sed -n '/href/p' EVERYTHING |
    awk -F'[/"<> ]+' '{print $6, $7, $8}' |
    tr -s '[:space:]' '\n' |
    awk -v n=3 '1; NR % n == 0 {print ""}' |
    sed -e '/^HAPPY/s/^/Flow Type: /' -e '/^SAD/s/^/Flow Type: /' |
    sed '$d'
Flow Type: HAPPY
3_step_minimal_foundation_no_prefill_HAPPY
3_step_minimal_foundation_no_prefill_HAPPY
Flow Type: HAPPY
fullform_no_prefill_HAPPY
fullform_no_prefill_HAPPY
Flow Type: HAPPY
fullform_mobile_foundation_no_prefill_HAPPY
fullform_mobile_foundation_no_prefill_HAPPY
Flow Type: SAD
3_step_minimal_foundation_SAD
3_step_minimal_foundation_SAD
Flow Type: SAD
fullform_SAD
fullform_SAD
Flow Type: SAD
fullform_mobile_foundation_SAD
fullform_mobile_foundation_SAD
Flow Type: HAPPY_PLUS_OPTIONS
3_step_minimal_foundation_HAPPY_PLUS_OPTIONS
3_step_minimal_foundation_HAPPY_PLUS_OPTIONS
Flow Type: HAPPY_PLUS_OPTIONS
fullform_HAPPY_PLUS_OPTIONS
fullform_HAPPY_PLUS_OPTIONS
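To see why every file name shows up twice, it can help to dump the fields that the first awk in the pipeline sees for one input line, using the same -F value. This is a minimal sketch; dump_fields is a hypothetical helper name. With this separator the line starts with an empty field, $6 is the directory (the flow type), $7 is the file-name part of the href and $8 is the link text. Because the link text is the same as the file name in this input, printing $6, $7, $8 gives the name twice.

```shell
# Print every field awk sees when it splits a line on runs of the
# characters / " < > and space (the same -F value as in the pipeline).
dump_fields() {
  awk -F'[/"<> ]+' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

echo '<tr><td><a href="HAPPY/fullform_no_prefill_HAPPY">fullform_no_prefill_HAPPY</a></td></tr>' | dump_fields
```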
My desired output:
Flow Type: HAPPY
Flow Name: 3_step_minimal_foundation_no_prefill
File Name: 3_step_minimal_foundation_no_prefill_HAPPY
Flow Type: HAPPY
Flow Name: fullform_no_prefill
File Name: fullform_no_prefill_HAPPY
Flow Type: HAPPY
Flow Name: fullform_mobile_foundation_no_prefill
File Name: fullform_mobile_foundation_no_prefill_HAPPY
Flow Type: SAD
Flow Name: 3_step_minimal_foundation
File Name: 3_step_minimal_foundation_SAD
Flow Type: SAD
Flow Name: fullform
File Name: fullform_SAD
Flow Type: SAD
Flow Name: fullform_mobile_foundation
File Name: fullform_mobile_foundation_SAD
Flow Type: HAPPY_PLUS_OPTIONS
Flow Name: 3_step_minimal_foundation
File Name: 3_step_minimal_foundation_HAPPY_PLUS_OPTIONS
Flow Type: HAPPY_PLUS_OPTIONS
Flow Name: fullform
File Name: fullform_HAPPY_PLUS_OPTIONS
Is there a way to remove or keep specific text on only the Nth line? Once every line is unique, it will be easy to label each line correctly.
-Best
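One way to edit only one line of each group is a small awk filter appended to the end of the existing pipeline. This sketch assumes the output keeps the layout shown above: a "Flow Type: " line followed by two copies of the file name, with optional blank lines between groups. It counts lines since the last "Flow Type: " line, strips a trailing "_<flow type>" from the first copy and labels it "Flow Name: ", and labels the second copy "File Name: ". label_groups is a hypothetical name; the suffix is compared with substr rather than a regex so that flow types are matched literally.

```shell
label_groups() {
  awk '
    NF == 0 { print; next }                 # keep blank separator lines as-is
    /^Flow Type: / {                        # start of a new group
      type = substr($0, length("Flow Type: ") + 1)
      n = 0
      print
      next
    }
    {
      n++
      if (n == 1) {                         # first copy: strip "_<flow type>"
        name = $0
        suffix = "_" type
        if (length(name) > length(suffix) &&
            substr(name, length(name) - length(suffix) + 1) == suffix)
          name = substr(name, 1, length(name) - length(suffix))
        print "Flow Name: " name
      } else {                              # later copies: keep the full name
        print "File Name: " $0
      }
    }
  '
}
```

Usage: append "| label_groups" after the final "sed '$d'" of the pipeline.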
Comments:
-
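Which lines are you trying to delete here? It's not clear. It's good that you tried to break the problem into steps, but it might be easier to go straight from the input to the output.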
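Going straight from the input to the desired output might look like the minimal awk sketch below. It assumes each relevant line contains a single href="<flow type>/<file name>" attribute, as in the posted input, and that the flow name is the file name with a trailing "_<flow type>" removed. html_to_report is a hypothetical function name; it reads the files given as arguments, or standard input if there are none, and prints the records without blank lines between them, matching the desired output as posted.

```shell
html_to_report() {
  awk '
    /href="/ {
      # Take the text between href=" and the next double quote,
      # e.g. HAPPY/fullform_no_prefill_HAPPY
      start = index($0, "href=\"") + length("href=\"")
      rest = substr($0, start)
      path = substr(rest, 1, index(rest, "\"") - 1)

      # Split "<flow type>/<file name>" at the first slash.
      slash = index(path, "/")
      if (slash == 0) next                  # skip hrefs without a directory
      type = substr(path, 1, slash - 1)
      file = substr(path, slash + 1)

      # Flow name = file name minus a trailing "_<flow type>", if present.
      name = file
      suffix = "_" type
      if (length(name) > length(suffix) &&
          substr(name, length(name) - length(suffix) + 1) == suffix)
        name = substr(name, 1, length(name) - length(suffix))

      print "Flow Type: " type
      print "Flow Name: " name
      print "File Name: " file
    }
  ' "$@"
}
```

Usage: html_to_report EVERYTHING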
-
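I don't want to delete any lines. I want to distinguish the two identical lines for each file (e.g. 3_step_minimal_foundation_no_prefill_HAPPY and 3_step_minimal_foundation_no_prefill_HAPPY): I need to remove _HAPPY from one of these two lines and leave all other lines unchanged.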
-
I can add the input if that would make it easier. EDIT: OK, the input is up now. Thanks for all the help so far!
-
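Well, it certainly isn't easy as it stands, so go ahead and add whatever you can. Sample input and expected output are the standard, plus any code you've already written toward what you're trying to do. If you can explain how you expect the code to work, solving the problem goes much faster.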
-
For each line that matches the previous line, do you want to remove everything from the last underscore to the end of the line?
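A minimal sketch of the approach suggested in this comment, meant to be appended to the end of the existing pipeline: when a non-empty line repeats the line before it, drop its last underscore and everything after it. strip_repeat is a hypothetical name. Note the limitation: because only the last underscore-separated word is removed, a HAPPY_PLUS_OPTIONS file name such as 3_step_minimal_foundation_HAPPY_PLUS_OPTIONS becomes 3_step_minimal_foundation_HAPPY_PLUS rather than the flow name, so those files would need the whole "_<flow type>" suffix stripped instead.

```shell
strip_repeat() {
  awk '
    {
      line = $0
      # If this line is a non-empty repeat of the previous line, drop its
      # last underscore and everything after it.
      if (NF > 0 && $0 == prev)
        sub(/_[^_]*$/, "", line)
      print line
      prev = $0                   # remember the unmodified line
    }
  '
}
```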