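【Posted】: 2020-09-06 18:27:17
【Question】:
I'm not a shell scripting expert, and I'm struggling to find a way to extract only specific columns from an HTML table. I tried different options (awk, grep, hxselect), but unfortunately couldn't come up with a solution.
hxselect requires well-formed HTML, which isn't always the case for me. Here is a sample table: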
<table class="jiraIssueTable aui">
<colgroup>
<col width="18">
<col width="90">
<col>
<col width="9%">
<col width="9%">
<col width="9%">
</colgroup>
<thead>
<tr>
<th id="Related issues-type">Type</th>
<th id="Related issues-key">Key</th>
<th id="jiraDetailsText">Summary</th>
<th id="Related issues-status">Status</th>
<th id="Related issues-assignee">Assignee</th>
<th id="Related issues-fix-versions">Fix versions</th>
</tr>
</thead>
<tbody>
<tr class="" >
<td class="jiraIssueIcon" headers="Related issues-type"> <img class="issueTypeImg" src="/images/icons/jira_type_unknown.gif" alt="Unknown Issue Type"> </td>
<td class="jiraIssueKey" headers="Related issues-key"> <a title="View this issue" class="jiraIssueLink" data-issue-key="OL-541" id="viewIssueInJira:OL-541" href="">OL-541</a> </td>
<td headers="jiraDetailsText" class="jiraIssueDetailsError"> Increase the performance </td>
<td class="jiraIssueStatus" headers="Related issues-status"> </td>
<td headers="Related issues-assignee" class="jiraIssueDetailsError"> </td>
<td headers="Related issues-fix-versions" class="jiraIssueDetailsError"> </td>
</tr>
<tr class="" >
<td class="jiraIssueIcon" headers="Related issues-type"> <a href="devStatusDetailDialog=build" title="View this issue"> <img class="issueTypeImg" src="rType=issuetype" alt="Task"/> </a> </td>
<td class="jiraIssueKey" headers="Related issues-key"> <a title="View this issue" class="jiraIssueLink" data-issue-key="IT-2431" id="viewIssueInJira:IT-2431" href="">IT-2431</a> </td>
<td headers="jiraDetailsText" class="jiraIssueDetails"> Get some sample data </td>
<td class="jiraIssueStatus" headers="Related issues-status"> Verified/Closed </td>
<td headers="Related issues-assignee" class="jiraIssueDetails"> User A </td>
<td headers="Related issues-fix-versions" class="jiraIssueDetailsError"> </td>
</tr>
</tbody>
</table>
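From this table I only need the contents of columns 2 and 3. That means my final result should look like this:
OL-541 Increase the performance
IT-2431 Get some sample data
Any help is appreciated.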
【Comments】:
- Don't Parse XML/HTML With Regex. I suggest using an XML/HTML parser (xmlstarlet, xmllint ...).
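The parser-based approach suggested in the comment can be sketched roughly as follows. This is not one of the tools named there: it swaps in Python's standard-library `html.parser`, chosen here only because it tolerates HTML that is not well-formed. The names `ColumnExtractor`, `extract_columns` and `SAMPLE` are hypothetical, and `SAMPLE` is a trimmed copy of the table from the question, so treat this as a sketch under those assumptions rather than a definitive solution.

```python
from html.parser import HTMLParser

# Trimmed copy of the table from the question (assumption: first 3 columns only).
SAMPLE = """
<table class="jiraIssueTable aui">
<thead><tr><th>Type</th><th>Key</th><th>Summary</th></tr></thead>
<tbody>
<tr class="" >
<td class="jiraIssueIcon"> <img class="issueTypeImg" alt="Unknown Issue Type"> </td>
<td class="jiraIssueKey"> <a class="jiraIssueLink" href="">OL-541</a> </td>
<td class="jiraIssueDetailsError"> Increase the performance </td>
</tr>
<tr class="" >
<td class="jiraIssueIcon"> <a href="" title="View this issue"> <img alt="Task"/> </a> </td>
<td class="jiraIssueKey"> <a class="jiraIssueLink" href="">IT-2431</a> </td>
<td class="jiraIssueDetails"> Get some sample data </td>
</tr>
</tbody>
</table>
"""


class ColumnExtractor(HTMLParser):
    """Collect the text of selected <td> columns, one list per table row."""

    def __init__(self, columns):
        super().__init__()
        self.columns = columns  # 1-based column numbers to keep
        self.rows = []          # extracted rows
        self._cells = None      # cells of the row being read, or None
        self._text = None       # text fragments of the open <td>, or None

    def _end_cell(self):
        # Close the open cell, collapsing runs of whitespace.
        if self._text is not None:
            self._cells.append(" ".join("".join(self._text).split()))
            self._text = None

    def end_row(self):
        # Close the open row; rows without <td> cells (headers) are skipped.
        self._end_cell()
        if self._cells:
            self.rows.append(
                [self._cells[c - 1] for c in self.columns if c <= len(self._cells)]
            )
        self._cells = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.end_row()      # tolerate a missing </tr>
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._end_cell()    # tolerate a missing </td>
            self._text = []

    def handle_endtag(self, tag):
        if tag == "td":
            self._end_cell()
        elif tag in ("tr", "table"):
            self.end_row()

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)


def extract_columns(html, columns):
    """Return the requested columns of every data row in the HTML text."""
    parser = ColumnExtractor(columns)
    parser.feed(html)
    parser.close()
    parser.end_row()  # flush a row left open at end of input
    return parser.rows


for row in extract_columns(SAMPLE, [2, 3]):
    print(" ".join(row))
```

Run on the sample, this prints `OL-541 Increase the performance` and `IT-2431 Get some sample data`. To use it from a shell script, one option would be to replace `SAMPLE` with `sys.stdin.read()` (after `import sys`) and pipe the page into the script.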
Tags: bash shell awk html-table grep