【发布时间】:2015-10-31 10:39:30
【问题描述】:
我正在尝试抓取以下 html:
<table>
<tr>
<td class="cellRight" style="cursor:pointer;">
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td class="cellRight" style="border:0;color:#0066CC;"
title="View summary" width="70%">92%</td>
<td class="cellRight" style="border:0;" width="30%">
</td>
</tr>
</table>
</td>
</tr>
<tr class="listroweven">
<td class="cellLeft" nowrap><span class="categorytab" onclick=
"showAssignmentsByMPAndCourse('08/03/2015','58100:6');" title=
"Display Assignments for Art 5 with Ms. Martinho"><span style=
"text-decoration: underline">58100/6 - Art 5 with Ms.
Martinho</span></span></td>
<td class="cellLeft" nowrap>
Martinho, Suzette<br>
<b>Email:</b> <a href="mailto:smartinho@mtsd.us" style=
"text-decoration:none"><img alt="" border="0" src=
"/genesis/images/labelIcon.png" title=
"Send e-mail to teacher"></a>
</td>
<td class="cellRight" onclick=
"window.location.href = '/genesis/parents?tab1=studentdata&tab2=gradebook&tab3=coursesummary&studentid=100916&action=form&courseCode=58100&courseSection=6&mp=MP4';"
style="cursor:pointer;">
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td class="cellCenter"><span style=
"font-style:italic;color:brown;font-size: 8pt;">No
Grades</span></td>
</tr>
</table>
</td>
</tr>
<tr class="listrowodd">
<td class="cellLeft" nowrap><span class="categorytab" onclick=
"showAssignmentsByMPAndCourse('08/03/2015','58200:10');" title=
"Display Assignments for Family and Consumer Sciences 5 with Sheerin">
<span style="text-decoration: underline">58200/10 - Family and
Consumer Sciences 5 with Sheerin</span></span></td>
<td class="cellLeft" nowrap>
Sheerin, Susan<br>
<b>Email:</b> <a href="mailto:ssheerin@mtsd.us" style=
"text-decoration:none"><img alt="" border="0" src=
"/genesis/images/labelIcon.png" title=
"Send e-mail to teacher"></a>
</td>
<td class="cellRight" style="cursor:pointer;">
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td class="cellCenter"><span style=
"font-style:italic;color:brown;font-size: 8pt;">No
Grades</span></td>
</tr>
</table>
</td>
</tr>
</table>
我正在尝试提取学生成绩的值,如果没有成绩,则在这种情况下,html 中将显示值“无成绩”。但是,当我执行如下选择请求时:
doc.select("[class=cellRight]")
我得到一个输出,其中所有等级值都列出了两次(因为它们嵌套在包含 [class=cellRight] 区分符和正常数量的“无等级”列表的两个元素中。所以我的问题是,如何我只能在包含区分符 [class=cellRight] 的文档中选择最里面的孩子吗?(我已经处理了空白值的问题)感谢所有帮助!
【问题讨论】:
-
你能清理一下html吗?
-
这样更好吗? @luksch