【Question Title】: HTML Agility Pack XPath expression assistance
【Posted】: 2012-08-15 20:34:45
【Question】:

I have the following HTML:

<table width="98%" border="0" align="center" cellpadding="0" cellspacing="2">
<tr>
    <td height="20" colspan="2">
        &nbsp;
    </td>
</tr>
<tr>
    <td height="20" colspan="2" class="fontDestaque2NegritoHome">
        MATR&Iacute;CULA: PPAAG
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        &nbsp;
    </td>
    <td align="right" valign="middle" class="tx_bd">
        &nbsp;
    </td>
</tr>
<tr>
    <td width="35%" align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Fabricante:</span>
        </div>
    </td>
    <td width="59%" align="right" valign="middle" class="tx_bd">
        <div align="left">
            CESSNA AIRCRAFT</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Modelo:</span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            T206H</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">N&uacute;mero de S&eacute;rie:</span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            T20608735</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left" class="tx_bold">
            Tipo ICAO :
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            C206</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Tipo de Habilita&ccedil;&atilde;o para Pilotos:</span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            MNTE</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Classe da Aeronave:</span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            POUSO CONVECIONAL 1 MOTOR CONVENCIONAL</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Peso M&aacute;ximo de Decolagem:</span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            1633 - Kg</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">N&uacute;mero M&aacute;ximo de Passageiros:</span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            005</div>
    </td>
</tr>
<tr>
    <td colspan="2" align="right" valign="middle" class="tx_bd">
        <div align="left">
        </div>
    </td>
</tr>
<tr>
    <td colspan="2" align="right" valign="middle" background="../images/bgPontilhado.gif"
        class="tx_bd">
        <div align="left">
            <img src="../images/bgPontilhado.gif" width="4" height="1"></div>
    </td>
</tr>
<tr>
    <td colspan="2" align="right" valign="middle" class="tx_bd">
        <div align="left">
        </div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Categoria de Registro:</span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            PRIVADA SERVICO AEREO PRIVADOS</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">N&uacute;mero dos Certificados (CM - CA)</span>:
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            19040</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Situa&ccedil;&atilde;o no RAB:</span><span class="stop_litle">
            </span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="fontRed">ARRENDAMENTO OPERACIONAL/ALIENACAO FIDUCIARIA</span></div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Data da Compra/Transfer&ecirc;ncia:</span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
        </div>
    </td>
</tr>
<tr>
    <td colspan="2" align="right" valign="middle" class="tx_bd">
        <div align="left">
        </div>
    </td>
</tr>
<tr>
    <td colspan="2" align="right" valign="middle" background="../images/bgPontilhado.gif"
        class="tx_bd">
        <div align="left">
            <img src="../images/bgPontilhado.gif" width="4" height="1"></div>
    </td>
</tr>
<tr>
    <td colspan="2" align="right" valign="middle" class="tx_bd">
        <div align="left">
        </div>
    </td>
    <tr>
        <td align="right" valign="middle" class="tx_bd">
            <div align="left">
                <span class="tx_bold">Data de Validade do CA: </span>
            </div>
        </td>
        <td width="3%" align="right" valign="middle" class="tx_bd">
            <div align="left">
                <span class="tx_bd">21/05/16</span></div>
        </td>
    </tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left" class="tx_bold">
            <div align="left">
                Data de Validade da IAM:
            </div>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bd">110513</span></div>
    </td>
</tr>
<tr>
    <td height="18" align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Situa&ccedil;&atilde;o de Aeronavegabilidade:</span>
        </div>
    </td>
    <td height="18" align="right" valign="middle" class="tx_bd">
        <div align="left">
            Normal</div>
    </td>
</tr>
<tr>
    <td height="18" colspan="2" align="right" valign="middle" class="tx_bd">
        <div align="left" class="tx_bold">
            Motivo(s):
        </div>
    </td>
</tr>
<tr>
    <td height="18" colspan="2" align="right" valign="middle" class="tx_bd">
        <div align="left">
            <blockquote>
                <p>
                    <span class="tx_bold"></span>
                </p>
            </blockquote>
        </div>
    </td>
</tr>
<tr>
    <td height="18" colspan="2" align="left" valign="middle" class="tx_bd">
        Consulta realizada em: 16/8/2012 15:52:45<br>
    </td>
</tr>

I want to scrape the following text:

  • CESSNA AIRCRAFT
  • T206H
  • T20608735
  • C206
  • MNTE
  • POUSO CONVECIONAL 1 MOTOR CONVENCIONAL
  • 1633 - Kg
  • 005
  • PRIVADA SERVICO AEREO PRIVADOS
  • 19040
  • ARRENDAMENTO OPERACIONAL/ALIENACAO FIDUCIARIA
  • 21/05/16
  • 110513
  • Normal

Some of the divs contain only the text I need. Other divs contain a span that holds the text I need. How would I build an XPath expression for this?

【Question Comments】:

    Tags: xpath html-agility-pack


    【Solution 1】:

    Use:

    //tr/td[@align='right' and @valign='middle' and @class='tx_bd']
           /div[@align='left' and not(*)]
             /text()
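
    Note that the `not(*)` predicate only matches divs with no child elements, so values wrapped in a span (such as 21/05/16) are skipped. One possible way to cover both cases is to select the second cell of each row and read its `InnerText`. The following is only a sketch, assuming the HtmlAgilityPack NuGet package is referenced; the two-row sample HTML is a hypothetical cut-down version of the table in the question, not the real page.

    ```csharp
    using System;
    using System.Linq;
    using HtmlAgilityPack;

    class Program
    {
        static void Main()
        {
            // Hypothetical sample modeled on the question's table:
            // one value as plain text in the div, one wrapped in a span.
            const string html = @"
    <table>
    <tr>
      <td class=""tx_bd""><div align=""left""><span class=""tx_bold"">Modelo:</span></div></td>
      <td class=""tx_bd""><div align=""left"">
          T206H</div></td>
    </tr>
    <tr>
      <td class=""tx_bd""><div align=""left""><span class=""tx_bold"">Data de Validade do CA: </span></div></td>
      <td class=""tx_bd""><div align=""left""><span class=""tx_bd"">21/05/16</span></div></td>
    </tr>
    </table>";

            var document = new HtmlDocument();
            document.LoadHtml(html);

            // Second cell of every row. SelectNodes returns null when nothing matches.
            var cells = document.DocumentNode.SelectNodes("//tr/td[2]");
            if (cells == null)
            {
                Console.WriteLine("No value cells found.");
                return;
            }

            // InnerText concatenates nested text, so the span case needs no special handling.
            // DeEntitize turns entities such as &nbsp; into characters before trimming.
            var values = cells
                .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                .Where(text => text.Length > 0);

            foreach (var value in values)
                Console.WriteLine(value);
        }
    }
    ```

    Rows with a single `colspan="2"` cell have no second `td`, so they drop out on their own, and empty value cells are filtered by the length check.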
    

    【Comments】:

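    • I get a null reference exception, so I will update my original post with the full HTML document I am searching, not just that snippet.
    • @SFAgitator, an XPath expression (regardless of the XML document it is evaluated against) cannot be the cause of a null reference exception. The problem is somewhere else in the code.
    • I assumed the NRE might be caused by the expression not finding a match in the document. I will look at the code again.
    • @SFAgitator, no. If no nodes are selected, the result of the evaluation is an empty node list, not an exception. If the code then tries to get an item from the empty node list, *this* may cause an exception.
    • I should clarify: you are right. Because I am only testing right now, I am not catching any errors, and I am in fact trying to get an item from an empty (null) list. When I run the code below, the data variable is null: var data = document.DocumentNode.SelectNodes("//tr/td[@align='right' and @valign='middle' and @class='tx_bd']/div[@align='left' and not(*)]/text()");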
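
    To illustrate the point made in the comments: in HTML Agility Pack, `SelectNodes` returns null (not an empty collection) when the expression matches nothing, so indexing the result without a check throws. One easy way to end up with no match is a stray space inside a quoted literal, for example `'left '` instead of `'left'`. The sketch below assumes the HtmlAgilityPack NuGet package; the one-cell HTML string is a made-up sample.

    ```csharp
    using System;
    using HtmlAgilityPack;

    class Program
    {
        static void Main()
        {
            var document = new HtmlDocument();
            // Hypothetical single-cell sample using the attributes from the question.
            document.LoadHtml(
                "<table><tr><td align=\"right\" valign=\"middle\" class=\"tx_bd\">" +
                "<div align=\"left\">T206H</div></td></tr></table>");

            // Trailing space inside the literal: @align is compared with "left ",
            // which is a different string, so no node is selected.
            var miss = document.DocumentNode.SelectNodes(
                "//td[@class='tx_bd']/div[@align='left ' and not(*)]/text()");
            Console.WriteLine(miss == null ? "no match (null)" : miss.Count + " node(s)");

            // Corrected literal; still guard against null before enumerating.
            var hit = document.DocumentNode.SelectNodes(
                "//td[@class='tx_bd']/div[@align='left' and not(*)]/text()");
            if (hit != null)
            {
                foreach (var node in hit)
                    Console.WriteLine(node.InnerText.Trim());
            }
        }
    }
    ```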