【Question title】: Python Selenium: Finding elements by xpath when there are duplicates in the html code
【Posted】: 2021-11-20 23:26:31
【Question】:

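I'm using Selenium to scrape content from a liquor sales website so that I can add product details to a spreadsheet faster. I use Selenium to log in to the site and search for the right product. Once I'm on the product page I can scrape all the data I need, except for some data contained in one particular block of code.

I need 3 pieces of data: price per case, price per bottle, and price per ounce. I noticed that the data I'm looking for appears twice in the code, in a similar pattern. Interestingly, the data I want is the second occurrence (the first occurrence is incorrect). The relevant HTML is: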

<h2>Pricing</h2>

  <div id="prices-table">
      

    

<div class="table-responsive">
  <table class="table table-condensed auto-width">
    <thead>
      <tr>
        <th></th>

          <th class="best-bottle-top">
            Frontline
          </th>

      </tr>
    </thead>

    <tbody>

      <tr>
        <td>Bottles</td>

          <td class="best-bottle-mid">1</td>

      </tr>

        <tr>
          <td>Cases</td>

            <td class="best-bottle-mid">—</td>

        </tr>

      <tr>
        <td>Price per bottle</td>
          <td class="best-bottle-mid">
            <div>$16.14   #I don't want this data </div>
          </td>

      </tr>

        <tr>
          <td>Price per case</td>

            <td class="best-bottle-mid">
              <div>
                $193.71   #I don't want this data
              </div>
            </td>
        </tr>



        <tr>
          <td>Cost per ounce</td>

            <td class="best-bottle-mid">
              <div>$1.27   #I don't want this data </div>

            </td>

        </tr>

      <tr>
        <td></td>

            <td class="best-bottle-bot text-muted">
              <span class="best-bottle-bot-content">

                <span>
                  <div><small>Best</small></div>
                  <small>Bottle</small>
                </span>
              </span>
            </td>

      </tr>
    </tbody>
  </table>
</div>





  <p>
    <em class="price-disclaimer">Defer to Athens Distributing Company of Tennessee in case of any price discrepancies.</em>
  </p>



  </div>

            </div>
            <hr class="visible-print-block">
            <div class="tab-pane active" id="3400355">
              <dl class="dl-horizontal vpv-row">
  <dt>Sizing</dt><dd>750 mL bottle × 6</dd>
        <dt>SKU</dt><dd>80914</dd>
  <dt>UPC</dt><dd>853192006189</dd>
  <dt>Status</dt><dd>Active</dd>
  
  
  
  <dt>Availability</dt><dd>
      <span class="label label-success inventory-status-badge"><span data-container="body" data-toggle="popover" data-placement="top" data-content="Athens Distributing Company of Tennessee is integrated with SevenFifty and sends inventory levels at least once a day. You can order this item and expect that it is available." data-original-title="" title="">IN STOCK</span></span>
</dd></dl>




  <div id="prices-table">
      

    

<div class="table-responsive">
  <table class="table table-condensed auto-width">
    <h2>Pricing</h2><thead>
      <tr>
        <th></th>

          <th class="best-bottle-top">
            Frontline
          </th>

      </tr>
    </thead>

    <tbody>

      <tr>
        <td>Bottles</td>

          <td class="best-bottle-mid">1</td>

      </tr>

        <tr>
          <td>Cases</td>

            <td class="best-bottle-mid">—</td>

        </tr>

      <tr>
        <td>Price per bottle</td>
          <td class="best-bottle-mid">
            <div>$33.03   #I want THIS data </div>
          </td>

      </tr>

        <tr>
          <td>Price per case</td>

            <td class="best-bottle-mid">
              <div>
                $198.18   I want THIS data
              </div>
            </td>
        </tr>



        <tr>
          <td>Cost per ounce</td>

            <td class="best-bottle-mid">
              <div>$1.30   I want THIS data </div>

            </td>

        </tr>

      <tr>
        <td></td>

            <td class="best-bottle-bot text-muted">
              <span class="best-bottle-bot-content">

                <span>
                  <div><small>Best</small></div>
                  <small>Bottle</small>
                </span>
              </span>
            </td>

      </tr>
    </tbody>
  </table>
</div>

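With the full XPath, Chrome finds what I want, but a relative path does not work. Here is what I tried:

Full XPath for the price per case (works, but I don't want to use an absolute reference):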

/html/body/div[3]/div[1]/div/div[2]/div[2]/div[2]/div/div[3]/div[2]/div[3]/div[2]/div/div/table/tbody/tr[4]/td[2]/div

Relative XPath for the price per case (returns None):

//*[@id="prices-table"]/div/table/tbody/tr[4]/td[2]/div

Unfortunately, I can't link the actual page because it requires login credentials. Thanks for any and all help.

【Comments】:

    Tags: python html selenium xpath


    【Solution 1】:

    Two approaches.

    1. If everything is the same (the tags and their attributes), use XPath indexing.

      //td[text()='Price per bottle']/following-sibling::td[@class='best-bottle-mid']
      

    This matches two nodes, and find_element returns only the first occurrence, which is the one you don't want. So you can do this:

    (//td[text()='Price per bottle']/following-sibling::td[@class='best-bottle-mid'])[2]
    

    to locate the second web element. You can do the same for Price per case and Cost per ounce.

    2. Another approach is to use find_elements

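      from selenium.webdriver.common.by import By

      # find_elements returns every match, in document order
      price_per_bottle_elements = driver.find_elements(By.XPATH, "//td[text()='Price per bottle']/following-sibling::td[@class='best-bottle-mid']")

      print(price_per_bottle_elements[0].text)  # first occurrence: this we do not want

      print(price_per_bottle_elements[1].text)  # second occurrence: this we want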
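
The indexed XPath from approach 1 can be generated for all three rows instead of being written out by hand. The following is a minimal sketch, not part of the original answer: the helper names `nth_match_xpath`, `LABELS` and `read_prices` are hypothetical, and `driver` is assumed to be an already logged-in Selenium WebDriver sitting on the product page.

```python
# Row labels as they appear in the posted HTML.
LABELS = ("Price per bottle", "Price per case", "Cost per ounce")


def nth_match_xpath(label, occurrence=2):
    """Build an XPath for the value cell next to `label`, picking the Nth match (1-based)."""
    cell = f"//td[text()='{label}']/following-sibling::td[@class='best-bottle-mid']"
    # The parentheses make the index apply to the whole result set,
    # not to the position of the <td> inside its own row.
    return f"({cell})[{occurrence}]"


def read_prices(driver, occurrence=2):
    """Return {label: text} for the chosen occurrence of each pricing row.

    Assumes `driver` is a Selenium WebDriver on the product page.
    """
    # Imported here so the XPath helper above works without Selenium installed.
    from selenium.webdriver.common.by import By

    prices = {}
    for label in LABELS:
        element = driver.find_element(By.XPATH, nth_match_xpath(label, occurrence))
        prices[label] = element.text.strip()
    return prices
```

Calling `read_prices(driver)` would then return a dictionary keyed by the three labels. If the page ever renders only one pricing table, `find_element` raises `NoSuchElementException` for index 2, so the hard-coded occurrence is an assumption worth checking against the live page.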
      
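
A possible alternative to counting occurrences: in the posted HTML the `id="prices-table"` attribute appears twice, so an id-based XPath is ambiguous, while the wanted table sits inside `<div class="tab-pane active" ...>`. The sketch below scopes the search to that container. It assumes the wanted table is always inside the pane marked `active` and that the unwanted one is not, which the excerpt suggests but does not prove. The helper names `has_class` and `active_pane_xpath` are hypothetical.

```python
def has_class(token):
    """XPath 1.0 predicate that matches one whole class token in @class."""
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {token} ')"


def active_pane_xpath(label):
    """XPath for the value cell next to `label`, limited to the active tab pane."""
    pane = f"//div[{has_class('tab-pane')} and {has_class('active')}]"
    # normalize-space() tolerates stray whitespace around the label text;
    # td[1] on the following-sibling axis is the cell right after the label.
    return f"{pane}//td[normalize-space()='{label}']/following-sibling::td[1]"
```

With Selenium this could be used as `driver.find_element(By.XPATH, active_pane_xpath("Price per case")).text`. The benefit over a numeric index is that it should not depend on how many pricing tables precede the active one.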

    【Comments】:
