[Posted]: 2021-11-20 23:26:31
[Question]:
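I'm using Selenium to scrape a liquor sales website so that I can add product details to a spreadsheet faster. Selenium logs in to the site and searches for the right product. Once I'm on the product page, I can scrape all the data I need except for a few values inside one particular block of markup.
I need three values: the price per case, the price per bottle, and the cost per ounce. The data I'm looking for appears twice in the page, in nearly identical markup. The correct data is the second occurrence; the first occurrence has the wrong values. The relevant HTML is: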
<h2>Pricing</h2>
<div id="prices-table">
<div class="table-responsive">
<table class="table table-condensed auto-width">
<thead>
<tr>
<th></th>
<th class="best-bottle-top">
Frontline
</th>
</tr>
</thead>
<tbody>
<tr>
<td>Bottles</td>
<td class="best-bottle-mid">1</td>
</tr>
<tr>
<td>Cases</td>
<td class="best-bottle-mid">—</td>
</tr>
<tr>
<td>Price per bottle</td>
<td class="best-bottle-mid">
<div>$16.14 #I don't want this data </div>
</td>
</tr>
<tr>
<td>Price per case</td>
<td class="best-bottle-mid">
<div>
$193.71 #I don't want this data
</div>
</td>
</tr>
<tr>
<td>Cost per ounce</td>
<td class="best-bottle-mid">
<div>$1.27 #I don't want this data </div>
</td>
</tr>
<tr>
<td></td>
<td class="best-bottle-bot text-muted">
<span class="best-bottle-bot-content">
<span>
<div><small>Best</small></div>
<small>Bottle</small>
</span>
</span>
</td>
</tr>
</tbody>
</table>
</div>
<p>
<em class="price-disclaimer">Defer to Athens Distributing Company of Tennessee in case of any price discrepancies.</em>
</p>
</div>
</div>
<hr class="visible-print-block">
<div class="tab-pane active" id="3400355">
<dl class="dl-horizontal vpv-row">
<dt>Sizing</dt><dd>750 mL bottle × 6</dd>
<dt>SKU</dt><dd>80914</dd>
<dt>UPC</dt><dd>853192006189</dd>
<dt>Status</dt><dd>Active</dd>
<dt>Availability</dt><dd>
<span class="label label-success inventory-status-badge"><span data-container="body" data-toggle="popover" data-placement="top" data-content="Athens Distributing Company of Tennessee is integrated with SevenFifty and sends inventory levels at least once a day. You can order this item and expect that it is available." data-original-title="" title="">IN STOCK</span></span>
</dd></dl>
<div id="prices-table">
<div class="table-responsive">
<table class="table table-condensed auto-width">
<h2>Pricing</h2><thead>
<tr>
<th></th>
<th class="best-bottle-top">
Frontline
</th>
</tr>
</thead>
<tbody>
<tr>
<td>Bottles</td>
<td class="best-bottle-mid">1</td>
</tr>
<tr>
<td>Cases</td>
<td class="best-bottle-mid">—</td>
</tr>
<tr>
<td>Price per bottle</td>
<td class="best-bottle-mid">
<div>$33.03 #I want THIS data </div>
</td>
</tr>
<tr>
<td>Price per case</td>
<td class="best-bottle-mid">
<div>
$198.18 #I want THIS data
</div>
</td>
</tr>
<tr>
<td>Cost per ounce</td>
<td class="best-bottle-mid">
<div>$1.30 #I want THIS data </div>
</td>
</tr>
<tr>
<td></td>
<td class="best-bottle-bot text-muted">
<span class="best-bottle-bot-content">
<span>
<div><small>Best</small></div>
<small>Bottle</small>
</span>
</span>
</td>
</tr>
</tbody>
</table>
</div>
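A minimal sketch of why a lookup by id is ambiguous here, assuming the markup above is representative of the live page: id="prices-table" occurs twice, so any lookup that stops at the first match lands on the unwanted prices. It parses a trimmed, well-formed copy of the markup with Python's standard xml.etree.ElementTree instead of the real page (which needs a logged-in Selenium session). PAGE and prices_from are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for the product page: only the pricing rows are
# kept, and the markup is made well-formed so ElementTree can parse it.
# As on the real page, id="prices-table" appears twice.
PAGE = """<body>
  <div id="prices-table">
    <table><tbody>
      <tr><td>Bottles</td><td>1</td></tr>
      <tr><td>Cases</td><td>&#8212;</td></tr>
      <tr><td>Price per bottle</td><td><div>$16.14</div></td></tr>
      <tr><td>Price per case</td><td><div>$193.71</div></td></tr>
      <tr><td>Cost per ounce</td><td><div>$1.27</div></td></tr>
    </tbody></table>
  </div>
  <div class="tab-pane active" id="3400355">
    <div id="prices-table">
      <table><tbody>
        <tr><td>Bottles</td><td>1</td></tr>
        <tr><td>Cases</td><td>&#8212;</td></tr>
        <tr><td>Price per bottle</td><td><div>$33.03</div></td></tr>
        <tr><td>Price per case</td><td><div>$198.18</div></td></tr>
        <tr><td>Cost per ounce</td><td><div>$1.30</div></td></tr>
      </tbody></table>
    </div>
  </div>
</body>"""


def prices_from(table_div):
    """Map each row label to the text of its price <div>; rows without one are skipped."""
    prices = {}
    for row in table_div.iter("tr"):
        cells = row.findall("td")
        if len(cells) == 2 and cells[1].find("div") is not None:
            prices[cells[0].text.strip()] = cells[1].find("div").text.strip()
    return prices


root = ET.fromstring(PAGE)
tables = root.findall(".//div[@id='prices-table']")

print(len(tables))             # 2, because the id is not unique
print(prices_from(tables[0]))  # first match: the unwanted prices
print(prices_from(tables[1]))  # second match: the wanted prices
```

Taking the second element of the result list, rather than relying on the id being unique, is what selects the wanted table in this sketch.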
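The full XPath copied from Chrome finds what I want, but a relative path does not work. Here is what I've tried:
Full XPath for the case price (works, but I'd rather not rely on an absolute path):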
/html/body/div[3]/div[1]/div/div[2]/div[2]/div[2]/div/div[3]/div[2]/div[3]/div[2]/div/div/table/tbody/tr[4]/td[2]/div
Relative XPath for the case price (returns None):
//*[@id="prices-table"]/div/table/tbody/tr[4]/td[2]/div
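A possible relative lookup, sketched under the assumption that the page layout matches the HTML in the question: scope the query to the "tab-pane active" container that holds the wanted table, and pick each row by its label cell instead of by position (tr[4]). The sketch uses Python's xml.etree.ElementTree on a hypothetical trimmed stand-in (PAGE, SCOPE and price are illustrative names), since ElementTree supports only a subset of XPath. In the browser, an untested Selenium equivalent might be driver.find_element(By.XPATH, '(//div[@id="prices-table"])[2]//tr[td[normalize-space()="Price per case"]]/td[2]/div'); it has not been checked against the live page, which requires a login.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for the page (same layout as the question's HTML,
# made well-formed for ElementTree). The unwanted table sits outside the tab pane.
PAGE = """<body>
  <div id="prices-table">
    <table><tbody>
      <tr><td>Price per bottle</td><td><div>$16.14</div></td></tr>
      <tr><td>Price per case</td><td><div>$193.71</div></td></tr>
      <tr><td>Cost per ounce</td><td><div>$1.27</div></td></tr>
    </tbody></table>
  </div>
  <div class="tab-pane active" id="3400355">
    <div id="prices-table">
      <table><tbody>
        <tr><td>Price per bottle</td><td><div>$33.03</div></td></tr>
        <tr><td>Price per case</td><td><div>$198.18</div></td></tr>
        <tr><td>Cost per ounce</td><td><div>$1.30</div></td></tr>
      </tbody></table>
    </div>
  </div>
</body>"""

# Limit the search to the price table inside the active tab pane.
SCOPE = ".//div[@class='tab-pane active']//div[@id='prices-table']"


def price(root, label):
    """Return the price from the row that has a cell whose text is exactly `label`, or None."""
    div = root.find(f"{SCOPE}//tr[td='{label}']/td[2]/div")
    return None if div is None else div.text.strip()


root = ET.fromstring(PAGE)
for label in ("Price per case", "Price per bottle", "Cost per ounce"):
    print(label, price(root, label))
```

Matching on the row label keeps the lookup working even if rows are added or reordered, which a positional tr[4] would not survive.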
Unfortunately I can't link the actual page because it requires login credentials. Thanks for any and all help.
[Discussion]:
Tags: python html selenium xpath