How to get the text value of the title attribute of a div tag in Python?

【Question title】: How to get the text value of the title attribute of a div tag in Python?
【Posted】: 2020-06-24 00:24:09
【Question description】:

I am trying to extract player names from sofifa.com using Python Scrapy, but all I get is an empty list.

HTML:

<div class="bp3-text-overflow-ellipsis"><img title="Argentina" alt="" src="https://cdn.sofifa.com/flags/ar.png" data-src="https://cdn.sofifa.com/flags/ar.png" data-srcset="https://cdn.sofifa.com/flags/ar@2x.png 2x, https://cdn.sofifa.com/flags/ar@3x.png 3x" class="flag loaded" srcset="https://cdn.sofifa.com/flags/ar@2x.png 2x, https://cdn.sofifa.com/flags/ar@3x.png 3x" data-was-processed="true"> L. Messi</div>

Here is my code:

response.css('table.table>tbody>tr>td.col-name>div.bp3-text-overflow-ellipsis>a::attr(title)')

【Question discussion】:

Tags: python web-scraping scrapy

【Solution 1】:

You can use the following code to extract the names of all the players:

response.css(".list tr td.col-name a::attr(data-tooltip)").extract()

【Discussion】:

【Solution 2】:

You can also use a different library for this, such as Selenium. Sorry, I can't think of anything Scrapy-related, but hopefully this helps as well.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

import time
import random

link = "https://sofifa.com/players?col=oa&sort=desc&showCol%5B0%5D=pi&showCol%5B1%5D=ae&showCol%5B2%5D=hi&showCol%5B3%5D=wi&showCol%5B4%5D=pf&showCol%5B5%5D=oa&showCol%5B6%5D=pt&showCol%5B7%5D=bo&showCol%5B8%5D=bp&showCol%5B9%5D=gu&showCol%5B10%5D=jt&showCol%5B11%5D=le&showCol%5B12%5D=vl&showCol%5B13%5D=wg&showCol%5B14%5D=rc&showCol%5B15%5D=ta&showCol%5B16%5D=cr&showCol%5B17%5D=fi&showCol%5B18%5D=he&showCol%5B19%5D=sh&showCol%5B20%5D=vo&showCol%5B21%5D=ts&showCol%5B22%5D=dr&showCol%5B23%5D=cu&showCol%5B24%5D=fr&showCol%5B25%5D=lo&showCol%5B26%5D=bl&showCol%5B27%5D=to&showCol%5B28%5D=ac&showCol%5B29%5D=sp&showCol%5B30%5D=ag&showCol%5B31%5D=re&showCol%5B32%5D=ba&showCol%5B33%5D=tp&showCol%5B34%5D=so&showCol%5B35%5D=ju&showCol%5B36%5D=st&showCol%5B37%5D=sr&showCol%5B38%5D=ln&showCol%5B39%5D=te&showCol%5B40%5D=ar&showCol%5B41%5D=in&showCol%5B42%5D=po&showCol%5B43%5D=vi&showCol%5B44%5D=pe&showCol%5B45%5D=cm&showCol%5B46%5D=td&showCol%5B47%5D=ma&showCol%5B48%5D=sa&showCol%5B49%5D=sl&showCol%5B50%5D=tg&showCol%5B51%5D=gd&showCol%5B52%5D=gh&showCol%5B53%5D=gk&showCol%5B54%5D=gp&showCol%5B55%5D=gr&showCol%5B56%5D=tt&showCol%5B57%5D=bs&showCol%5B58%5D=wk&showCol%5B59%5D=sk&showCol%5B60%5D=aw&showCol%5B61%5D=dw&showCol%5B62%5D=ir&showCol%5B63%5D=pac&showCol%5B64%5D=sho&showCol%5B65%5D=pas&showCol%5B66%5D=dri&showCol%5B67%5D=def&showCol%5B68%5D=phy&offset=0"
PATH_TO_CHROMEDRIVER = "/Users/Povilas/Documents/GitHub/chromedriver"

# Selenium 4 style: the driver path is passed through Service, and
# find_elements(By.XPATH, ...) replaces the removed find_elements_by_xpath.
driver = webdriver.Chrome(service=Service(executable_path=PATH_TO_CHROMEDRIVER))
driver.get(link)
# Pause for a random 3 to 5 seconds so the page can finish loading
time.sleep(random.randint(30, 50) / 10)
# The player name is the <div> inside the first link of the second table column
all_names = driver.find_elements(By.XPATH, '/html/body/div[1]/div/div/div[1]/table/tbody/tr/td[2]/a[1]/div')
for name in all_names:
    print(name.text)
# Close the browser when done
driver.quit()

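【Discussion】:

The asker's problem was with getting the selector right. Scrapy also has an xpath method, so it makes no difference which library is used.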