【Question title】: How to get the text value of the title attribute of a div tag in Python?
【Posted】: 2020-06-24 00:24:09
【Question description】:

I am trying to extract player names from sofifa.com using Python and Scrapy, but all I get is an empty list.

HTML:

<div class="bp3-text-overflow-ellipsis"><img title="Argentina" alt="" src="https://cdn.sofifa.com/flags/ar.png" data-src="https://cdn.sofifa.com/flags/ar.png" data-srcset="https://cdn.sofifa.com/flags/ar@2x.png 2x, https://cdn.sofifa.com/flags/ar@3x.png 3x" class="flag loaded" srcset="https://cdn.sofifa.com/flags/ar@2x.png 2x, https://cdn.sofifa.com/flags/ar@3x.png 3x" data-was-processed="true"> L. Messi</div>

Here is my code:

response.css('table.table>tbody>tr>td.col-name>div.bp3-text-overflow-ellipsis>a::attr(title)')

【Comments】:

    Tags: python web-scraping scrapy


    【Solution 1】:

    You can use the following code to extract the names of all the players:

    response.css(".list tr td.col-name a::attr(data-tooltip)").extract()
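
The selector above reads the `data-tooltip` attribute of each `<a>` inside a `td.col-name` cell. The original selector looked for an `<a>` that is a child of the `div`, whereas the XPath in Solution 2 (`td[2]/a[1]/div`) suggests the `<a>` wraps the `div`, which would explain the empty list. The sketch below mimics that lookup with only the standard library, so it can be run without Scrapy. The `SAMPLE_HTML` markup, the `data-tooltip` values, and the `TooltipCollector` and `extract_tooltips` names are assumptions made for illustration; they are not taken from the live page.

```python
from html.parser import HTMLParser

# Hypothetical markup: assumed to mirror a td.col-name > a > div nesting.
SAMPLE_HTML = """
<table class="list"><tbody>
<tr><td class="col-name"><a href="/player/1" data-tooltip="Lionel Messi">
<div class="bp3-text-overflow-ellipsis"><img title="Argentina" alt=""> L. Messi</div></a></td></tr>
<tr><td class="col-name"><a href="/player/2" data-tooltip="Cristiano Ronaldo">
<div class="bp3-text-overflow-ellipsis"><img title="Portugal" alt=""> Cristiano Ronaldo</div></a></td></tr>
</tbody></table>
"""


class TooltipCollector(HTMLParser):
    """Collect data-tooltip from every <a> inside a <td class="col-name">."""

    def __init__(self):
        super().__init__()
        self.in_name_cell = False
        self.tooltips = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td":
            # Track whether the current cell carries the col-name class.
            self.in_name_cell = "col-name" in (attrs.get("class") or "").split()
        elif tag == "a" and self.in_name_cell and "data-tooltip" in attrs:
            self.tooltips.append(attrs["data-tooltip"])

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_name_cell = False


def extract_tooltips(html):
    """Return the data-tooltip values found in name cells, in document order."""
    parser = TooltipCollector()
    parser.feed(html)
    return parser.tooltips


names = extract_tooltips(SAMPLE_HTML)
print(names)
```

In a Scrapy spider the same lookup stays a one-liner, as in the answer above; the parser class here only makes the matching rule explicit.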
    

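    【Discussion】:

      【Solution 2】:

      You could also use a different library for this, such as Selenium. Sorry, I couldn't come up with anything Scrapy-related, but I hope this helps as well.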

      import random
      import time

      from selenium import webdriver
      from selenium.webdriver.chrome.service import Service
      from selenium.webdriver.common.by import By

      link = "https://sofifa.com/players?col=oa&sort=desc&showCol%5B0%5D=pi&showCol%5B1%5D=ae&showCol%5B2%5D=hi&showCol%5B3%5D=wi&showCol%5B4%5D=pf&showCol%5B5%5D=oa&showCol%5B6%5D=pt&showCol%5B7%5D=bo&showCol%5B8%5D=bp&showCol%5B9%5D=gu&showCol%5B10%5D=jt&showCol%5B11%5D=le&showCol%5B12%5D=vl&showCol%5B13%5D=wg&showCol%5B14%5D=rc&showCol%5B15%5D=ta&showCol%5B16%5D=cr&showCol%5B17%5D=fi&showCol%5B18%5D=he&showCol%5B19%5D=sh&showCol%5B20%5D=vo&showCol%5B21%5D=ts&showCol%5B22%5D=dr&showCol%5B23%5D=cu&showCol%5B24%5D=fr&showCol%5B25%5D=lo&showCol%5B26%5D=bl&showCol%5B27%5D=to&showCol%5B28%5D=ac&showCol%5B29%5D=sp&showCol%5B30%5D=ag&showCol%5B31%5D=re&showCol%5B32%5D=ba&showCol%5B33%5D=tp&showCol%5B34%5D=so&showCol%5B35%5D=ju&showCol%5B36%5D=st&showCol%5B37%5D=sr&showCol%5B38%5D=ln&showCol%5B39%5D=te&showCol%5B40%5D=ar&showCol%5B41%5D=in&showCol%5B42%5D=po&showCol%5B43%5D=vi&showCol%5B44%5D=pe&showCol%5B45%5D=cm&showCol%5B46%5D=td&showCol%5B47%5D=ma&showCol%5B48%5D=sa&showCol%5B49%5D=sl&showCol%5B50%5D=tg&showCol%5B51%5D=gd&showCol%5B52%5D=gh&showCol%5B53%5D=gk&showCol%5B54%5D=gp&showCol%5B55%5D=gr&showCol%5B56%5D=tt&showCol%5B57%5D=bs&showCol%5B58%5D=wk&showCol%5B59%5D=sk&showCol%5B60%5D=aw&showCol%5B61%5D=dw&showCol%5B62%5D=ir&showCol%5B63%5D=pac&showCol%5B64%5D=sho&showCol%5B65%5D=pas&showCol%5B66%5D=dri&showCol%5B67%5D=def&showCol%5B68%5D=phy&offset=0"
      PATH_TO_CHROMEDRIVER = "/Users/Povilas/Documents/GitHub/chromedriver"

      # Selenium 4 takes the driver path through a Service object
      driver = webdriver.Chrome(service=Service(PATH_TO_CHROMEDRIVER))
      driver.get(link)
      # Wait a random 3 to 5 seconds so the page can finish loading
      time.sleep(random.randint(30, 50) / 10)
      # find_elements_by_xpath was removed in Selenium 4; use find_elements(By.XPATH, ...)
      all_names = driver.find_elements(By.XPATH, '/html/body/div[1]/div/div/div[1]/table/tbody/tr/td[2]/a[1]/div')
      for name in all_names:
          print(name.text)
      # Close the browser once the names have been printed
      driver.quit()
      

      【Discussion】:

      • He was having trouble getting the selector right. Scrapy also has an xpath method, so it makes no difference which library is used.
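
To illustrate the point about XPath, here is a minimal sketch that applies a relative XPath to a small, well-formed sample using the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset. The `SAMPLE_XHTML` markup and the `extract_players` helper are hypothetical; the markup only assumes the `td > a > div` nesting implied by the XPath in Solution 2. In Scrapy, a comparable expression would be passed to `response.xpath(...)`, though whether it matches the live page is not verified here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup assumed to follow a td > a > div nesting.
SAMPLE_XHTML = """
<table>
  <tbody>
    <tr>
      <td class="col-avatar"/>
      <td class="col-name">
        <a href="/player/1"><div><img title="Argentina" alt=""/> L. Messi</div></a>
      </td>
    </tr>
    <tr>
      <td class="col-avatar"/>
      <td class="col-name">
        <a href="/player/2"><div><img title="Portugal" alt=""/> Cristiano Ronaldo</div></a>
      </td>
    </tr>
  </tbody>
</table>
"""


def extract_players(xhtml):
    """Return (name, country) pairs from the name cells of the sample table."""
    root = ET.fromstring(xhtml)
    players = []
    # Relative path: any name cell, then its link, then the inner div.
    for div in root.findall(".//td[@class='col-name']/a/div"):
        # The name follows the <img>, so gather all text inside the div.
        name = "".join(div.itertext()).strip()
        img = div.find("img")
        # The title attribute sits on the <img>, not on the div itself.
        country = img.get("title") if img is not None else None
        players.append((name, country))
    return players


players = extract_players(SAMPLE_XHTML)
print(players)
```

A relative path keyed on the cell's class tends to be less brittle than an absolute `/html/body/...` path, since it does not depend on the exact number of wrapper elements above the table.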