【Question title】: How can I change my code to get the URL link from the HTML code?
【Posted】: 2019-04-07 02:02:36
【Question】:

I tried to use beautifulsoup4 to scrape the URL from HTML code in Python, but I got this error: AttributeError: 'NoneType' object has no attribute 'get'

HTML code:

<a class="top NQHJEb dfhHve" href="https://globalnews.ca/news/5137005/donald-trump-robert-mueller-report/" ping="/url?sa=t&source=web&rct=j&url=https://globalnews.ca/news/5137005/donald-trump-robert-mueller-report/&ved=0ahUKEwiS9pn-4rzhAhWOyIMKHSOPD6QQvIgBCDcwAg"><img class="th BbeB2d" src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQ_Nf-kVlqsQz8NeNgQ9a9YRiA7Fl4DJ6Jod0sxNXapOK_iJebx20dgROk5YBl8IqFQX6S-eeY2" alt="Story image for trump from Globalnews.ca" onload="typeof google==='object'&&google.aft&&google.aft(this)" data-iml="1554598687532" data-atf="3"></a>

My Python code:

URL_results = soup.find_all('a', class_= 'top NQHJEb dfhHve').get('href')
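A minimal sketch of why that line fails, assuming beautifulsoup4 with Python's built-in "html.parser". find() returns a single Tag, or None when nothing matches, so calling .get() on a failed find() produces the 'NoneType' error quoted above. find_all() instead returns a ResultSet (a list of tags), which has no .get() either. The HTML string is a trimmed copy of the question's anchor; the class "no-such-class" is hypothetical.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the anchor from the question (ping attribute and img details omitted)
html = ('<a class="top NQHJEb dfhHve" '
        'href="https://globalnews.ca/news/5137005/donald-trump-robert-mueller-report/">'
        '<img class="th BbeB2d"></a>')
soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag when something matches...
first = soup.find("a", class_="top NQHJEb dfhHve")
print(first.get("href"))

# ...but None when nothing matches, and None has no .get()
missing = soup.find("a", class_="no-such-class")
print(missing)

# find_all() always returns a ResultSet (a list subclass), never a single Tag
results = soup.find_all("a", class_="top NQHJEb dfhHve")
print(type(results).__name__, len(results))
try:
    results.get("href")
except AttributeError as exc:
    # Calling .get() on the whole ResultSet fails; it must be called per element
    print("AttributeError:", exc)
```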

【Question discussion】:

    Tags: python-3.x web-scraping beautifulsoup web-crawler


    【Solution 1】:

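    You are applying the method to a list: find_all() returns a ResultSet (a list of tags), not a single tag. Apply the method to each element instead: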

    URL_results = [a.attrs.get('href') for a in soup.find_all('a', class_= 'top NQHJEb dfhHve')]
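A runnable version of the comprehension above, assuming beautifulsoup4 and a hypothetical page (the second URL and the anchor without an href are placeholders, not from the question). Tag.get('href') behaves like a.attrs.get('href') and returns None for a missing attribute; passing href=True to find_all() is one way to skip such anchors up front.

```python
from bs4 import BeautifulSoup

# Hypothetical page: two matching anchors (the second URL is a placeholder)
# plus one matching anchor that has no href at all
html = """
<a class="top NQHJEb dfhHve" href="https://globalnews.ca/news/5137005/donald-trump-robert-mueller-report/"></a>
<a class="top NQHJEb dfhHve" href="https://example.com/second-story"></a>
<a class="top NQHJEb dfhHve">no link</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Call .get() on each Tag, not on the ResultSet; a missing href yields None
all_hrefs = [a.get("href") for a in soup.find_all("a", class_="top NQHJEb dfhHve")]
print(all_hrefs)

# href=True keeps only anchors that actually carry an href attribute
URL_results = [a["href"] for a in soup.find_all("a", class_="top NQHJEb dfhHve", href=True)]
print(URL_results)
```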
    

    I prefer:

    URL_results = [item['href'] for item in soup.select('a.top.NQHJEb.dfhHve')]
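A sketch of one practical difference between the two approaches, assuming beautifulsoup4 4.7+ (where select() is backed by soupsieve) and hypothetical markup. class_="top NQHJEb dfhHve" matches only that exact attribute string, while the compound selector requires all three classes in any order. Appending [href] (not part of the answer) skips anchors without an href, so item['href'] cannot raise KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same classes in a different order, plus an anchor with no href
html = """
<a class="top NQHJEb dfhHve" href="https://globalnews.ca/news/5137005/donald-trump-robert-mueller-report/"></a>
<a class="dfhHve top NQHJEb" href="https://example.com/reordered"></a>
<a class="top NQHJEb dfhHve"></a>
"""
soup = BeautifulSoup(html, "html.parser")

# The exact-string class_ match only finds anchors whose class attribute is
# literally "top NQHJEb dfhHve" (the first and third anchors here)
exact = soup.find_all("a", class_="top NQHJEb dfhHve")
print(len(exact))  # 2

# The compound selector needs all three classes, in any order;
# [href] drops the anchor without an href, so item["href"] is safe
URL_results = [item["href"] for item in soup.select("a.top.NQHJEb.dfhHve[href]")]
print(URL_results)
```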
    

    You may be able to drop some of the classes from the current compound class selector, for example:

    URL_results = [item['href'] for item in soup.select('a.dfhHve')]
    

    You'll need to experiment to see which selector works.
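One way to experiment is to run candidate selectors side by side on the same page and compare what each returns. A minimal sketch, assuming beautifulsoup4; the second anchor (class "dfhHve other") is hypothetical and stands in for an unrelated link. A shorter selector may hold up better if some of the class names change, but it can also pick up links you did not intend to match.

```python
from bs4 import BeautifulSoup

# The question's anchor plus a hypothetical unrelated link that shares one class
html = """
<a class="top NQHJEb dfhHve" href="https://globalnews.ca/news/5137005/donald-trump-robert-mueller-report/"></a>
<a class="dfhHve other" href="https://example.com/unrelated"></a>
"""
soup = BeautifulSoup(html, "html.parser")

# Run a strict and a loose selector and collect the hrefs each one matches
selectors = ["a.top.NQHJEb.dfhHve", "a.dfhHve"]
matches = {sel: [item["href"] for item in soup.select(sel)] for sel in selectors}
for sel, urls in matches.items():
    print(sel, "->", urls)
```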

    【Discussion】:
