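The news list on this page is actually rendered by JavaScript, so the links are not in the raw HTML that a plain HTTP request returns.
Here is a Selenium-based approach: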
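from urllib.parse import urljoin
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

BASE = "https://spotlightstockmarket.com"
options = Options()
options.add_argument('--headless')  # run Firefox without opening a window
driver = webdriver.Firefox(options=options)
try:
    driver.get(f"{BASE}/sv/market-overview/nyheter/")
    # The links are added by JavaScript, so wait (max 10 s) until one exists
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "a.text")))
    soup = BeautifulSoup(driver.page_source, 'html.parser')
finally:
    driver.quit()  # close the browser even if loading or waiting fails

# Print the absolute URL of every <a class="text"> news link
for item in soup.find_all('a', {'class': 'text'}):
    href = item.get("href")
    if href:
        print(urljoin(BASE, href))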
Output:
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54904&publisher=370
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54902&publisher=370
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54903&publisher=370
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54901&publisher=370
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54900&publisher=370
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54899&publisher=370
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54898&publisher=370
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54897&publisher=370
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54896&publisher=370
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54894&publisher=370
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=26715&publisher=371
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=26714&publisher=371
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=26713&publisher=371
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=1880&publisher=372
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=1879&publisher=372
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=26712&publisher=371
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=26711&publisher=371
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=26710&publisher=371
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=26709&publisher=371
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=26708&publisher=371
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54808&publisher=369
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54809&publisher=369
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54790&publisher=369
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54776&publisher=369
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54747&publisher=369
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54741&publisher=369
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54721&publisher=369
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54720&publisher=369
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54707&publisher=369
https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54706&publisher=369
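Each article link carries its identifiers in the query string (id and publisher, as the output above shows). If you need those values rather than the full URL, a minimal sketch using only the standard library's urllib.parse could look like this; the helper name article_params is hypothetical, and the sample URLs are copied from the output above.

```python
from urllib.parse import parse_qs, urlparse


def article_params(url: str) -> tuple[int, int]:
    """Return (id, publisher) from a nyhets-artikel URL (hypothetical helper)."""
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list of values; take the first one
    return int(query["id"][0]), int(query["publisher"][0])


urls = [
    "https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=54904&publisher=370",
    "https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=26715&publisher=371",
    "https://spotlightstockmarket.com/sv/market-overview/nyheter/nyhets-artikel/?id=1880&publisher=372",
]

for url in urls:
    print(article_params(url))  # e.g. (54904, 370) for the first URL
```

This assumes every article URL contains both parameters; parse_qs raises no error for a missing key, but the dictionary lookup above will raise KeyError, which is a reasonable signal that the link is not an article.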
The links inside the li elements, on the other hand, are not rendered by JavaScript, so requests alone is enough:
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

BASE = "https://spotlightstockmarket.com"
r = requests.get(f"{BASE}/sv/market-overview/nyheter/")
r.raise_for_status()  # stop early on HTTP errors
soup = BeautifulSoup(r.text, 'html.parser')

urls = set()  # a set drops duplicate links
# Keep only <li> tags that have no attributes at all (no class, id, ...)
for item in soup.find_all(lambda tag: tag.name == 'li' and not tag.attrs):
    for link in item.find_all("a"):
        href = link.get("href")
        if href:
            urls.add(urljoin(BASE, href))
print(urls)
Output:
{'https://spotlightstockmarket.com/sv/om-spotlight/kontakt', 'https://spotlightstockmarket.com/sv/market-overview/rapportkalender', 'https://spotlightstockmarket.com/sv/redan-noterad/next', 'https://spotlightstockmarket.com/sv/bli-delaegare', 'https://spotlightstockmarket.com/sv/om-spotlight', 'https://spotlightstockmarket.com/sv/redan-noterad/regelverk', 'https://spotlightstockmarket.com/sv/medlemmar/medlemslista', 'https://spotlightstockmarket.com/sv/redan-noterad/i-fokus', 'https://spotlightstockmarket.com/sv/redan-noterad/information-foer-att-uppraetta-din-ir-sida', 'https://spotlightstockmarket.com/sv/redan-noterad/kapitalanskaffning', 'https://spotlightstockmarket.com/sv/market-overview/nyheter', 'https://spotlightstockmarket.com/sv/market-overview/kurser', 'https://spotlightstockmarket.com/sv/market-overview/bolagshaendelser', 'https://spotlightstockmarket.com/sv/market-overview', 'https://spotlightstockmarket.com/sv/market-overview/vaara-bolag', 'https://spotlightstockmarket.com/sv/redan-noterad/investor-relations', 'https://spotlightstockmarket.com/sv/market-overview/filmer', 'https://spotlightstockmarket.com/sv/om-spotlight/koncerninformation', 'https://spotlightstockmarket.com/en/market-overview/news', 'https://spotlightstockmarket.com/sv/bli-delaegare/hur-blir-jag-delaegare', 'https://spotlightstockmarket.com/sv/om-spotlight/oeppettider', 'https://spotlightstockmarket.com/sv/bli-noterad/go-public', 'https://spotlightstockmarket.com/sv/redan-noterad/disciplinnaemnden', 'https://spotlightstockmarket.com/sv/market-overview/noteringar', 'https://spotlightstockmarket.com/sv/medlemmar/regelverk-och-prislista', 'https://spotlightstockmarket.com/sv/redan-noterad', 'https://spotlightstockmarket.com/sv/bli-noterad/vaart-erbjudande', 'https://spotlightstockmarket.com/sv/redan-noterad/vaart-erbjudande', 'https://spotlightstockmarket.com/sv/market-overview/analyser', 'https://spotlightstockmarket.com/sv/bli-noterad', 'https://spotlightstockmarket.com/sv/bli-noterad/hur-gaar-en-notering-till', 'https://spotlightstockmarket.com/sv/redan-noterad/vaegledning', 'https://spotlightstockmarket.com/sv/redan-noterad/boka-utbildning', 'https://spotlightstockmarket.com/sv/bli-noterad/spotlight-stories', 'https://spotlightstockmarket.com/sv/om-spotlight/pressbilder', 'https://spotlightstockmarket.com/sv/bli-noterad/varfoer-bli-noterad', 'https://spotlightstockmarket.com/sv/medlemmar', 'https://spotlightstockmarket.com/dk/market-overview/nyheder', 'https://spotlightstockmarket.com/sv/market-overview/spotlight-index', 'https://spotlightstockmarket.com/sv/bli-delaegare/varfoer-bli-delaegare', 'https://spotlightstockmarket.com/sv/market-overview/emissioner'}
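In the requests version, the lambda passed to find_all keeps only li tags whose attribute dict is empty, and every a with an href anywhere inside such an li is collected; the set removes duplicates. To check that rule offline, without requests or BeautifulSoup, here is a minimal sketch of the same filter using the standard library's html.parser. The BareLiLinks class and the HTML snippet are made up for illustration, and unlike BeautifulSoup, html.parser does not repair missing </li> tags, so the sketch assumes well-formed markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE = "https://spotlightstockmarket.com"


class BareLiLinks(HTMLParser):
    """Collect absolute hrefs of <a> tags nested in attribute-less <li> tags.

    Hypothetical helper; assumes every <li> is explicitly closed.
    """

    def __init__(self) -> None:
        super().__init__()
        self.li_bare: list[bool] = []  # one entry per open <li>: True if it has no attributes
        self.urls: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_bare.append(not attrs)
        elif tag == "a" and any(self.li_bare):
            # Same as BeautifulSoup: any bare <li> ancestor is enough
            href = dict(attrs).get("href")
            if href:
                self.urls.add(urljoin(BASE, href))

    def handle_endtag(self, tag):
        if tag == "li" and self.li_bare:
            self.li_bare.pop()


# Made-up snippet: one bare <li>, one <li> with a class, a duplicate, and an <a> without href
sample = """
<ul class="menu">
  <li><a href="/sv/om-spotlight">Om Spotlight</a></li>
  <li class="active"><a href="/sv/market-overview">Market overview</a></li>
  <li><a href="/sv/om-spotlight">Duplicate link</a></li>
  <li><a>No href</a></li>
</ul>
"""

parser = BareLiLinks()
parser.feed(sample)
print(sorted(parser.urls))  # ['https://spotlightstockmarket.com/sv/om-spotlight']
```

Only the link from the attribute-less li survives: the li with class="active" is skipped, the duplicate collapses into the set, and the a without an href is ignored, mirroring the `if href:` check in the requests code.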