1. Libraries used: requests, BeautifulSoup

2. requests

response = requests.get(
    url='https://www.autohome.com.cn/news/'
)

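response.encoding = response.apparent_encoding  # use the encoding detected from the body
response.text         # body decoded to str using response.encoding
response.content      # raw body as bytes
response.status_code  # HTTP status code, e.g. 200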
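
To see why setting response.encoding matters, here is a minimal offline sketch with no network call. It assumes, purely for illustration, that the page body arrives as GBK-encoded bytes; the sample string is made up. response.content corresponds to the raw bytes, and response.text to those bytes decoded with response.encoding. When the headers of a text response declare no charset, requests falls back to ISO-8859-1, which is what the "wrong" branch imitates.

```python
# Raw bytes, standing in for what response.content would hold.
# Assumption: the server sends GBK; the sample text is invented.
raw = "汽车新闻".encode("gbk")

# Decoding with the wrong encoding never fails for ISO-8859-1,
# but it produces unreadable text (mojibake).
wrong = raw.decode("iso-8859-1")

# Decoding with the right encoding recovers the original string.
# This is what response.text does once response.encoding is correct.
right = raw.decode("gbk")

print(repr(wrong))
print(right)
```

Assigning response.apparent_encoding to response.encoding lets requests pick the encoding from the body itself instead of relying on the header default.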

3. BeautifulSoup
Convert the response into a soup object:
soup = BeautifulSoup(response.text, features='html.parser')  # html.parser is built into Python; lxml is faster and a better fit for production
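
Since lxml is an optional install, one possible way to prefer it while still working without it is sketched below. The helper name make_soup is hypothetical and not part of BeautifulSoup; FeatureNotFound is the exception bs4 raises when the requested parser is not available.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html: str) -> BeautifulSoup:
    """Hypothetical helper: try lxml first, fall back to html.parser."""
    try:
        return BeautifulSoup(html, features="lxml")
    except FeatureNotFound:
        # lxml is not installed; use the parser bundled with Python.
        return BeautifulSoup(html, features="html.parser")


soup = make_soup("<p>hello</p>")
print(soup.p.get_text())
```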
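Find an element by id:
soup.find(id='<element-id>')  # replace <element-id> with the id of the target element
Find HTML tags such as li, div, and img (and the text inside them) under that element:
target = soup.find(id='<element-id>').find_all('li')  # find all li tags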
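
The find and find_all calls above can be tried offline on a small HTML string. This is a minimal sketch: the markup and the id news-list are invented for illustration and do not come from the real page.

```python
from bs4 import BeautifulSoup

# Invented sample markup; the id "news-list" is an assumption.
html = """
<div id="news-list">
  <ul>
    <li><a href="/a/1.html"><h3>First</h3></a></li>
    <li><a href="/a/2.html"><h3>Second</h3></a></li>
    <li>No link here</li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, features="html.parser")

# find() returns the first match, or None when nothing matches.
container = soup.find(id="news-list")

# find_all() returns a list of every matching tag under the container.
items = container.find_all("li")

# Skip li tags without a link, then read the href attribute.
hrefs = [li.find("a").get("href") for li in items if li.find("a")]

# get_text(strip=True) collects the text nested inside each tag.
titles = [li.get_text(strip=True) for li in items]

print(hrefs)
print(titles)
```

Because find() returns None when there is no match, chaining .find_all() directly onto it raises AttributeError on a miss, so checking the result first is the safer habit.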


4. Simple example
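import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

response = requests.get(
    url='https://www.autohome.com.cn/news/'
)
response.encoding = response.apparent_encoding
print(response.status_code)
soup = BeautifulSoup(response.text, features='html.parser')  # html.parser is built in; lxml is faster for production

# Find all li tags in the document; narrow the search with soup.find(id=...)
# first if the list sits inside a container with a known id.
li_list = soup.find_all('li')

for li in li_list:
    a = li.find('a')  # find the a tag
    if not a:
        continue  # skip li tags without a link
    print(a.attrs)
    print(a.attrs.get('href'))

    img = li.find('img')
    if not img or not img.get('src'):
        continue  # skip entries without an image
    # Assumption: use the link text as the title for the file name.
    title = a.get_text(strip=True).replace('/', '_')
    # Resolve relative or protocol-relative src values against the page URL.
    res = requests.get(urljoin(response.url, img.get('src')))
    file_name = "%s.jpg" % (title,)
    with open(file_name, 'wb') as f:
        f.write(res.content)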
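
The download step depends on two details that are easy to get wrong: an img src may be relative or protocol-relative (starting with //), and a title may contain characters that are not valid in file names. The sketch below shows one way to handle both using only the standard library. The helper names resolve_image_url and safe_file_name are hypothetical, and the image URLs are made-up examples.

```python
import re
from urllib.parse import urljoin

PAGE_URL = "https://www.autohome.com.cn/news/"


def resolve_image_url(page_url: str, src: str) -> str:
    """Hypothetical helper: turn any src value into an absolute URL."""
    return urljoin(page_url, src)


def safe_file_name(title: str, ext: str = ".jpg") -> str:
    """Hypothetical helper: replace characters that file systems reject."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return (cleaned or "untitled") + ext


# Made-up src values for illustration.
print(resolve_image_url(PAGE_URL, "//img.example.com/pic/1.jpg"))
print(resolve_image_url(PAGE_URL, "/static/2.jpg"))
print(safe_file_name("A/B: new model?"))
```

The results can then be passed to requests.get and open in place of the raw src and title values.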

