wulilichao

A Python web crawler for online novels

Packages used

urllib, BeautifulSoup
urllib is a built-in package in Python; the most useful part for this task is the urllib.request submodule, which provides urlopen.
BeautifulSoup is not part of the standard library. You can install it yourself through Anaconda, and it parses an .html webpage into an object.
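
To show what urlopen returns before any parsing happens, here is a minimal sketch. It assumes a data: URL as a stand-in for a real chapter page so that it runs without network access; SAMPLE_URL and fetch_html are hypothetical names introduced only for this illustration.

```python
from urllib.request import urlopen

# Assumption: a data: URL stands in for a real page so the sketch runs offline.
SAMPLE_URL = "data:text/html;charset=utf-8,%3Cp%3EChapter%20one%3C%2Fp%3E"


def fetch_html(url):
    """Open the URL and return the response body decoded as text."""
    with urlopen(url) as response:
        raw = response.read()  # the body arrives as bytes
        # Use the charset declared by the response, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset)


page = fetch_html(SAMPLE_URL)
print(page)
```

For a real site, the same function should work with an http:// address, although some servers declare no charset or use a different encoding than the fallback assumed here.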

Example

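from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the chapter page and parse it into a BeautifulSoup object
html = urlopen("http://www.shuhai.com/read/54351/1.html")
bsObj = BeautifulSoup(html, "html.parser")
# Collect every <p> tag and print its text
chapter_content = bsObj.findAll("p")
for content in chapter_content:
    print(content.get_text())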

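Further notes

Inspect bsObj (for example, print bsObj.body) to check the structure of the HTML body.
Use .get_text() to return the text content of a tag object, with the markup stripped.
Use .findAll() to return a list of all tags that match a given name, such as "p".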
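
The notes above can be tried on a small in-memory page, without touching the network. This is only a sketch: SAMPLE_HTML and extract_paragraphs are hypothetical names made up for the illustration, and the markup is not taken from the real site. It uses find_all, the current spelling in BeautifulSoup 4; findAll is the older alias and behaves the same way.

```python
from bs4 import BeautifulSoup

# Assumption: a hand-written page standing in for a real chapter.
SAMPLE_HTML = """
<html><body>
  <h1>Chapter 1</h1>
  <p>First paragraph.</p>
  <p>Second <b>paragraph</b>.</p>
</body></html>
"""


def extract_paragraphs(html):
    """Return the text of every <p> tag in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list of matching tags; get_text strips the markup,
    # including nested tags such as <b>.
    return [p.get_text() for p in soup.find_all("p")]


paragraphs = extract_paragraphs(SAMPLE_HTML)
for text in paragraphs:
    print(text)
```

Returning a list instead of printing directly makes it easier to save the chapter to a file later.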
