Fetching Web Page Content with Python

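import urllib.request  # Python 3 home of urlopen (was urllib.urlopen in Python 2)
import re  # not used yet; needed for the pattern matching described below

def getHtml(url):
    # Download the page and return its raw bytes.
    with urllib.request.urlopen(url) as page:
        return page.read()

html = getHtml("http://tieba.baidu.com/p/2460150866")
print('Size is:', len(html))
# read() returns bytes, so open the output file in binary mode.
with open('a.html', 'wb') as f:
    f.write(html)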

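Python's urllib module is quite handy. Here the fetched page is also written to a.html; from there you can pattern-match the individual HTML tags to pull out whatever you need.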
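
The pattern-matching step mentioned above might look roughly like the sketch below. It is only an illustration: the sample_html string is a made-up stand-in for the contents of a.html, the helper names extract_links, extract_images and extract_title are hypothetical, and the page is assumed to have already been decoded into a text string. Regular expressions are fine for quick jobs like this, but they can miss unusual markup, so a real HTML parser is the safer choice for anything larger.

```python
import re

# Hypothetical sample standing in for the decoded contents of a.html.
sample_html = """
<html><head><title>Demo page</title></head>
<body>
<a href="http://example.com/a">First</a>
<img src="http://example.com/pic1.jpg" alt="one">
<a href='/relative/b'>Second</a>
<img src="http://example.com/pic2.png">
</body></html>
"""

def extract_links(html):
    """Return the href value of every <a> tag, in document order."""
    return re.findall(r'<a\s[^>]*?href=["\']([^"\']+)["\']', html, re.IGNORECASE)

def extract_images(html):
    """Return the src value of every <img> tag, in document order."""
    return re.findall(r'<img\s[^>]*?src=["\']([^"\']+)["\']', html, re.IGNORECASE)

def extract_title(html):
    """Return the text inside <title>, or None if the page has no title."""
    match = re.search(r'<title>(.*?)</title>', html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None

if __name__ == "__main__":
    print("Title:", extract_title(sample_html))
    print("Links:", extract_links(sample_html))
    print("Images:", extract_images(sample_html))
```

To run this against the saved file instead of the sample, read a.html as text with the page's actual encoding (which you would need to check) and pass the resulting string to the same functions.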
