如何在python中抓取具有静态url的网站答案

【问题标题】：how to crawl website having static url in python如何在python中抓取具有静态url的网站
【发布时间】：2017-06-30 10:09:32
【问题描述】：

我想抓取一个基于 PHP 的网站，它有一个搜索框，我们可以在该搜索框中输入一个数字，当我们单击提交按钮或按 Enter 但 URL 没有改变时，它会根据输入的数字呈现结果。就像它为每个结果显示 foo.com/res_17.php 一样，但对于像上千条记录一样爬行，记录应该可以通过唯一 ID 访问，例如 foo.com/res_17.php?id=1001, foo.com/res_17.php ?id=1002 - foo.com/res_17.php?id=3450 这样我就可以使用 while 循环访问它们我该如何做到这一点任何解决方案请帮忙。

【问题讨论】：

你有什么问题？
fbise.edu.pk/res-ssc-II.php on this website results for roll# 100001-143293 are available 如何抓取它们...？

标签： php python web-crawler dynamic-url

【解决方案1】：

我给了你一个我的剧本

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
html = urlopen("http://en.wikipedia.org/wiki/Andrew_Ng")
bsObj = BeautifulSoup(html)

for link in bsObj.find("div", {"id":"bodyContent"}).findAll("a",
            href=re.compile("^(/wiki/)((?!:).)*$")):
    if 'href' in link.attrs:
        print(link.attrs['href'])

输出显示为 Andrew Ng 维基百科的所有文章。

【讨论】：