有没有一种特殊的方法来抓取动态网站？答案

【问题标题】：Is there a particular way to scrape a dynamic website?有没有一种特殊的方法来抓取动态网站？
【发布时间】：2020-05-19 12:12:33
【问题描述】：

我正在尝试抓取the following webpage。

搜索框（显示输入安全名称/代码/ID）是我遇到困难的地方。我无法用 xpath 刮掉它，我正在使用 mechanize 库进行浏览器模拟，但它似乎不起作用。

我遇到了这个问题Excel VBA Scrape Web Page，这与我的问题非常相似，但我不知道如何使用 Python 来实现它。

我试过的代码：

import mechanize
from bs4 import BeautifulSoup
import requests
url = 'https://www.bseindia.com/corporates/corporate_act.aspx'
res = requests.get(url)
soup = BeautifulSoup(res.text, "html.parser")
br = mechanize.Browser()
br.open(url)
br.select_form(nr=0)
br['ContentPlaceHolder1_SmartSearch_smartSearch'] = input("enter the company name")
response = br.submit()

P.S.：我是网络抓取的新手，我想知道大学项目的解决方案，非常感谢您的帮助。

【问题讨论】：

检查开发人员工具中的网络选项卡 - 每当您键入时，搜索字段都会向 api.bseindia.com/BseIndiaAPI/api/PeerSmartSearch/… 发出请求。您可以简单地直接抓取该端点。

标签： python web-scraping beautifulsoup scrapy mechanize

【解决方案1】：

我建议你使用scrapy 框架。我认为它是更先进的工具之一，包括模拟与网站交互的工具。

【讨论】：