【Question title】: Trigger data response from .aspx page
【Posted】: 2018-01-25 02:22:40
【Question description】:
```python
import json

import requests
from bs4 import BeautifulSoup

url = 'http://estadistico.ut.com.sv/OperacionDiaria.aspx'

s = requests.Session()

# Load the page once to read the ASP.NET hidden state fields
pagereq = s.get(url)
soup = BeautifulSoup(pagereq.content, 'lxml')

viewstategenerator = soup.find("input", attrs={'id': '__VIEWSTATEGENERATOR'})['value']
viewstate = soup.find("input", attrs={'id': '__VIEWSTATE'})['value']
eventvalidation = soup.find("input", attrs={'id': '__EVENTVALIDATION'})['value']

# These values look fixed between page loads
eventtarget = 'ASPxDashboardViewer1'
DXCss = '1_33,1_4,1_9,1_5,15_2,15_4'
DXScript = '1_232,1_134,1_225,1_169,1_187,15_1,1_183,1_182,1_140,1_147,1_148,1_142,1_141,1_143,1_144,1_145,1_146,15_0,15_6,15_7'

# Copied from the browser's POST request; JSON true/null are written as
# Python True/None, and '+' (a URL-encoded space) is written as a space
caption = "Generación por tipo de tecnología (MWh)"
eventargument = {
    "Task": "Export",
    "ExportInfo": {
        "Mode": "SingleItem", "GroupName": "pivotDashboardItem1", "FileName": caption,
        "ClientState": {
            "clientSize": {"width": 509, "height": 385}, "titleHeight": 48,
            "itemsState": [{"name": "pivotDashboardItem1", "headerHeight": 34,
                            "position": {"left": 11, "top": 146}, "width": 227, "height": 108,
                            "virtualSize": None, "scroll": {"horizontal": True, "vertical": True}}]},
        "Format": "Excel",
        "DocumentOptions": {
            "paperKind": "Letter", "pageLayout": "Portrait", "scaleMode": "AutoFitWithinOnePage",
            "scaleFactor": 1, "autoFitPageCount": 1, "showTitle": True, "title": "Operación Diaria",
            "imageFormatOptions": {"format": "Png", "resolution": 96},
            "excelFormatOptions": {"format": "Csv", "csvValueSeparator": ","},
            "commonOptions": {"filterStatePresentation": "None", "includeCaption": True, "caption": caption},
            "pivotOptions": {"printHeadersOnEveryPage": True},
            "gridOptions": {"fitToPageWidth": True, "printHeadersOnEveryPage": True},
            "chartOptions": {"automaticPageLayout": True, "sizeMode": "Zoom"},
            "pieOptions": {"autoArrangeContent": True}, "gaugeOptions": {"autoArrangeContent": True},
            "cardOptions": {"autoArrangeContent": True},
            "mapOptions": {"automaticPageLayout": True, "sizeMode": "Zoom"},
            "rangeFilterOptions": {"automaticPageLayout": True, "sizeMode": "Stretch"},
            "imageOptions": {}, "fileName": caption},
        "ItemType": "PIVOT"},
    "Context": "BwAHAAIkY2NkNWRiYzItYzIwNS00MDIyLTkzZjUtYWQ0NzVhYTM5Y2E3Ag9PcGVyYWNpb25EaWFyaWECAAIAAAAAAMByQA==",
    "RequestMarker": 1,
    "ClientState": {}}

postdata = {'__EVENTTARGET': eventtarget,
            # The browser sends this field as JSON text, not as a dict
            '__EVENTARGUMENT': json.dumps(eventargument, ensure_ascii=False, separators=(',', ':')),
            '__VIEWSTATE': viewstate,
            '__VIEWSTATEGENERATOR': viewstategenerator,
            '__EVENTVALIDATION': eventvalidation,
            'DXScript': DXScript,
            'DXCss': DXCss}

datareq = s.post(url, data=postdata)
print(datareq.text)
```

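I'm trying to scrape data from this .aspx web page (the URL in the code above). The page loads its data dynamically via JavaScript, so it can't be scraped directly with requests/BeautifulSoup.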

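Looking at the network traffic, I can see that when you click an element's export (Exportar a) button, choose an export type (Excel, CSV) and then confirm, a POST request is sent to the page. It returns a base64-encoded string containing the data I need. As far as I can tell, there is no way to request the file directly with a GET, because it is only generated on demand.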
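The response described above carries the export as a base64 string. A minimal sketch for turning such a string back into CSV rows is below. It assumes the payload has already been pulled out of the response body (how the callback wraps it is not shown in the question) and that the text is UTF-8, which is a guess based on the Spanish labels; `decode_export_payload` is a hypothetical helper name.

```python
import base64
import csv
import io


def decode_export_payload(b64_text: str, encoding: str = "utf-8-sig") -> list:
    """Decode a base64-encoded CSV export into a list of rows.

    Assumes the input is the bare base64 string and that the CSV uses a
    comma separator, matching "csvValueSeparator": "," in the request.
    """
    raw = base64.b64decode(b64_text.strip())
    # "utf-8-sig" decodes plain UTF-8 and also drops a leading BOM if present
    text = raw.decode(encoding)
    return list(csv.reader(io.StringIO(text)))
```

If the server turns out to use another encoding (for example cp1252), pass it via the `encoding` parameter.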

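What I want to do is replicate the POST request that triggers the CSV response. So I first scrape __VIEWSTATE, __VIEWSTATEGENERATOR and __EVENTVALIDATION from the page. __EVENTTARGET, DXCss and DXScript appear to be fixed. __EVENTARGUMENT is copied directly from the browser's POST request.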
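Instead of looking up each hidden field by `id`, one option is to collect every `<input type="hidden">` by its `name`, which is the key the browser actually posts; DevExpress controls may add hidden inputs of their own that the server also expects. This sketch uses the standard-library `html.parser` so it runs without BeautifulSoup (with BeautifulSoup, `soup.find_all("input", type="hidden")` finds the same inputs). `HiddenFieldCollector` and `hidden_fields` are hypothetical names.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collect name -> value for every <input type="hidden"> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Attributes without a value come through as None, hence the "or"
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html: str) -> dict:
    """Return all hidden form fields, keyed by their name attribute."""
    collector = HiddenFieldCollector()
    collector.feed(html)
    collector.close()
    return collector.fields
```

For example, `postdata = hidden_fields(s.get(url).text)` followed by `postdata.update(...)` with `__EVENTTARGET`, `__EVENTARGUMENT`, `DXScript` and `DXCss`.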

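My code returns a server application error. I think the problem is one of: a) a wrong __EVENTARGUMENT (perhaps it is partly dynamic rather than fixed?), b) my not really understanding how .aspx pages work, or c) that what I'm trying to do isn't possible with these tools.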
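On point a), one thing worth checking is how __EVENTARGUMENT is encoded. The browser sends it as a single JSON string. If a Python dict is passed as a form value, `requests` treats it as an iterable of values and sends only its keys; and quoted `'true'`/`'null'` values would be serialized as JSON strings rather than the literals `true`/`null`. The sketch below (hypothetical helper `build_postback_body`, standard library only) shows the form body a browser-like postback would carry. It does not settle whether fields such as `Context` are session-specific, which remains possible.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def build_postback_body(hidden: dict, event_target: str, event_argument: dict,
                        extra: Optional[dict] = None) -> str:
    """Build an application/x-www-form-urlencoded postback body."""
    fields = dict(hidden)
    fields["__EVENTTARGET"] = event_target
    # Serialize to compact JSON text; True/None become true/null
    fields["__EVENTARGUMENT"] = json.dumps(event_argument, ensure_ascii=False,
                                           separators=(",", ":"))
    if extra:
        fields.update(extra)  # e.g. DXScript / DXCss
    # urlencode percent-encodes non-ASCII characters as UTF-8
    return urlencode(fields)
```

Because the body is already a string, `requests` will not add a Content-Type for it, so send it with `s.post(url, data=body, headers={"Content-Type": "application/x-www-form-urlencoded"})`.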

I did consider using Selenium to trigger the data export, but I can't see a way to capture the server's response.
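An alternative that avoids capturing the response at all (an assumption, not something from this thread): configure the browser to save downloads to a known folder (in Chrome, for example, through the `download.default_directory` preference), let Selenium click Export, then pick up the saved file. The sketch covers only the waiting part; `wait_for_download` is a hypothetical helper, and it relies on Chrome naming in-progress downloads `*.crdownload`, which the default pattern does not match.

```python
import time
from pathlib import Path


def wait_for_download(folder, pattern="*.csv", timeout=30.0, poll=0.5):
    """Return the newest file in folder matching pattern, or raise TimeoutError."""
    folder = Path(folder)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Partial files like "export.csv.crdownload" don't match "*.csv"
        matches = sorted(folder.glob(pattern), key=lambda p: p.stat().st_mtime)
        if matches:
            return matches[-1]
        time.sleep(poll)
    raise TimeoutError(f"no {pattern} file appeared in {folder} within {timeout}s")
```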

【Comments】:

Tags: python asp.net web-scraping


【Solution 1】:

I was able to get help from someone who understands aspx pages better than I do.

Here is a link to the GitHub gist with the solution:

https://gist.github.com/jarek/d73c672d8dd4ddb48d80bffc4d8038ba
