Posted: 2020-04-13 11:37:30
Question:
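I would like to scrape the historical results of the South African LOTTO draws (in particular the total prize pool size, total sales, etc.) from the South African National Lottery website. By default the site shows links to the results of the ten most recent draws. You can also choose a date range to pull up a larger set of draw links, but they are still shown only ten per page.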
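Hovering over a link such as 'LOTTO DRAW 2012' in the browser shows javascript:void();, so the draw results are evidently rendered with JavaScript. After reading the advice in the R Web Scraping Cheat Sheet, I realised I needed to open Google Chrome Developer Tools, go to the Network tab, and then click the 'LOTTO DRAW 2012' link. When I do that, I can see this url being called by the initiator.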
When I right-click the initiator and choose "Copy response", I can see the data I need in the "drawDetails" object of what looks like JSON:
{"code":200,"message":"OK","data":{"drawDetails":{"drawNumber":"2012","drawDate":"2020\/04\/11","nextDrawDate":"2020\/04\/15","ball1":"48","ball2":"6","ball3":"43","ball4":"41","ball5":"25","ball6":"45","bonusBall":"38","div1Winners":"1","div1Payout":"10546013.8","div2Winners":"0","div2Payout":"0","div3Winners":"28","div3Payout":"7676.4","div4Winners":"62","div4Payout":"2751.4","div5Winners":"1389","div5Payout":"206.3","div6Winners":"1872","div6Payout":"133","div7Winners":"28003","div7Payout":"50","div8Winners":"20651","div8Payout":"20","rolloverAmount":"0","rolloverNumber":"0","totalPrizePool":"13280236.5","totalSales":"11610950","estimatedJackpot":"2000000","guaranteedJackpot":"0","drawMachine":"RNG2","ballSet":"RNG","status":"published","winners":52006,"millionairs":1,"gpwinners":"52006","wcwinners":"0","ncwinners":"0","ecwinners":"0","mpwinners":"0","lpwinners":"0","fswinners":"0","kznwinners":"0","nwwinners":"0"},"totalWinnerRecord":{"lottoMillionairs":28716702,"lottoWinners":337285646,"ithubaMillionairs":135763,"ithubaWinners":305615802}},"videoData":[{"id":"1049","listid":"1","parentid":"1","videosource":"youtube","videoid":"chHfFxVi9QI","imageurl":"","title":"LOTTO, LOTTO PLUS 1 AND LOTTO PLUS 2 DRAW 2012 (11 APRIL 2020)","description":"","custom_imageurl":"","custom_title":"","custom_description":"","specialparams":"","lastupdate":"0000-00-00 00:00:00","allowupdates":"1","status":"0","isvideo":"1","link":"https:\/\/www.youtube.com\/watch?v=chHfFxVi9QI","ordering":"10001","publisheddate":"2020-04-11 20:06:17","duration":"182","rating_average":"0","rating_max":"0","rating_min":"0","rating_numRaters":"0","statistics_favoriteCount":"0","statistics_viewCount":"329","keywords":"","startsecond":"0","endsecond":"0","likes":"6","dislikes":"0","commentcount":"0","channel_username":"","channel_title":"","channel_subscribers":"9880","channel_subscribed":"0","channel_location":"","channel_commentcount":"0","channel_viewcount":"0","channel_videocount":"1061","channel_description":"","channel_totaluploadviews":"0","alias":"lotto-lotto-plus-1-and-lotto-plus-2-draw-2012-11-april-2020","rawdata":"","datalink":"https:\/\/www.googleapis.com\/youtube\/v3\/videos?id=chHfFxVi9QI&part=id,snippet,contentDetails,statistics&key=AIzaSyC1Xvk2GUdb_N3UiFtjsgZ-uMviJ_8MFZI"}]}
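The fields of interest sit under data.drawDetails, and most of the numeric values there (totalPrizePool, totalSales, the payouts) arrive as strings. A minimal sketch of the parsing step, written in Python with only the standard library rather than R; `parse_draw_details` is a hypothetical helper name and the selection in `FIELDS` is an illustrative subset, not a complete list:

```python
import json

# Illustrative subset of data.drawDetails fields relevant to the question.
FIELDS = ["drawNumber", "drawDate", "totalPrizePool", "totalSales",
          "rolloverAmount", "estimatedJackpot"]


def _to_number(value):
    """Convert a numeric string such as "13280236.5" to int or float; leave anything else unchanged."""
    if not isinstance(value, str):
        return value
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        return value  # e.g. dates such as "2020/04/11" stay strings


def parse_draw_details(response_text):
    """Return a flat dict of selected drawDetails fields from the raw JSON response text."""
    payload = json.loads(response_text)  # also decodes the "\/" escapes to "/"
    if payload.get("code") != 200:
        raise ValueError(f"unexpected response: {payload.get('message')}")
    details = payload["data"]["drawDetails"]
    return {name: _to_number(details.get(name)) for name in FIELDS}


if __name__ == "__main__":
    sample = ('{"code":200,"message":"OK","data":{"drawDetails":{'
              '"drawNumber":"2012","drawDate":"2020\\/04\\/11",'
              '"totalPrizePool":"13280236.5","totalSales":"11610950",'
              '"rolloverAmount":"0","estimatedJackpot":"2000000"}}}')
    print(parse_draw_details(sample))
    # {'drawNumber': 2012, 'drawDate': '2020/04/11', 'totalPrizePool': 13280236.5,
    #  'totalSales': 11610950, 'rolloverAmount': 0, 'estimatedJackpot': 2000000}
```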
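It is a POST request, so I tried following this answer, but I could not find the onclick value that indicates the data submitted with the form. Moreover, the request URL for 'LOTTO DRAW 2012' is the same as the one for 'LOTTO DRAW 2011', so no identifier for the specific draw is passed in the URL itself. It is therefore unclear to me how a request for one particular draw's results is made.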
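So the smaller question is: given a particular LOTTO draw number or draw date, how do I find the unique identifier that the POST request uses to fetch that draw's data?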
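The bigger question is: assuming I can obtain the unique identifiers for all historical draws, how do I generate the JSON drawDetails object for each historical draw in turn, or otherwise complete the scraping?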
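Once a single draw can be fetched, scraping the history is a loop over draw numbers. The numbers seen here (2011, 2012) suggest consecutive numbering, but that is an assumption to check against the site's date-range listing. A hedged Python sketch in which `fetch` is a hypothetical callable (one draw number in, one flat dict of drawDetails fields out), with a pause between requests and failures collected rather than aborting the run:

```python
import csv
import io
import time


def scrape_draws(draw_numbers, fetch, delay_seconds=1.0):
    """Call fetch(draw_number) for each draw; return (rows, failed).

    `fetch` is a hypothetical callable returning a flat dict for one draw.
    Failed draws are recorded as (draw_number, error_repr) instead of raising.
    """
    rows, failed = [], []
    for i, number in enumerate(draw_numbers):
        if i and delay_seconds:
            time.sleep(delay_seconds)  # pause between requests to go easy on the server
        try:
            rows.append(fetch(number))
        except Exception as exc:  # network errors, bad JSON, missing keys, ...
            failed.append((number, repr(exc)))
    return rows, failed


def rows_to_csv(rows):
    """Serialise a list of dicts that share the same keys to CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `rows, failed = scrape_draws(range(1, 2013), fetch)` followed by writing `rows_to_csv(rows)` to a file; the lower bound 1 is a placeholder, not the site's documented first draw. Keeping `failed` separate makes it easy to retry only the draws that did not come back.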
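Discussion:
- Click the specific request you are interested in, in that side panel. Then click Headers and scroll down. See whether there is something like Query Form.
- There is Form Data, with the fields gameName and drawNumber; together these uniquely identify the draw. Thanks, so that answers the first question. The further question is how to run that request in R for a given drawNumber value to produce the JSON drawDetails object.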
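Because the draw is identified by the Form Data fields gameName and drawNumber rather than by the URL, the request has to send those fields as a form-encoded POST body. A minimal Python sketch using only urllib; `DRAW_URL` is a hypothetical placeholder for the request URL shown in the Network tab, and `GAME_NAME = "LOTTO"` is an assumption to replace with the exact value shown under Form Data. The site may also expect headers or cookies that the browser sends; if the bare request is rejected, compare it with the Request Headers in the same panel. In R, the corresponding calls would be along the lines of `httr::POST(url, body = list(gameName = ..., drawNumber = ...), encode = "form")` followed by `jsonlite::fromJSON(httr::content(resp, as = "text"))`.

```python
import json
import urllib.parse
import urllib.request

# HYPOTHETICAL placeholder: substitute the request URL shown in the Network tab.
DRAW_URL = "https://example.invalid/path-from-network-tab"
# ASSUMPTION: copy the exact gameName value from the Form Data panel.
GAME_NAME = "LOTTO"


def build_draw_request(draw_number, game_name=GAME_NAME, url=DRAW_URL):
    """Build a form-encoded POST request; the draw is identified only by the body."""
    body = urllib.parse.urlencode({"gameName": game_name,
                                   "drawNumber": str(draw_number)}).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_draw_details(draw_number, opener=urllib.request.urlopen, timeout=30):
    """POST the form data for one draw and return the parsed drawDetails dict."""
    request = build_draw_request(draw_number)
    with opener(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    if payload.get("code") != 200:
        raise ValueError(f"unexpected response: {payload.get('message')}")
    return payload["data"]["drawDetails"]
```

The `opener` parameter defaults to `urllib.request.urlopen` and exists only so the function can be exercised without network access. Requests for draws 2011 and 2012 share the same URL and differ only in their body, which matches what the Network tab shows.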
Tags: javascript r web-scraping