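【Question title】: Python BeautifulSoup not scraping this url
【Posted】: 2016-12-07 00:37:26
【Question】:

I am trying to scrape the rows (tr) of player data from a URL, but nothing seems to happen when I run my code. I am fairly sure the code itself is fine, because it works on other statistics sites that contain tables. Can anyone tell me why nothing happens? Thanks in advance.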

import urllib.request
from bs4 import BeautifulSoup

def make_soup(url):
    # Download the raw HTML and parse it with Python's built-in HTML parser.
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

soup = make_soup("https://www.whoscored.com/Regions/252/Tournaments/7/Seasons/6365/Stages/13832/PlayerStatistics/England-Championship-2016-2017")
# Print the text of every table row found in the downloaded HTML.
for record in soup.find_all('tr'):
    print(record.text)

【Comments】:

    Tags: python url beautifulsoup


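    【Answer 1】:

    Short answer: the player data you are looking for is not in that URL.

    You may then ask: I can see it on that page, so how can it not be there?

    Let me try to explain what happens when you browse to that URL with a modern browser such as Chrome.

    You: type in the URL and press Enter.

    Chrome: Got it. I'll fetch that page for you right away, hold on. (fetches the content from that URL) Great, now I have it! But wait, let me read/parse it before I show it to you. (reads the content) Oh, crap, this JavaScript tells me to get additional information from another URL. OK, I'll do that. Oh, wait, here is another one telling me to load an ad in the header. I don't like it, but I just do as I'm told. Hold on, this CSS tells me to show player names in bold, not bad. Oh, here is another image at url xxx I need to load, no problem... Oh man, how much stuff do you want me to handle? I'm not happy with this site... (doing a bunch of other things...) Finally everything is ready! Take a look!

    You: Player xxx is actually pretty good, let me check him out. (clicks on player xxx)

    Chrome: ......

    As you can see, every time you browse a web page the browser does a lot of "behind the scenes" work before showing it to you. Basically: URL entered >> content fetched from the URL >> content parsed >> additional content fetched >> everything rendered >> page displayed (one or more of these steps may happen at the same time).

    Your code, by contrast, only does the "content fetched from the URL" step, and the statistics you want happen to be "additional content" that has to be loaded from somewhere else. That is why you get nothing.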
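To make this concrete, here is a minimal sketch that uses only the standard library. The HTML string is a made-up stand-in for what a JavaScript-driven page might return (an empty table plus a script that would fill it in later); it is not the real WhoScored markup, and `RowCounter` and `count_rows` are names invented for the illustration. A parser never runs the script, so it finds no `<tr>` tags, which is the situation the loop in the question is in.

```python
from html.parser import HTMLParser

# Hypothetical skeleton of a JavaScript-driven page: the table body is empty
# and a script is expected to fill it in after the page loads.
STATIC_HTML = """
<html><body>
  <table id="player-table"><tbody></tbody></table>
  <script>/* would request the statistics feed and build the rows */</script>
</body></html>
"""


class RowCounter(HTMLParser):
    """Counts <tr> start tags in the markup it is fed."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows += 1


def count_rows(html):
    """Return the number of <tr> tags present in the given HTML text."""
    counter = RowCounter()
    counter.feed(html)
    counter.close()
    return counter.rows


# The script is never executed by a parser, so no rows are found.
print(count_rows(STATIC_HTML))
```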

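    So how do I get those statistics? Once you know the URLs responsible for loading them, you can go after them directly. How do I find those URLs? Well, you can always read the JavaScript... if you have enough patience...

    The easiest way to get what you need is to analyze the traffic while the page loads and pick out all the behind-the-scenes requests. I would recommend Fiddler, but you can use any tool you see fit.

    Now let's see what happens when that page loads:

    It actually takes hundreds of requests to fully render the page you visited, and all you need to do is find out which one delivers the "actual" or "real" statistics. There is this URL with "StatisticsFeed" in it; could that be the one? Let's take a look: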

    https://www.whoscored.com/StatisticsFeed/1/GetPlayerStatistics?category=summary&subcategory=all&statsAccumulationType=0&isCurrent=true&playerId=&teamIds=&matchId=&stageId=13832&tournamentOptions=7&sortBy=Rating&sortAscending=&age=&ageComparisonType=&appearances=&appearancesComparisonType=&field=Overall&nationality=&positionOptions=&timeOfTheGameEnd=&timeOfTheGameStart=&isMinApp=true&page=&includeZeroValues=&numberOfPlayersToPick=10

    {
        "playerTableStats": [{
            "name": "Conor Hourihane",
            "firstName": "Conor",
            "lastName": "Hourihane",
            "playerId": 134172,
            "height": 181,
            "weight": 62,
            "age": 25,
            "isManOfTheMatch": false,
            "isActive": true,
            "isOpta": true,
            "playedPositions": "-MC-",
            "positionText": "Midfielder",
            "playedPositionsShort": "M(C)",
            "teamId": 142,
            "teamName": "Barnsley",
            "seasonId": 6365,
            "seasonName": "2016/2017",
            "tournamentId": 7,
            "tournamentRegionId": 252,
            "tournamentRegionCode": "gb-eng",
            "regionCode": "ie",
            "tournamentName": "Championship",
            "tournamentShortName": "EC",
            "rating": 7.8705882352941181,
            "ranking": 1,
            "apps": 17,
            "subOn": 0,
            "minsPlayed": 1530,
            "manOfTheMatch": 4,
            "yellowCard": 5.0,
            "redCard": 0.0,
            "goal": 3,
            "assistTotal": 8,
            "shotsPerGame": 2.2352941176470589,
            "aerialWonPerGame": 0.6470588235294118,
            "passSuccess": 81.370449678800867
        },
        {
            "name": "Anthony Knockaert",
            "firstName": "Anthony",
            "lastName": "Knockaert",
            "playerId": 86794,
            "height": 172,
            "weight": 69,
            "age": 25,
            "isManOfTheMatch": false,
            "isActive": true,
            "isOpta": true,
            "playedPositions": "-AML-AMR-",
            "positionText": "Midfielder",
            "playedPositionsShort": "AM(LR)",
            "teamId": 211,
            "teamName": "Brighton",
            "seasonId": 6365,
            "seasonName": "2016/2017",
            "tournamentId": 7,
            "tournamentRegionId": 252,
            "tournamentRegionCode": "gb-eng",
            "regionCode": "fr",
            "tournamentName": "Championship",
            "tournamentShortName": "EC",
            "rating": 7.6722222222222216,
            "ranking": 2,
            "apps": 18,
            "subOn": 1,
            "minsPlayed": 1471,
            "manOfTheMatch": 5,
            "yellowCard": 4.0,
            "redCard": 0.0,
            "goal": 6,
            "assistTotal": 0,
            "shotsPerGame": 2.3888888888888888,
            "aerialWonPerGame": 0.22222222222222221,
            "passSuccess": 83.420593368237348
        },
        {
            "name": "Lewis Dunk",
            "firstName": "Lewis",
            "lastName": "Dunk",
            "playerId": 86441,
            "height": 192,
            "weight": 88,
            "age": 25,
            "isManOfTheMatch": false,
            "isActive": true,
            "isOpta": true,
            "playedPositions": "-DC-",
            "positionText": "Defender",
            "playedPositionsShort": "D(C)",
            "teamId": 211,
            "teamName": "Brighton",
            "seasonId": 6365,
            "seasonName": "2016/2017",
            "tournamentId": 7,
            "tournamentRegionId": 252,
            "tournamentRegionCode": "gb-eng",
            "regionCode": "gb-eng",
            "tournamentName": "Championship",
            "tournamentShortName": "EC",
            "rating": 7.660000000000001,
            "ranking": 3,
            "apps": 18,
            "subOn": 0,
            "minsPlayed": 1620,
            "manOfTheMatch": 3,
            "yellowCard": 8.0,
            "redCard": 0.0,
            "goal": 1,
            "assistTotal": 1,
            "shotsPerGame": 0.61111111111111116,
            "aerialWonPerGame": 3.5,
            "passSuccess": 79.72251867662753
        },
        {
            "name": "Tom Clarke",
            "firstName": "Tom",
            "lastName": "Clarke",
            "playerId": 133974,
            "height": 180,
            "weight": 77,
            "age": 28,
            "isManOfTheMatch": false,
            "isActive": true,
            "isOpta": true,
            "playedPositions": "-DC-",
            "positionText": "Defender",
            "playedPositionsShort": "D(C)",
            "teamId": 181,
            "teamName": "Preston",
            "seasonId": 6365,
            "seasonName": "2016/2017",
            "tournamentId": 7,
            "tournamentRegionId": 252,
            "tournamentRegionCode": "gb-eng",
            "regionCode": "gb-eng",
            "tournamentName": "Championship",
            "tournamentShortName": "EC",
            "rating": 7.6126315789473677,
            "ranking": 4,
            "apps": 19,
            "subOn": 0,
            "minsPlayed": 1692,
            "manOfTheMatch": 4,
            "yellowCard": 0.0,
            "redCard": 0.0,
            "goal": 2,
            "assistTotal": 0,
            "shotsPerGame": 0.89473684210526316,
            "aerialWonPerGame": 5.4736842105263159,
            "passSuccess": 66.666666666666657
        },
        {
            "name": "Pontus Jansson",
            "firstName": "Pontus",
            "lastName": "Jansson",
            "playerId": 121123,
            "height": 194,
            "weight": 89,
            "age": 25,
            "isManOfTheMatch": false,
            "isActive": true,
            "isOpta": true,
            "playedPositions": "-DC-",
            "positionText": "Defender",
            "playedPositionsShort": "D(C)",
            "teamId": 19,
            "teamName": "Leeds",
            "seasonId": 6365,
            "seasonName": "2016/2017",
            "tournamentId": 7,
            "tournamentRegionId": 252,
            "tournamentRegionCode": "gb-eng",
            "regionCode": "se",
            "tournamentName": "Championship",
            "tournamentShortName": "EC",
            "rating": 7.5976923076923066,
            "ranking": 5,
            "apps": 13,
            "subOn": 0,
            "minsPlayed": 1126,
            "manOfTheMatch": 1,
            "yellowCard": 6.0,
            "redCard": 0.0,
            "goal": 1,
            "assistTotal": 0,
            "shotsPerGame": 0.53846153846153844,
            "aerialWonPerGame": 3.5384615384615383,
            "passSuccess": 86.336633663366342
        },
        {
            "name": "Angus MacDonald",
            "firstName": "Angus",
            "lastName": "MacDonald",
            "playerId": 110825,
            "height": 184,
            "weight": 70,
            "age": 24,
            "isManOfTheMatch": false,
            "isActive": true,
            "isOpta": true,
            "playedPositions": "-DC-",
            "positionText": "Defender",
            "playedPositionsShort": "D(C)",
            "teamId": 142,
            "teamName": "Barnsley",
            "seasonId": 6365,
            "seasonName": "2016/2017",
            "tournamentId": 7,
            "tournamentRegionId": 252,
            "tournamentRegionCode": "gb-eng",
            "regionCode": "gb-eng",
            "tournamentName": "Championship",
            "tournamentShortName": "EC",
            "rating": 7.5066666666666677,
            "ranking": 6,
            "apps": 12,
            "subOn": 0,
            "minsPlayed": 1080,
            "manOfTheMatch": 0,
            "yellowCard": 3.0,
            "redCard": 0.0,
            "goal": 0,
            "assistTotal": 0,
            "shotsPerGame": 0.33333333333333331,
            "aerialWonPerGame": 4.833333333333333,
            "passSuccess": 72.147651006711413
        },
        {
            "name": "Marc Roberts",
            "firstName": "Marc",
            "lastName": "Roberts",
            "playerId": 138949,
            "height": 183,
            "weight": 81,
            "age": 26,
            "isManOfTheMatch": false,
            "isActive": true,
            "isOpta": true,
            "playedPositions": "-DC-",
            "positionText": "Defender",
            "playedPositionsShort": "D(C)",
            "teamId": 142,
            "teamName": "Barnsley",
            "seasonId": 6365,
            "seasonName": "2016/2017",
            "tournamentId": 7,
            "tournamentRegionId": 252,
            "tournamentRegionCode": "gb-eng",
            "regionCode": "gb-eng",
            "tournamentName": "Championship",
            "tournamentShortName": "EC",
            "rating": 7.503125,
            "ranking": 7,
            "apps": 16,
            "subOn": 0,
            "minsPlayed": 1440,
            "manOfTheMatch": 1,
            "yellowCard": 3.0,
            "redCard": 0.0,
            "goal": 2,
            "assistTotal": 2,
            "shotsPerGame": 0.625,
            "aerialWonPerGame": 7.0625,
            "passSuccess": 61.595547309833023
        },
        {
            "name": "Bradley Johnson",
            "firstName": "Bradley",
            "lastName": "Johnson",
            "playerId": 12490,
            "height": 178,
            "weight": 68,
            "age": 29,
            "isManOfTheMatch": false,
            "isActive": true,
            "isOpta": true,
            "playedPositions": "-MC-ML-",
            "positionText": "Midfielder",
            "playedPositionsShort": "M(CL)",
            "teamId": 20,
            "teamName": "Derby",
            "seasonId": 6365,
            "seasonName": "2016/2017",
            "tournamentId": 7,
            "tournamentRegionId": 252,
            "tournamentRegionCode": "gb-eng",
            "regionCode": "gb-eng",
            "tournamentName": "Championship",
            "tournamentShortName": "EC",
            "rating": 7.4954545454545443,
            "ranking": 8,
            "apps": 11,
            "subOn": 0,
            "minsPlayed": 952,
            "manOfTheMatch": 1,
            "yellowCard": 4.0,
            "redCard": 0.0,
            "goal": 2,
            "assistTotal": 1,
            "shotsPerGame": 1.3636363636363635,
            "aerialWonPerGame": 4.0909090909090908,
            "passSuccess": 71.908127208480565
        },
        {
            "name": "Christophe Berra",
            "firstName": "Christophe",
            "lastName": "Berra",
            "playerId": 8287,
            "height": 186,
            "weight": 81,
            "age": 31,
            "isManOfTheMatch": false,
            "isActive": true,
            "isOpta": true,
            "playedPositions": "-DC-",
            "positionText": "Defender",
            "playedPositionsShort": "D(C)",
            "teamId": 165,
            "teamName": "Ipswich",
            "seasonId": 6365,
            "seasonName": "2016/2017",
            "tournamentId": 7,
            "tournamentRegionId": 252,
            "tournamentRegionCode": "gb-eng",
            "regionCode": "gb-sct",
            "tournamentName": "Championship",
            "tournamentShortName": "EC",
            "rating": 7.4789473684210526,
            "ranking": 9,
            "apps": 19,
            "subOn": 0,
            "minsPlayed": 1710,
            "manOfTheMatch": 3,
            "yellowCard": 4.0,
            "redCard": 0.0,
            "goal": 0,
            "assistTotal": 1,
            "shotsPerGame": 0.94736842105263153,
            "aerialWonPerGame": 6.2105263157894735,
            "passSuccess": 58.636363636363633
        },
        {
            "name": "Adam Webster",
            "firstName": "Adam",
            "lastName": "Webster",
            "playerId": 109922,
            "height": 191,
            "weight": 0,
            "age": 21,
            "isManOfTheMatch": false,
            "isActive": true,
            "isOpta": true,
            "playedPositions": "-DC-",
            "positionText": "Defender",
            "playedPositionsShort": "D(C)",
            "teamId": 165,
            "teamName": "Ipswich",
            "seasonId": 6365,
            "seasonName": "2016/2017",
            "tournamentId": 7,
            "tournamentRegionId": 252,
            "tournamentRegionCode": "gb-eng",
            "regionCode": "gb-eng",
            "tournamentName": "Championship",
            "tournamentShortName": "EC",
            "rating": 7.4780000000000006,
            "ranking": 10,
            "apps": 15,
            "subOn": 1,
            "minsPlayed": 1227,
            "manOfTheMatch": 2,
            "yellowCard": 1.0,
            "redCard": 0.0,
            "goal": 0,
            "assistTotal": 0,
            "shotsPerGame": 0.2,
            "aerialWonPerGame": 5.0666666666666664,
            "passSuccess": 58.256029684601117
        }],
        "paging": {
            "currentPage": 1,
            "totalPages": 34,
            "resultsPerPage": 10,
            "totalResults": 338,
            "firstRecordIndex": 1,
            "lastRecordIndex": 10
        },
        "statColumns": ["apps",
        "subOn",
        "minsPlayed",
        "goal",
        "assistTotal",
        "yellowCard",
        "redCard",
        "shotsPerGame",
        "passSuccess",
        "aerialWonPerGame",
        "manOfTheMatch"]
    }
    

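    Bingo! So what now? Simulate this request and parse the content. Since it is already in JSON format, the built-in json module can do the job easily; you don't even need BeautifulSoup.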
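As a rough sketch of the parsing step, the snippet below runs `json.loads` on a trimmed copy of the response shown above (two players, a handful of fields) and pulls out a few columns. `SAMPLE_FEED` and `summarize_players` are names made up for this illustration; a real response would carry the full set of fields.

```python
import json

# Trimmed copy of the feed response shown above (two players, a few fields).
SAMPLE_FEED = """
{
    "playerTableStats": [
        {"name": "Conor Hourihane", "teamName": "Barnsley",
         "apps": 17, "goal": 3, "rating": 7.8705882352941181},
        {"name": "Anthony Knockaert", "teamName": "Brighton",
         "apps": 18, "goal": 6, "rating": 7.6722222222222216}
    ],
    "paging": {"currentPage": 1, "totalPages": 34,
               "resultsPerPage": 10, "totalResults": 338}
}
"""


def summarize_players(feed_text):
    """Return (name, team, goals, rating rounded to 2 places) per player."""
    data = json.loads(feed_text)
    return [
        (p["name"], p["teamName"], p["goal"], round(p["rating"], 2))
        for p in data["playerTableStats"]
    ]


for row in summarize_players(SAMPLE_FEED):
    print(row)
```

Each entry of `playerTableStats` is an ordinary dict after parsing, so any other field in the feed (`minsPlayed`, `passSuccess`, and so on) can be read the same way.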

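    You may ask: why do I get nothing when I browse to this link directly? That is because they have set a restriction on the server so that only requests with valid headers get the feed. So how do I get around it? Simulate the request "vividly" with the right parameters (mainly headers) so that the server trusts you.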
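A minimal sketch of what sending such a request could look like with `urllib.request`. The answer does not say which headers the server checks, so the header set below (a browser-like `User-Agent`, an `Accept` value, `X-Requested-With`, and a `Referer` pointing at the statistics page) is an assumption; the values to copy are the ones your traffic tool shows for the real request. `build_feed_request` and `fetch_feed` are hypothetical helper names.

```python
import json
import urllib.request

# Page from the question and feed URL from the traffic capture above.
PAGE_URL = ("https://www.whoscored.com/Regions/252/Tournaments/7/Seasons/6365/"
            "Stages/13832/PlayerStatistics/England-Championship-2016-2017")
FEED_URL = ("https://www.whoscored.com/StatisticsFeed/1/GetPlayerStatistics"
            "?category=summary&subcategory=all&statsAccumulationType=0"
            "&isCurrent=true&playerId=&teamIds=&matchId=&stageId=13832"
            "&tournamentOptions=7&sortBy=Rating&sortAscending=&age="
            "&ageComparisonType=&appearances=&appearancesComparisonType="
            "&field=Overall&nationality=&positionOptions=&timeOfTheGameEnd="
            "&timeOfTheGameStart=&isMinApp=true&page=&includeZeroValues="
            "&numberOfPlayersToPick=10")


def build_feed_request(feed_url, referer):
    """Build a GET request carrying browser-like headers (assumed set)."""
    headers = {
        # Assumption: these mimic what a browser sends for an AJAX call.
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "X-Requested-With": "XMLHttpRequest",
        "Referer": referer,
    }
    return urllib.request.Request(feed_url, headers=headers)


def fetch_feed(feed_url, referer):
    """Send the request and decode the JSON body into Python objects."""
    request = build_feed_request(feed_url, referer)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_feed(FEED_URL, PAGE_URL)` would then return the parsed feed if the server accepts the request. It may still refuse if it also checks cookies or other signals, in which case those would have to be copied from the captured traffic as well.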

    【Comments】:

      【Answer 2】:

      This page uses JavaScript to fetch its data. You can find the raw data at this link:

      https://www.whoscored.com/StatisticsFeed/1/GetPlayerStatistics?category=summary&subcategory=all&statsAccumulationType=0&isCurrent=true&playerId=&teamIds=&matchId=&stageId=13832&tournamentOptions=7&sortBy=Rating&sortAscending=&age=&ageComparisonType=&appearances=&appearancesComparisonType=&field=Overall&nationality=&positionOptions=&timeOfTheGameEnd=&timeOfTheGameStart=&isMinApp=true&page=&includeZeroValues=&numberOfPlayersToPick=10
      

      You can change any field of the URL to get the data you need.
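One way to vary those fields without editing the long string by hand is to keep them in a dict and rebuild the URL with `urllib.parse.urlencode`. This is only a sketch: the field names and default values are copied from the link above, `build_feed_url` is a hypothetical helper, and which values the server accepts for each field is not documented here. For example, `page=2` is a guess based on the `paging` block in the response.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.whoscored.com/StatisticsFeed/1/GetPlayerStatistics"

# Query fields copied from the feed URL above; blank values stay blank.
DEFAULT_PARAMS = {
    "category": "summary",
    "subcategory": "all",
    "statsAccumulationType": "0",
    "isCurrent": "true",
    "playerId": "",
    "teamIds": "",
    "matchId": "",
    "stageId": "13832",
    "tournamentOptions": "7",
    "sortBy": "Rating",
    "sortAscending": "",
    "age": "",
    "ageComparisonType": "",
    "appearances": "",
    "appearancesComparisonType": "",
    "field": "Overall",
    "nationality": "",
    "positionOptions": "",
    "timeOfTheGameEnd": "",
    "timeOfTheGameStart": "",
    "isMinApp": "true",
    "page": "",
    "includeZeroValues": "",
    "numberOfPlayersToPick": "10",
}


def build_feed_url(**overrides):
    """Return the feed URL with selected query fields replaced."""
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        # Guard against typos: only fields present in the original URL are allowed.
        raise KeyError("unknown query field(s): " + ", ".join(sorted(unknown)))
    params = dict(DEFAULT_PARAMS)
    params.update(overrides)
    return BASE_URL + "?" + urlencode(params)


# Assumed usage: request the second page of results.
print(build_feed_url(page=2))
```

With no overrides, `build_feed_url()` reproduces the original link, which makes it easy to check that nothing was lost when the fields were copied.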

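      【Comments】:

        【Answer 3】:

        This is because the website does not want you to scrape it.

        I used selenium to send the request and took a picture of the simulated browser it created.

        The site uses Incapsula, which is a security service (they even have some information about scraping on their website). Take a look, it's interesting.

        • This may help

        【Comments】: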
