【Question title】: Python requests and urllib not returning correct data
【Posted】: 2016-02-23 19:35:48
【Question description】:

Hi, I am having trouble getting data from the following website:

http://weather.news24.com/sa/johannesburg

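I have tried both Python requests and urllib without success. By inspecting the page resources with Chrome DevTools, I found the URL that contains the expected data, but I still can't get the data in JSON format. I want to get the low and high temperatures, sunrise and sunset.

It looks to me like an AJAX function loads the data. I tried both libraries so that I can use them later in Django. I am using Python 3. Any help would be greatly appreciated.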

【Question discussion】:

    Tags: python-requests urllib


    【Solution 1】:

    Hope this helps:

    import json
    import re

    import requests
    from bs4 import BeautifulSoup

    # This is your main URL
    main_url = "http://weather.news24.com/sa/johannesburg"

    # Extract the city name from the URL (skip this if you already have it somewhere)
    mycity = main_url.split('/')[-1]

    # Request the main page
    r = requests.get(main_url)

    # The only useful information in this response is the CityId for Johannesburg,
    # so grab it with BeautifulSoup (an explicit parser avoids the "no parser" warning)
    soup = BeautifulSoup(r.content, "html.parser")

    # This element lists every city on the website together with its CityId
    city_list = soup.find(id="ctl00_WeatherContentHolder_ddlCity")

    # Look for the city (johannesburg) inside city_list;
    # re.I makes the match case-insensitive. The matched string's parent is the <option> tag.
    city_as_on_website = city_list.find(string=re.compile(mycity, re.I)).parent
    cityId = city_as_on_website['value']

    # Now make a POST request to the following URL with these headers and data to get the JSON
    json_url = "http://weather.news24.com/ajaxpro/TwentyFour.Weather.Web.Ajax,App_Code.ashx"

    headers = {'Content-Type': 'text/plain; charset=UTF-8',
               'Host': 'weather.news24.com',
               'Origin': 'http://weather.news24.com',
               'Referer': main_url,
               'User-Agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/48.0.2564.82 Chrome/48.0.2564.82 Safari/537.36',
               'X-AjaxPro-Method': 'GetCurrentOne'}

    payload = {"cityId": cityId}  # This is the cityId we found above with BeautifulSoup

    # Now send the POST request
    r = requests.post(json_url, headers=headers, data=json.dumps(payload))

    # r.text (the decoded response body) will contain the JSON data you expect.
    # Unfortunately it is not well-formed (note the trailing ";/*"),
    # and fixing that would be a completely separate Stack Overflow question.
    # Hope you fight your way through it.
    # Good luck! :-)
    
    
    Out[1]: '{"__type":"TwentyFour.Services.Weather.Objects.CurrentOneReport, TwentyFour.Services.Weather, Version=1.2.0.0, Culture=neutral, PublicKeyToken=null","Observations":[{"__type":"TwentyFour.Services.Weather.Objects.Observation, TwentyFour.Services.Weather, Version=1.2.0.0, Culture=neutral, PublicKeyToken=null","CityName":"Lanseria Civ / Mil","Location":"Lanseria Civ / Mil","Sky":"Passing clouds","Temperature":"25.00","Humidity":"54","WindSpeed":"15","WindDirectionAbreviated":"SE","Comfort":"26","DewPoint":"15","Description":"Passing clouds. Warm.","Icon":"2","IconName"
    
    ...
    ...
    
    ":null,"Rainfall":"14mm","Snowfall":"*","PrecipitationProbability":"52","Icon":"22","IconName":"tstorms","Cached":false},"AstronomyReport":null,"MarineReport":null,"LocalTime":"Wed, 24 Feb 2016 17:30:27 SAST","LocalUpdateTime":"Wed, 24 Feb 2016 17:12:07 SAST","CountryName":"South Africa","TimeZone":"2","Cached":false};/*'
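Since the question mentions trying urllib under Python 3, here is a minimal sketch of the same POST using only the standard library's `urllib.request`. It assumes the endpoint URL and the `X-AjaxPro-Method: GetCurrentOne` header from the answer are still accepted by the site (it may have changed since 2016); the helper names `build_weather_request` and `fetch_weather_text` are hypothetical. If the server rejects the request, adding the remaining headers from the answer (`Origin`, `User-Agent`) is a reasonable next step.

```python
import json
import urllib.request

# Endpoint taken from the answer above; treat it as an assumption, it may no longer exist.
JSON_URL = "http://weather.news24.com/ajaxpro/TwentyFour.Weather.Web.Ajax,App_Code.ashx"


def build_weather_request(city_id, referer="http://weather.news24.com/sa/johannesburg"):
    """Build (but do not send) the AjaxPro POST request for one city id (hypothetical helper)."""
    # The body is the same JSON payload the requests version sends
    body = json.dumps({"cityId": city_id}).encode("utf-8")
    headers = {
        "Content-Type": "text/plain; charset=UTF-8",
        "Referer": referer,
        "X-AjaxPro-Method": "GetCurrentOne",  # tells AjaxPro which server method to call
    }
    # Passing data plus method="POST" makes urllib send a POST request
    return urllib.request.Request(JSON_URL, data=body, headers=headers, method="POST")


def fetch_weather_text(city_id, timeout=10):
    """Send the request and return the raw response body as text (hypothetical helper)."""
    req = build_weather_request(city_id)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage would be `raw = fetch_weather_text(cityId)`, with `cityId` found via BeautifulSoup as in the answer. Keeping request construction separate from sending makes it easy to inspect the request before any network call.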
    

    【Discussion】:

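    • @SlangI'mmatalk 1) What is the value of your variable mycity? It should be johannesburg or another city. 2) What is the value of your variable city_list?
    • Thank you very much. But that is the result where I ran into the broken JSON, and I also don't know why it returns 3 different city names instead of the specific city I want.
    • @SlangI'mmatalk As I said, that would be a completely different scenario. You can post another question explaining what you need, and I may be able to help. Please paste a link to that question here.
    • I tried to format it but still get some strange errors. Here is the link to the post: stackoverflow.com/questions/35621105/json-data-format-error Thanks
    • Good day, would you mind taking a look at what is wrong here: stackoverflow.com/questions/35648169/…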
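The output shown in the answer is a single JSON object followed by a stray `;/*`, so `json.loads` on the whole body fails. A minimal sketch, assuming the body is one JSON object plus some non-JSON trailer, uses `json.JSONDecoder.raw_decode`, which parses the first JSON value and reports where it ended, ignoring whatever follows. The helper names `parse_ajaxpro_response` and `observation_names` are hypothetical. Because `Observations` is a list and each entry carries its own `CityName` (the one shown, "Lanseria Civ / Mil", looks like a weather station rather than Johannesburg itself), several names appearing is probably expected. The full response may contain other non-JSON constructs not visible in the truncated output.

```python
import json


def parse_ajaxpro_response(raw):
    """Parse the first JSON value in raw and ignore any trailer such as ';/*'."""
    decoder = json.JSONDecoder()
    # raw_decode does not skip leading whitespace, so strip it first
    obj, _end = decoder.raw_decode(raw.lstrip())
    return obj


def observation_names(report):
    """Return the CityName of every entry in the report's Observations list."""
    return [obs.get("CityName") for obs in report.get("Observations") or []]
```

For example, `observation_names(parse_ajaxpro_response(r.text))` lists every station name in the report, which makes it easier to pick the entry you actually want.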