[Question title]: curl gives a response but Python does not, and the request call does not terminate?
[Posted]: 2021-05-03 17:39:48
[Question]:

I am trying the following curl request:

curl 'https://www.nseindia.com/api/historical/cm/equity?symbol=COALINDIA&series=\[%22EQ%22\]&from=03-05-2020&to=03-05-2021&csv=true' \
-H 'authority: www.nseindia.com' \
-H 'accept: */*' \
-H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36' \
-H 'x-requested-with: XMLHttpRequest' \
-H 'sec-gpc: 1' \
-H 'sec-fetch-site: same-origin' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-dest: empty' \
-H 'referer: https://www.nseindia.com/get-quotes/equity?symbol=COALINDIA' \
-H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8' \
-H 'cookie: ak_bmsc=2D5CCD6F330B77016DD02ADFD8BADB8A58DDD69E733C0000451A9060B2DF0E5C~pllIy1yQvFABwPqSfaqwV4quP8uVOfZBlZe9dhyP7+7vCW/YfXy32hQoUm4wxCSxUjj8K67PiZM+8wE7cp0WV5i3oFyw7HRmcg22nLtNY4Wb4xn0qLv0kcirhiGKsq4IO94j8oYTZIzN227I73UKWQBrCSiGOka/toHASjz/R10sX3nxqvmMSBlWvuuHkgKOzrkdvHP1YoLPMw3Cn6OyE/Z2G3oc+mg+DXe8eX1j8b9Hc=; nseQuoteSymbols=[{"symbol":"COALINDIA","identifier":null,"type":"equity"}]; nsit=X5ZCfROTTuLVwZzLBn7OOtf0; AKA_A2=A; bm_mi=6CE0B82205ACE5A1F72250ACDDFF563E~LZ4/HQ257rSMBPCrxy0uSDvrSxj4hHpLQqc8R5JZOzUZYo1OqZg5Q/GOt88XNtMbsWM8bB22vtCXzvksGwPcC/bH2nPFEZr0ci6spQ4GOpCa/TM7soc02HVf0tyDTkmg/ZdLZlWzond4r0vn+QpSB7f3fiVza1Gdx9OaFL1i3rvqe1OKmFONreHEue20PL0hlREVWeLcFM/5DxKArPwzCSopPp62Eea1510iivl7GmY=; nseappid=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJhcGkubnNlIiwiYXVkIjoiYXBpLm5zZSIsImlhdCI6MTYyMDA2MTQ5OSwiZXhwIjoxNjIwMDY1MDk5fQ.YBTQ0MqRayD3QBM3V6zUt5zbRRICkbIhWWNedkDYrdU; bm_sv=C49B743B48F174C77F3DDAD188AA6D87~bm5TD36snlaRLx9M5CS+FOUicUcbVV3OIKjZU2WLwd1PtHYUum7hnBfYeUCDv+5Xdb9ADklnmm1cwZGJJbiBstcA6c5vju53C7aTFBorl8SJZjBN/4ku61oz0ncrQYCaSxkFGkRRY9VMWm6SpQwHXfMsUzc/Qk7301zs7KZuGCY=' \
--compressed 

This gives the desired response (sample below):

"Date ","series ","OPEN ","HIGH ","LOW ","PREV. CLOSE ","ltp ","close ","vwap ","52W H","52W L ","VOLUME ","VALUE ","No of trades "
"03-May-2021","EQ","133.00","133.45","131.20","133.05","132.20","132.20","132.21","163.00","109.55",10262391,"1,356,811,541.80",59409
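The sample shows two quirks of the returned CSV. Several header names carry trailing spaces (`"Date "`, `"OPEN "`). Some numbers are quoted strings with thousands separators (`"1,356,811,541.80"`).

Below is a minimal sketch for turning such rows into clean Python values with the standard `csv` module. It assumes the layout matches the two sample lines. `parse_nse_csv` and `to_number` are hypothetical helpers, not part of any NSE API.

```python
import csv
import io

# The two sample lines from the response above.
SAMPLE = '''"Date ","series ","OPEN ","HIGH ","LOW ","PREV. CLOSE ","ltp ","close ","vwap ","52W H","52W L ","VOLUME ","VALUE ","No of trades "
"03-May-2021","EQ","133.00","133.45","131.20","133.05","132.20","132.20","132.21","163.00","109.55",10262391,"1,356,811,541.80",59409
'''


def to_number(text: str) -> float:
    """Hypothetical helper: '1,356,811,541.80' -> 1356811541.8."""
    return float(text.replace(",", ""))


def parse_nse_csv(text: str) -> list[dict[str, str]]:
    """Parse the CSV, stripping stray whitespace from header names and values."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    return [dict(zip(header, (v.strip() for v in row))) for row in reader if row]


rows = parse_nse_csv(SAMPLE)
first = rows[0]
print(first["Date"], to_number(first["close"]), to_number(first["VALUE"]))
# 03-May-2021 132.2 1356811541.8
```

If the real file differs from the sample (for example, extra columns or a byte-order mark), the header handling would need adjusting.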

But if I use the Python script below to fetch the data:

import requests

headers = {
'authority': 'www.nseindia.com',
'accept': '*/*',
'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36',
'x-requested-with': 'XMLHttpRequest',
'sec-gpc': '1',
'sec-fetch-site': 'same-origin',
'sec-fetch-mode': 'cors',
'sec-fetch-dest': 'empty',
'referer': 'https://www.nseindia.com/get-quotes/equity?symbol=COALINDIA',
'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
'cookie': 'ak_bmsc=2D5CCD6F330B77016DD02ADFD8BADB8A58DDD69E733C0000451A9060B2DF0E5C~pllIy1yQvFABwPqSfaqwV4quP8uVOfZBlZe9dhyP7+7vCW/YfXy32hQoUm4wxCSxUjj8K67PiZM+8wE7cp0WV5i3oFyw7HRmcg22nLtNY4Wb4xn0qLv0kcirhiGKsq4IO94j8oYTZIzN227I73UKWQBrCSiGOka/toHASjz/R10sX3nxqvmMSBlWvuuHkgKOzrkdvHP1YoLPMw3Cn6OyE/Z2G3oc+mg+DXe8eX1j8b9Hc=; nseQuoteSymbols=[{"symbol":"COALINDIA","identifier":null,"type":"equity"}]; nsit=X5ZCfROTTuLVwZzLBn7OOtf0; AKA_A2=A; bm_mi=6CE0B82205ACE5A1F72250ACDDFF563E~LZ4/HQ257rSMBPCrxy0uSDvrSxj4hHpLQqc8R5JZOzUZYo1OqZg5Q/GOt88XNtMbsWM8bB22vtCXzvksGwPcC/bH2nPFEZr0ci6spQ4GOpCa/TM7soc02HVf0tyDTkmg/ZdLZlWzond4r0vn+QpSB7f3fiVza1Gdx9OaFL1i3rvqe1OKmFONreHEue20PL0hlREVWeLcFM/5DxKArPwzCSopPp62Eea1510iivl7GmY=; nseappid=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJhcGkubnNlIiwiYXVkIjoiYXBpLm5zZSIsImlhdCI6MTYyMDA2MTQ5OSwiZXhwIjoxNjIwMDY1MDk5fQ.YBTQ0MqRayD3QBM3V6zUt5zbRRICkbIhWWNedkDYrdU; bm_sv=C49B743B48F174C77F3DDAD188AA6D87~bm5TD36snlaRLx9M5CS+FOUicUcbVV3OIKjZU2WLwd1PtHYUum7hnBfYeUCDv+5Xdb9ADklnmm1cwZGJJbiBstcA6c5vju53C7aTFBorl8SJZjBN/4ku61oz0ncrQYCaSxkFGkRRY9VMWm6SpQwHXfMsUzc/Qk7301zs7KZuGCY=',
}

params = (
('symbol', 'COALINDIA'),
('series', '["EQ"]'),  # requests URL-encodes the brackets and quotes; no escaping needed
('from', '30-04-2021'),
('to', '03-05-2021'),
('csv', 'true'),
)

response = requests.get('https://www.nseindia.com/api/historical/cm/equity', headers=headers, params=params)

It gets stuck on the last line.

I am using Python 3.9 and urllib3.

I'm not sure what the problem is. This URL downloads a CSV file from the website.
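Part of the symptom is generic to `requests`: by default it applies no timeout. If the server accepts the connection but never answers, `requests.get` blocks indefinitely. One plausible reason here is that the site stalls requests it does not recognise as coming from a browser; that is an assumption, not something the post confirms.

Passing `timeout=` makes the call raise `requests.exceptions.Timeout` instead, for example `requests.get(url, headers=headers, params=params, timeout=10)`.

The sketch below shows the same behaviour with the standard library's `urllib.request`, so it runs without third-party packages. A local "silent" server stands in for the unresponsive site, and `fetch_with_timeout` is a hypothetical helper.

```python
import socket
import urllib.error
import urllib.request

# Ignore any http_proxy settings so the request goes straight to the target.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_with_timeout(url: str, timeout: float) -> str:
    """Return 'ok' if a response arrives in time, otherwise 'timed out'."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            resp.read()
        return "ok"
    except socket.timeout:
        # Raised while waiting for the response (an alias of TimeoutError on 3.10+).
        return "timed out"
    except urllib.error.URLError as err:
        # urllib wraps errors raised while *sending* the request in URLError.
        if isinstance(err.reason, socket.timeout):
            return "timed out"
        raise


# A listening socket that never accept()s or replies: the OS still completes the
# TCP handshake, so the client connects, sends its request and then just waits.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

print(fetch_with_timeout(f"http://127.0.0.1:{port}/", timeout=1.0))
server.close()
```

Without the `timeout` argument, the same call would sit waiting just like the `requests.get` line in the question.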

    Tags: python-3.x curl web-scraping urllib3


    [Answer 1]

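    You have to jump through a few hoops in Python to get the file you want. Mainly, you need to get the cookie part of the request headers right; otherwise you'll keep getting a 401 status code.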

    First, get the regular cookies from the authority, www.nseindia.com. Then get the bm_sv cookie from https://www.nseindia.com/json/quotes/equity-historical.json. Finally, add a cookie called nseQuoteSymbols.
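    Those pieces end up in a single `cookie` request header, which is just `name=value` pairs separated by `; `. Below is a minimal sketch of that assembly.

    Assumptions:
    - The fetched cookies are available as plain dicts (as `requests` provides through `.cookies.get_dict()`).
    - The cookie names come from the header in the question; which response sets which cookie is a guess.
    - All values in the example are placeholders.
    - `nse_quote_symbols` and `cookie_header` are hypothetical helpers.

    The `nseQuoteSymbols` value is the compact JSON list the browser sends, so `json.dumps` with compact separators can build it for any symbol:

```python
import json


def nse_quote_symbols(symbol: str) -> str:
    """Build the nseQuoteSymbols cookie value for an equity symbol."""
    entry = {"symbol": symbol, "identifier": None, "type": "equity"}
    # Compact separators reproduce the browser's form: no spaces after ',' or ':'.
    return json.dumps([entry], separators=(",", ":"))


def cookie_header(*sources: dict) -> str:
    """Merge cookie dicts (later ones win) into one 'a=1; b=2' header value."""
    merged: dict = {}
    for source in sources:
        merged.update(source)
    return "; ".join(f"{name}={value}" for name, value in merged.items())


# Placeholder cookies; real values come from the responses described above.
home_page_cookies = {"nsit": "PLACEHOLDER1", "nseappid": "PLACEHOLDER2"}
historical_cookies = {"bm_sv": "PLACEHOLDER3"}

header = cookie_header(
    home_page_cookies,
    {"nseQuoteSymbols": nse_quote_symbols("COALINDIA")},
    historical_cookies,
)
print(header)
```

    Joining every pair with `"; "` in one place avoids gluing two cookies together when strings are concatenated by hand.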

    Glue everything together and make the request for the file.

    Here's how:

    from urllib.parse import urlencode
    
    import requests
    
    headers = {
        'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/88.0.4324.182 Safari/537.36',
        'x-requested-with': 'XMLHttpRequest',
        'referer': 'https://www.nseindia.com/get-quotes/equity?symbol=COALINDIA',
    }
    
    payload = {
        "symbol": "COALINDIA",
        "series": '["EQ"]',
        "from": "04-04-2021",
        "to": "04-05-2021",
        "csv": "true",
    }
    
    api_endpoint = "https://www.nseindia.com/api/historical/cm/equity?"
    
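    # Extra cookie the API expects; the browser sends it as a compact JSON list.
    nseQuoteSymbols = 'nseQuoteSymbols=[{"symbol":"COALINDIA","identifier":null,"type":"equity"}]'
    
    
    def make_cookies(cookie_dict: dict) -> str:
        # {"a": "1", "b": "2"} -> "a=1; b=2"
        return "; ".join(f"{k}={v}" for k, v in cookie_dict.items())
    
    
    with requests.Session() as connection:
        # 1) Regular cookies from the home page; the timeout keeps a stalled request from hanging forever.
        authority = connection.get("https://www.nseindia.com", headers=headers, timeout=10)
        # 2) The bm_sv cookie from the historical-data JSON.
        historical_json = connection.get("https://www.nseindia.com/json/quotes/equity-historical.json", headers=headers, timeout=10)
        bm_sv_string = make_cookies(historical_json.cookies.get_dict())
    
        # 3) Glue the pieces together; every name=value pair must be separated by "; ".
        parts = [make_cookies(authority.cookies.get_dict()), nseQuoteSymbols, bm_sv_string]
        cookies = "; ".join(part for part in parts if part)
        connection.headers.update({**headers, "cookie": cookies})
    
        the_real_slim_shady = connection.get(f"{api_endpoint}{urlencode(payload)}", timeout=10)
        the_real_slim_shady.raise_for_status()  # fail loudly on e.g. 401 instead of saving an error page
        # The file name comes from the Content-Disposition header; drop any surrounding quotes.
        csv_file = the_real_slim_shady.headers["Content-Disposition"].split("=")[-1].strip('"')
        with open(csv_file, "wb") as f:
            f.write(the_real_slim_shady.content)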
    

    Output: a .csv file containing the requested data.
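    The answer takes the file name by splitting the `Content-disposition` header on `=`. That works for a bare `filename=...`, but it keeps the surrounding quotes if the server sends `filename="..."`.

    A somewhat more robust sketch uses the standard library's `email.message.Message.get_filename()`, which parses the header parameters and removes the quotes. The header values below are made-up examples, since the exact value NSE sends is not shown in the post. `filename_from_header` is a hypothetical helper.

```python
import os
from email.message import Message


def filename_from_header(content_disposition: str, default: str = "download.csv") -> str:
    """Extract the filename parameter from a Content-Disposition header value."""
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    name = msg.get_filename()
    # basename() drops any directory part so the file is written to the current folder.
    return os.path.basename(name) if name else default


print(filename_from_header('attachment; filename="example.csv"'))  # example.csv
print(filename_from_header("attachment; filename=example.csv"))    # example.csv
```

    In the answer's code, this would take the place of the `split("=")[-1]` expression, e.g. `csv_file = filename_from_header(the_real_slim_shady.headers["Content-Disposition"])`.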

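    [Comments]

    • Thanks. I know this may be outside the scope of the question, but how did you figure out that this is what needs to be done? On other websites, the approach in my question works fine. I would not have thought of using requests.Session() and the steps you suggested.
    • Well, all I did was mimic the request so that it looks like the one the browser sends. If you look at the Network -> XHR tab in the developer tools, you'll see that cookies need to be sent along with the request headers. requests.Session() helps reuse that data, as you can see in my answer.
    • Nice explanation +
    • @QHarr, thanks.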
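    The approach in the comment above is to copy what the browser sends from the Network -> XHR tab and reproduce it. That is easier when you can see which cookies the browser actually attached.

    A small sketch that splits a copied `cookie` header into name/value pairs follows. It is shown on a shortened version of the header from the question, with placeholder values. `parse_cookie_header` is a hypothetical helper.

    It splits each pair on the first `=` only, because values such as `ak_bmsc` end in `=` themselves:

```python
def parse_cookie_header(header: str) -> dict[str, str]:
    """Split 'a=1; b=2' into {'a': '1', 'b': '2'}, keeping '=' inside values."""
    cookies = {}
    for pair in header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:  # skip empty fragments such as a trailing ';'
            cookies[name] = value
    return cookies


# Shortened copy of the question's cookie header (placeholder values).
copied = (
    'ak_bmsc=abc123~xyz=; '
    'nseQuoteSymbols=[{"symbol":"COALINDIA","identifier":null,"type":"equity"}]; '
    'nsit=X5ZCfROTTuLVwZzLBn7OOtf0; AKA_A2=A; bm_sv=def456~uvw='
)
for name, value in parse_cookie_header(copied).items():
    print(f"{name:16} {value[:40]}")
```

    Comparing this list across requests shows which cookies the site sets and which ones the client has to add itself.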