【Question title】: Why am I getting the same POST data even though I'm posting to different URLs?
【Posted】: 2020-06-27 13:54:28
【Question】:

I'm trying to scrape http://www.moneycontrol.com/stocks/histstock.php?sc_id=BPC&mycomp=BPCL to get price data, so I did the following:

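  1. Opened the link and entered the dates (daily)
  2. Chrome -> Inspect -> Network: looked at the form details and found the URL the form POSTs to
  3. Filled in the form data and sent the POST.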

I have multiple tickers that I need data for.

Eg:
    'AXISBANK': 'http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=AXISBANK',
    'BAJAJ-AUTO': 'http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=BPCL',

But when I run the POST, I get the same output even though the URLs I post to are different. What am I missing?

Output:

running for http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=AXISBANK
           Date   Open    High    Low   Close   Volume
244  05-01-2016  881.3  905.00  881.3  900.65  1372748
245  04-01-2016  876.2  892.45  871.7  880.80   709103
246  01-01-2016  882.0  885.60  876.9  878.75   294006
running for http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=BPCL
           Date   Open    High    Low   Close   Volume
244  05-01-2016  881.3  905.00  881.3  900.65  1372748
245  04-01-2016  876.2  892.45  871.7  880.80   709103
246  01-01-2016  882.0  885.60  876.9  878.75   294006

This is the code I wrote to test it.

url='http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=AXISBANK'
url2='http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=BPCL'
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs
data = {
    'frm_dy':'01',
    'frm_mth':'01',
    'frm_yr':'2016',
    'to_dy':'31',
    'to_mth':'12',
    'to_yr':'2016',
    'hdn':'daily'
    # 'x':'15',
    # 'y':'14'
}
print('running for {}'.format(url))
test = requests.post(url,data=data) # Post the data
doc = bs(test.text,'html.parser')
tables = doc.find('table',{'class':'tblchart'})
tData = pd.read_html(str(tables),header=1) #You get a list

#Convert it to dataFrame
tData = tData[0].drop(columns=['(High-Low)','(Open-Close)'])
print(tData.tail(3))

import time
time.sleep(20) # Hopefully sleep works?
url = url2 # test only 
print('running for {}'.format(url))
test = requests.post(url,data=data)
doc = bs(test.text,'html.parser')
tables = doc.find('table',{'class':'tblchart'})
tData = pd.read_html(str(tables),header=1) #You get a list

#Convert it to dataFrame
tData = tData[0].drop(columns=['(High-Low)','(Open-Close)'])
print(tData.tail(3))

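I noticed that the sc_id I get when I open the URL directly differs from the one I see in Inspect. I don't know what sc_id is (a session ID?). I'm completely new to web scraping, so I don't really know the pitfalls or whether I've run into one. What am I missing?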

【Comments】:

    Tags: python-3.x web-scraping beautifulsoup python-requests


    【Solution 1】:

    You have to set the sc_id= parameter in the URL correctly.

    For Axis Bank it is UTI10

    For Bajaj Auto it is BA06
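
One way to see why both requests in the question came back with the same table is to compare the query strings of the two URLs. The sketch below uses only the standard library and the two URLs quoted in the question; the helper names `query_params` and `differing_params` are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

url1 = 'http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=AXISBANK'
url2 = 'http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=BPCL'


def query_params(url):
    """Return the query string of `url` as a {name: value} dict."""
    # parse_qs returns a list of values per name; each name appears once here.
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}


def differing_params(a, b):
    """Hypothetical helper: parameter names whose values differ between two URLs."""
    pa, pb = query_params(a), query_params(b)
    return sorted(k for k in pa.keys() | pb.keys() if pa.get(k) != pb.get(k))


print(query_params(url1)['sc_id'], query_params(url2)['sc_id'])  # API API
print(differing_params(url1, url2))                              # ['mycomp']
```

Both URLs carry sc_id=API and differ only in mycomp, which is consistent with sc_id, not mycomp, being the parameter that selects the company.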

    For example:

    import re
    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    
    
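    def get_sc_id(name, full_name):
        # Query the autosuggest endpoint and pull the sc_id out of the
        # set_val('<full name>','<sc_id>') call in the response text.
        url = 'https://www.moneycontrol.com/stocks/autosuggest.php'
        params = {'str': name}
        text = requests.get(url, params=params).text
        # re.escape keeps characters in full_name from being read as regex syntax.
        match = re.search(r"set_val\('{}','(.*?)'\)".format(re.escape(full_name)), text, flags=re.I)
        if match is None:
            raise ValueError('no sc_id found for {!r}'.format(full_name))
        return match[1]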
    
    def get_table(sc_id, mycomp):
        url = 'https://www.moneycontrol.com/stocks/hist_stock_result.php'
        params = {
            'ex':'B',
            'sc_id': sc_id,
            'mycomp': mycomp
        }
        data = {
            'frm_dy':'01',
            'frm_mth':'01',
            'frm_yr':'2016',
            'to_dy':'31',
            'to_mth':'12',
            'to_yr':'2016',
            'hdn':'daily'
        }
    
        soup = BeautifulSoup(requests.post(url, data=data, params=params).content, 'html.parser')
        return pd.read_html( str(soup.select_one('.tblchart')) )[0].droplevel(0, axis=1)
    
    code = get_sc_id('AXIS', 'Axis Bank')
    print('Axis Bank code: ', code)
    print(get_table(code, 'Axis Bank'))
    
    code = get_sc_id('BAJAJ', 'Bajaj Auto')
    print('Bajaj Auto code:', code )
    print(get_table(code, 'Bajaj Auto'))
    

    Prints:

    Axis Bank code:  UTI10
    
               Date    Open    High     Low   Close   Volume  (High-Low)  (Open-Close)
    0    30-12-2016  446.00  451.80  443.45  450.00   234037        8.35         -4.00
    1    29-12-2016  447.00  447.00  437.80  444.15   267677        9.20          2.85
    2    28-12-2016  437.45  447.85  436.00  439.50   251149       11.85         -2.05
    3    27-12-2016  430.00  438.55  430.00  437.45   210857        8.55         -7.45
    4    26-12-2016  432.15  436.00  427.00  431.75   405044        9.00          0.40
    ..          ...     ...     ...     ...     ...      ...         ...           ...
    242  07-01-2016  424.25  425.00  407.30  409.35  1441934       17.70         14.90
    243  06-01-2016  439.70  439.70  429.80  430.80   730512        9.90          8.90
    244  05-01-2016  439.00  440.00  433.65  436.35   726947        6.35          2.65
    245  04-01-2016  448.85  448.85  437.40  439.25   743518       11.45          9.60
    246  01-01-2016  450.00  452.70  445.80  449.80   433052        6.90          0.20
    
    [247 rows x 8 columns]
    
    Bajaj Auto code: BA06
    
               Date     Open     High      Low    Close  Volume  (High-Low)  (Open-Close)
    0    30-12-2016  2655.55  2667.00  2627.25  2633.85   10377       39.75         21.70
    1    29-12-2016  2621.00  2665.65  2611.50  2655.45    8704       54.15        -34.45
    2    28-12-2016  2629.35  2653.00  2624.55  2631.60    6475       28.45         -2.25
    3    27-12-2016  2563.00  2642.00  2563.00  2633.60   15491       79.00        -70.60
    4    26-12-2016  2618.00  2618.35  2578.00  2596.70    7205       40.35         21.30
    ..          ...      ...      ...      ...      ...     ...         ...           ...
    242  07-01-2016  2470.00  2481.80  2407.25  2419.25   15962       74.55         50.75
    243  06-01-2016  2495.00  2513.70  2475.00  2485.50   11975       38.70          9.50
    244  05-01-2016  2518.00  2520.00  2480.00  2497.05   11967       40.00         20.95
    245  04-01-2016  2507.90  2545.85  2480.65  2488.15   23077       65.20         19.75
    246  01-01-2016  2530.00  2530.00  2512.15  2520.05    9055       17.85          9.95
    
    [247 rows x 8 columns]
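
The date range in the code above is hard-coded. If other ranges are needed, the form fields could be built from `datetime.date` objects, as in the minimal sketch below. The field names are taken from the answer; `make_form_data` is a hypothetical helper, and the sketch assumes the site wants zero-padded day and month strings like those in the answer. Whether `hdn` accepts values other than 'daily' is not confirmed here.

```python
from datetime import date


def make_form_data(start, end, frequency='daily'):
    """Build the POST form fields used in the answer from two dates."""
    return {
        'frm_dy': start.strftime('%d'),   # zero-padded day, e.g. '01'
        'frm_mth': start.strftime('%m'),  # zero-padded month, e.g. '01'
        'frm_yr': start.strftime('%Y'),
        'to_dy': end.strftime('%d'),
        'to_mth': end.strftime('%m'),
        'to_yr': end.strftime('%Y'),
        'hdn': frequency,
    }


form = make_form_data(date(2016, 1, 1), date(2016, 12, 31))
print(form)
```

The resulting dict can be passed as `data=` to `requests.post`, in place of the literal dict in `get_table`.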
    

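    【Comments】:

    • Thank you very much. This works. Could you help me understand how you figured out that the autosuggest.php file should be used to look up the sc_id? That would help me understand the process better.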
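
The answer does not say how the endpoint was found; presumably it shows up in the browser's Network tab while typing in the site's search box, the same way the POST URL was found in step 2 of the question. What the answer's code does show is the shape of the text it expects back: a `set_val('<company name>','<sc_id>')` call. The sketch below applies the same regular expression to a made-up sample string, so it runs without network access. The sample markup and the 'Example Co' / 'EX01' entry are invented for illustration, and `extract_sc_id` is a hypothetical helper.

```python
import re

# Made-up sample shaped like what the answer's regex expects; the real
# autosuggest.php response may wrap set_val(...) in different markup.
sample = (
    "<a onclick=\"set_val('Axis Bank','UTI10')\">Axis Bank</a>"
    "<a onclick=\"set_val('Example Co','EX01')\">Example Co</a>"
)


def extract_sc_id(text, full_name):
    """Return the sc_id paired with full_name in a set_val(...) call, or None."""
    pattern = r"set_val\('{}','(.*?)'\)".format(re.escape(full_name))
    match = re.search(pattern, text, flags=re.I)  # re.I: case-insensitive name match
    return match.group(1) if match else None


print(extract_sc_id(sample, 'Axis Bank'))        # UTI10
print(extract_sc_id(sample, 'No Such Company'))  # None
```

Returning None for a missing name makes it easy to skip tickers that the lookup cannot resolve instead of crashing partway through a batch.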