Why am I getting the same POST data even though I'm posting to a different URL? (Answer)

【Question title】: Why am I getting the same POST data even though I'm posting to a different URL?
【Posted】: 2020-06-27 13:54:28
【Question】:

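I'm trying to scrape http://www.moneycontrol.com/stocks/histstock.php?sc_id=BPC&mycomp=BPCL to get price data. Here is what I did:

1. Opened that link and entered the dates (daily).
2. Chrome -> Inspect -> Network: captured the form details and found the URL the form is POSTed to.
3. Filled in the form data and sent the POST.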
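Step 3 amounts to rebuilding the form payload that the browser sends. As a minimal sketch, assuming the field names captured in the Network tab are the ones used in the code further down (`frm_dy`, `frm_mth`, `frm_yr`, `to_dy`, `to_mth`, `to_yr`, `hdn`), a small helper can produce that payload from two dates. `build_form_data` is a hypothetical name, and the site's own validation rules are not known:

```python
from datetime import date


def build_form_data(start: date, end: date, interval: str = "daily") -> dict:
    """Hypothetical helper: build the POST form payload for a date range.

    Field names are taken from the question's `data` dict; zero-padded
    two-digit day/month strings mirror the values used there.
    """
    if start > end:
        raise ValueError("start must not be after end")
    return {
        "frm_dy": f"{start.day:02d}",
        "frm_mth": f"{start.month:02d}",
        "frm_yr": str(start.year),
        "to_dy": f"{end.day:02d}",
        "to_mth": f"{end.month:02d}",
        "to_yr": str(end.year),
        "hdn": interval,
    }


print(build_form_data(date(2016, 1, 1), date(2016, 12, 31)))
```

For 2016-01-01 to 2016-12-31 this yields the same dictionary that is hard-coded in the question's code.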

I have several tickers I need data for.

E.g.:
    'AXISBANK': 'http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=AXISBANK',
    'BAJAJ-AUTO': 'http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=BPCL',

But when I send the POST, I get the same output even though the URL I'm posting to is different. What could I be missing?
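A quick way to see how the two URLs actually differ is to compare their query strings. The sketch below uses only the standard library (`query_diff` is a hypothetical helper name). It reports that only `mycomp` differs while `sc_id=API` is identical, which is consistent with the server choosing the stock by `sc_id`:

```python
from urllib.parse import parse_qs, urlsplit


def query_diff(url_a: str, url_b: str) -> dict:
    """Return {param: (values_in_a, values_in_b)} for query params that differ.

    parse_qs maps each parameter name to a list of values.
    """
    qa = parse_qs(urlsplit(url_a).query)
    qb = parse_qs(urlsplit(url_b).query)
    return {
        key: (qa.get(key), qb.get(key))
        for key in sorted(set(qa) | set(qb))
        if qa.get(key) != qb.get(key)
    }


axis = "http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=AXISBANK"
bpcl = "http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=BPCL"
print(query_diff(axis, bpcl))  # {'mycomp': (['AXISBANK'], ['BPCL'])}
```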

Output:

running for http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=AXISBANK
           Date   Open    High    Low   Close   Volume
244  05-01-2016  881.3  905.00  881.3  900.65  1372748
245  04-01-2016  876.2  892.45  871.7  880.80   709103
246  01-01-2016  882.0  885.60  876.9  878.75   294006
running for http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=BPCL
           Date   Open    High    Low   Close   Volume
244  05-01-2016  881.3  905.00  881.3  900.65  1372748
245  04-01-2016  876.2  892.45  871.7  880.80   709103
246  01-01-2016  882.0  885.60  876.9  878.75   294006

Here is the code I wrote to test it:

import requests
import pandas as pd
from bs4 import BeautifulSoup as bs

url='http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=AXISBANK'
url2='http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=BPCL'
data = {
    'frm_dy':'01',
    'frm_mth':'01',
    'frm_yr':'2016',
    'to_dy':'31',
    'to_mth':'12',
    'to_yr':'2016',
    'hdn':'daily'
    # 'x':'15',
    # 'y':'14'
}
print('running for {}'.format(url))
test = requests.post(url,data=data) # Post the data
doc = bs(test.text,'html.parser')
tables = doc.find('table',{'class':'tblchart'})
tData = pd.read_html(str(tables),header=1) #You get a list

#Convert it to dataFrame
tData = tData[0].drop(columns=['(High-Low)','(Open-Close)'])
print(tData.tail(3))

import time
time.sleep(20) # Hopefully sleep works?
url = url2 # test only 
print('running for {}'.format(url))
test = requests.post(url,data=data)
doc = bs(test.text,'html.parser')
tables = doc.find('table',{'class':'tblchart'})
tData = pd.read_html(str(tables),header=1) #You get a list

#Convert it to dataFrame
tData = tData[0].drop(columns=['(High-Low)','(Open-Close)'])
print(tData.tail(3))

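I noticed that the sc_id in the URL I open directly differs from the one I see under Inspect. I don't know what sc_id is (a session ID?). I'm completely new to web scraping, so I don't really know the pitfalls or whether I've run into one. What could I be missing?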

【Question discussion】:

Tags: python-3.x web-scraping beautifulsoup python-requests

【Answer 1】:

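The sc_id= parameter in the URL identifies the stock, so you have to set it correctly. Both of your URLs use sc_id=API, which is why they return the same data regardless of mycomp.

For Axis Bank it is UTI10.

For Bajaj Auto it is BA06.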
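If you already know the codes, you can build the history URLs directly. This is a minimal sketch: the `SC_IDS` dict and `hist_url` helper are illustrative names, the two codes are the ones given above, and `ex='B'` mirrors the answer's code (the question used `ex=N`):

```python
from urllib.parse import urlencode

BASE_URL = "https://www.moneycontrol.com/stocks/hist_stock_result.php"

# Illustrative lookup table: sc_id values quoted in the answer.
SC_IDS = {
    "Axis Bank": "UTI10",
    "Bajaj Auto": "BA06",
}


def hist_url(mycomp: str, exchange: str = "B") -> str:
    """Build the history URL with the matching sc_id for `mycomp`.

    urlencode percent-encodes values and turns spaces into '+'.
    """
    params = {"ex": exchange, "sc_id": SC_IDS[mycomp], "mycomp": mycomp}
    return f"{BASE_URL}?{urlencode(params)}"


for name in SC_IDS:
    print(hist_url(name))
```

Alternatively, `requests.post(BASE_URL, params=..., data=...)` does the same query-string encoding for you, which is the approach the code below takes.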

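For example, the following looks up the sc_id through the site's autosuggest endpoint and then fetches the table: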

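import io
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup


def get_sc_id(name, full_name):
    # Query the autosuggest endpoint with a search string and pull the sc_id
    # out of the "set_val('<full_name>','<sc_id>')" text it returns.
    url = 'https://www.moneycontrol.com/stocks/autosuggest.php'
    params = {'str': name}
    text = requests.get(url, params=params).text
    # re.escape() so names containing characters like '.' or '(' match literally
    pattern = r"set_val\('{}','(.*?)'\)".format(re.escape(full_name))
    return re.search(pattern, text, flags=re.I)[1]

def get_table(sc_id, mycomp):
    url = 'https://www.moneycontrol.com/stocks/hist_stock_result.php'
    params = {
        'ex': 'B',
        'sc_id': sc_id,    # identifies the stock; this is what selects the data
        'mycomp': mycomp
    }
    # Date-range form fields, same as in the question
    data = {
        'frm_dy':'01',
        'frm_mth':'01',
        'frm_yr':'2016',
        'to_dy':'31',
        'to_mth':'12',
        'to_yr':'2016',
        'hdn':'daily'
    }

    soup = BeautifulSoup(requests.post(url, data=data, params=params).content, 'html.parser')
    html = str(soup.select_one('.tblchart'))
    # read_html returns a list of DataFrames; the header spans two rows, so drop
    # the top column level. StringIO avoids newer pandas' literal-HTML deprecation.
    return pd.read_html(io.StringIO(html))[0].droplevel(0, axis=1)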

code = get_sc_id('AXIS', 'Axis Bank')
print('Axis Bank code: ', code)
print(get_table(code, 'Axis Bank'))

code = get_sc_id('BAJAJ', 'Bajaj Auto')
print('Bajaj Auto code:', code )
print(get_table(code, 'Bajaj Auto'))
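The parsing step inside `get_sc_id()` can be tried offline, separately from the network call. The sketch below applies the same regular expression to a made-up response fragment. The `sample` string is hypothetical and NOT captured from moneycontrol.com, and `parse_sc_id` is an illustrative name; unlike `get_sc_id()`, it returns `None` instead of raising when there is no match:

```python
import re
from typing import Optional


def parse_sc_id(autosuggest_text: str, full_name: str) -> Optional[str]:
    """Extract the sc_id from a "set_val('<full_name>','<sc_id>')" call.

    The name is passed through re.escape() so it is matched literally;
    matching is case-insensitive, as in get_sc_id().
    """
    pattern = r"set_val\('{}','(.*?)'\)".format(re.escape(full_name))
    match = re.search(pattern, autosuggest_text, flags=re.I)
    return match.group(1) if match else None


# Hypothetical response fragment, for illustration only.
sample = "<a onclick=\"set_val('Axis Bank','UTI10')\">Axis Bank</a>"
print(parse_sc_id(sample, "Axis Bank"))   # UTI10
print(parse_sc_id(sample, "Bajaj Auto"))  # None
```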

Prints:

Axis Bank code:  UTI10

           Date    Open    High     Low   Close   Volume  (High-Low)  (Open-Close)
0    30-12-2016  446.00  451.80  443.45  450.00   234037        8.35         -4.00
1    29-12-2016  447.00  447.00  437.80  444.15   267677        9.20          2.85
2    28-12-2016  437.45  447.85  436.00  439.50   251149       11.85         -2.05
3    27-12-2016  430.00  438.55  430.00  437.45   210857        8.55         -7.45
4    26-12-2016  432.15  436.00  427.00  431.75   405044        9.00          0.40
..          ...     ...     ...     ...     ...      ...         ...           ...
242  07-01-2016  424.25  425.00  407.30  409.35  1441934       17.70         14.90
243  06-01-2016  439.70  439.70  429.80  430.80   730512        9.90          8.90
244  05-01-2016  439.00  440.00  433.65  436.35   726947        6.35          2.65
245  04-01-2016  448.85  448.85  437.40  439.25   743518       11.45          9.60
246  01-01-2016  450.00  452.70  445.80  449.80   433052        6.90          0.20

[247 rows x 8 columns]

Bajaj Auto code: BA06

           Date     Open     High      Low    Close  Volume  (High-Low)  (Open-Close)
0    30-12-2016  2655.55  2667.00  2627.25  2633.85   10377       39.75         21.70
1    29-12-2016  2621.00  2665.65  2611.50  2655.45    8704       54.15        -34.45
2    28-12-2016  2629.35  2653.00  2624.55  2631.60    6475       28.45         -2.25
3    27-12-2016  2563.00  2642.00  2563.00  2633.60   15491       79.00        -70.60
4    26-12-2016  2618.00  2618.35  2578.00  2596.70    7205       40.35         21.30
..          ...      ...      ...      ...      ...     ...         ...           ...
242  07-01-2016  2470.00  2481.80  2407.25  2419.25   15962       74.55         50.75
243  06-01-2016  2495.00  2513.70  2475.00  2485.50   11975       38.70          9.50
244  05-01-2016  2518.00  2520.00  2480.00  2497.05   11967       40.00         20.95
245  04-01-2016  2507.90  2545.85  2480.65  2488.15   23077       65.20         19.75
246  01-01-2016  2530.00  2530.00  2512.15  2520.05    9055       17.85          9.95

[247 rows x 8 columns]
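The two extra columns (which the question drops) appear to be derived values: in every row printed above, (High-Low) equals High minus Low and (Open-Close) equals Open minus Close. A small plain-Python check on two of those rows, where `derived_columns` is an illustrative helper and rounding to 2 decimals is assumed to match the table:

```python
# Two rows copied from the printed output above:
# (Date, Open, High, Low, Close, (High-Low), (Open-Close))
ROWS = [
    ("30-12-2016", 446.00, 451.80, 443.45, 450.00, 8.35, -4.00),       # Axis Bank
    ("30-12-2016", 2655.55, 2667.00, 2627.25, 2633.85, 39.75, 21.70),  # Bajaj Auto
]


def derived_columns(open_, high, low, close):
    """Recompute (High-Low) and (Open-Close), rounded to 2 decimals."""
    return round(high - low, 2), round(open_ - close, 2)


for date_str, open_, high, low, close, high_low, open_close in ROWS:
    recomputed = derived_columns(open_, high, low, close)
    # True when the recomputed values match the table's columns
    print(date_str, recomputed, recomputed == (high_low, open_close))
```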

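【Discussion】:

Thanks a lot, that works. Could you explain how you figured out that the autosuggest.php file is what should be used to look up the sc_id? That would help me understand the process better.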