【Question title】: Web Scrape on a Dynamic Web Page Get Access Token
【Posted】: 2021-07-20 22:23:06
【Question】:

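Can anyone help me get the access token from this web page? I am trying to scrape it, but I cannot obtain the token automatically. http://200.75.4.210:8080/CIODCH/login.aspx

This is what I have so far, but I can't get the token in the response...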

import requests

# Get the token
headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'}
login = {
    'Password': "123456789.",  # this is a generic password, please don't change it
    'Usuario': "XXXXXXX"}

response = requests.post('https://apidch.kairosmining.com/auth/login', headers=headers, data=login).json()

# Use the token to input into the headers
url = 'http://apidch.kairosmining.com/api/CalidadAFlotacion'

headers.update({'authorization': 'Bearer %s' % response['token']})

This is the URL I want to scrape: http://200.75.4.210:8080/CIODCH/Intra/calidadproducto/cpfvista.aspx

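【Comments on the question】:

  • The username/password you provided don't seem to be valid; I can't log in with a browser.
  • Please try again... I just changed it.

Tags: python web-scraping beautifulsoup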


【Solution 1】:
import httpx
import pandas as pd
import trio
from bs4 import BeautifulSoup


loginurl = "http://200.75.4.210:8080/CIODCH/login.aspx"


async def get_soup(content):
    return BeautifulSoup(content, 'lxml')


async def log_in(client):
    r = await client.get(loginurl)
    soup = await get_soup(r.text)
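    # Forms are submitted by field name, so key the hidden ASP.NET fields
    # (__VIEWSTATE and friends) by "name" and tolerate a missing value.
    data = {k["name"]: k.get("value", "") for k in soup.select("input[name^=__]")}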
    data.update({
        "txtUser": "redacted",
        "txtPassword": "987654321.",
        "cmdLog": "Entrar"
    })
    await client.post(loginurl, data=data)
    r = await client.get(
        'http://200.75.4.210:8080/CIODCH/Intra/calidadproducto/cpfvista.aspx')

    soup = await get_soup(r.text)
    client.headers.update({
        "Authorization": "Bearer {}".format(soup.select_one('iframe')['src'].split('=')[-1])
    })


allin = []


async def workers(channel):
    async with channel:
        async for client, date in channel:
            print(f"Extracted {date}")
            data = {
                "fecha": date,
                "parametro": 1
            }
            r = await client.post('http://apidch.kairosmining.com/api/CalidadAFlotacion', data=data)

            goal = pd.DataFrame.from_records(r.json())
            if not goal.empty:
                allin.append(goal)


async def main():
    async with httpx.AsyncClient(timeout=None) as client, trio.open_nursery() as nurse:
        await log_in(client)

        sender, receiver = trio.open_memory_channel(0)

        async with receiver:
            for _ in range(20):
                nurse.start_soon(workers, receiver.clone())

            async with sender:
                # Build "YYYY-M-DD" by hand: '%#m' drops the leading zero only on Windows.
                for d in pd.date_range(start="2021-07-01", end="2021-07-22"):
                    await sender.send([client, f"{d.year}-{d.month}-{d.day:02d}"])

    df = pd.concat(allin, ignore_index=True)
    print(df)


if __name__ == "__main__":
    trio.run(main)
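
The token step in `log_in` takes whatever follows the last `=` in the iframe's `src`. That returns an empty string if the token itself ends in `=` padding. Below is a minimal sketch of a slightly more defensive variant, assuming the token is the last query parameter of the first iframe. It uses only the standard library so it runs without the page being reachable. The sample HTML, the `example.invalid` host, and the parameter name `token` are made up for illustration; the real iframe URL is not shown in the answer.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qsl


class IframeSrcFinder(HTMLParser):
    """Remembers the src attribute of the first <iframe> in a page."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag == "iframe" and self.src is None:
            self.src = dict(attrs).get("src")


def token_from_page(html):
    """Return the value of the last query parameter of the first iframe's src, or None."""
    finder = IframeSrcFinder()
    finder.feed(html)
    if not finder.src:
        return None
    # parse_qsl splits each pair on the first '=' only, so '=' padding survives.
    params = parse_qsl(urlsplit(finder.src).query, keep_blank_values=True)
    return params[-1][1] if params else None


# Hypothetical page body (assumption, not taken from the real site).
sample = '<html><body><iframe src="http://example.invalid/view?token=abc.def.ghi"></iframe></body></html>'
print(token_from_page(sample))
```

Returning `None` when no iframe is found makes it possible to stop before sending a meaningless `Bearer` header. Note that `parse_qsl` decodes `+` as a space, so this sketch would need adjusting if the token can contain a literal `+`.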

Output:

       Id                     Fecha  Area Turno Solidos Malla Comentarios         dia   hora
0    6375  2021-07-03T00:58:01.123Z    A2     B      43    30   00:55 hrs  03-07-2021  00:58
1    6376  2021-07-03T00:58:52.826Z    A1     B      38    25   00:55 hrs  03-07-2021  00:58
2    6377  2021-07-03T03:17:07.303Z    A1     B      41    29   03:15 hrs  03-07-2021  03:17
3    6378  2021-07-03T03:24:26.813Z    A2     B      46    38   03:20 hrs  03-07-2021  03:24
4    6379  2021-07-03T03:47:48.540Z    A0     B      38    30   21:30 hrs  03-07-2021  03:47
..    ...                       ...   ...   ...     ...   ...         ...         ...    ...
470  6807  2021-07-21T10:19:13.230Z  None  None    None  None        None  21-07-2021  10:19
471  6808  2021-07-21T10:19:13.523Z  None  None    None  None        None  21-07-2021  10:19
472  6809  2021-07-21T10:22:25.393Z  None  None    None  None        None  21-07-2021  10:22
473  6810  2021-07-21T10:22:25.450Z  None  None    None  None        None  21-07-2021  10:22
474  6811  2021-07-21T10:22:25.450Z  None  None    None  None        None  21-07-2021  10:22

[475 rows x 9 columns]

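【Comments】:

  • Thanks for your answer... but when I try to fetch the data, the response says I don't have the right token. I can't seem to find it in the r variable either.
  • @EliasUrra This is to get you logged in to the website. You have to follow the other requests yourself. Especially since your user/pass isn't even valid.
  • I just checked my user/pass and it's fine. Let me see what else I can do.
  • @EliasUrra I checked your user/pass again and it doesn't work for me either. Another user mentioned the same thing to you. I can't follow up from here.
  • Yes, he commented before I changed it... but now I have tried 3 different browsers and 3 different networks and it works fine... this is weird... you are trying this URL, right? 200.75.4.210:8080/CIODCH/login.aspx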
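
One way to narrow down the disagreement above is to check, right after the login POST, whether the server sent the login form back. The sketch below assumes that a rejected login re-renders the page with the `txtUser` and `txtPassword` fields used in the answer. That behaviour is an assumption about this site and has not been confirmed. The helper name `still_on_login_form` is made up for illustration.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collects the name attribute of every <input> element in a page."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.names.add(name)


def still_on_login_form(html):
    """True if the page still contains both login fields posted by the answer."""
    collector = InputNameCollector()
    collector.feed(html)
    return {"txtUser", "txtPassword"} <= collector.names
```

In the answer's `log_in`, this could be applied to the text of the response returned by `client.post(loginurl, data=data)`. A `True` result would suggest the credentials were rejected before any token was issued.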