How to extract specific data from a website's form

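【Question title】: How to extract specific data from a website's form
【Posted】: 2021-11-03 04:07:07
【Question description】:

I'm trying to get specific data from the following website. It has only one form field, "Token Address". But I can't figure out how to extract the numbers shown after "Buy Tax:" and "Sell Tax:". I only need the numbers, without the percent signs. What is the cleanest way to extract this information with Python?

My failed attempt: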

import requests
from bs4 import BeautifulSoup

xPath_buy = '/html/body/div/div[1]/div/p[5]/text()[1]'
xPath_sell = '/html/body/div/div[1]/div/p[5]/text()[2]'

token = "0x40619dc9f00ea34e51d96b6ec5d8a6ad75457434"
url = "https://honeypot.is/?address=" + token

def tax(token):
    url = "https://honeypot.is/?address=" + token
    HTML = requests.get(url)
    soup = BeautifulSoup(HTML.text, 'html.parser')
    # BeautifulSoup has no XPath support, so this lookup matches nothing and returns None
    text = soup.find('div style', attrs={'xpath': '//*[@id="shitcoin"]/div/p[5]/text()[1]'})
    return text

buy_tax = tax(token)
print(buy_tax)

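【Question comments】:

Computed in JS: see view-source:honeypot.is/?address=0x40619dc9f00ea34e51d96b6ec5d8a6ad75457434 (line 250)

Tags: python html python-requests

【Solution 1】:

Both the buy tax and the sell tax are loaded into the page dynamically with JavaScript. You can tell by running print(soup): you won't find the percentages there. The JavaScript in use looks like this.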

let gasdiv = '<p>Gas used for Buying: ' + numberWithCommas(buyGasUsed) + '<br>Gas used for Selling: ' + numberWithCommas(sellGasUsed) + '</p>';
        document.getElementById('shitcoin').innerHTML = '<div style="max-width: 100%;" class="ui compact success message"><div class="header">Does not seem like a honeypot.</div><p>This can always change! Do your own due diligence.</p><p>Address: ' + addressToOutput + '</p><p id="token-info">'+tokenName + ' ('+tokenSymbol+')'+'</p>'+maxdiv+gasdiv+'<p>Buy Tax: ' + buy_tax + '%<br>Sell Tax: ' + sell_tax + '%</p></div>';
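To make the print(soup) check concrete, here is a minimal sketch that tests whether a piece of HTML already contains a numeric percentage. The helper name has_rendered_tax and the two sample strings are made up for illustration; the first sample imitates the script template above, the second imitates what a browser would show after the script runs.

```python
import re

# Matches a rendered value such as "Buy Tax: 5%" or "Sell Tax: 12.5%".
RENDERED_TAX = re.compile(r"(?:Buy|Sell) Tax:\s*\d+(?:\.\d+)?\s*%")

def has_rendered_tax(html: str) -> bool:
    """Return True if the HTML already contains a numeric tax percentage."""
    return RENDERED_TAX.search(html) is not None

# What requests sees: the script's string template, with variable names, not numbers.
static_snippet = "'<p>Buy Tax: ' + buy_tax + '%<br>Sell Tax: ' + sell_tax + '%</p>'"
# What a browser shows after the script has run (the values here are made up).
rendered_snippet = "<p>Buy Tax: 5%<br>Sell Tax: 10%</p>"

print(has_rendered_tax(static_snippet))    # False
print(has_rendered_tax(rendered_snippet))  # True
```

If the same check on HTML.text from requests comes back False, the numbers are not in the downloaded source and something has to execute the JavaScript first.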

You need to use Selenium instead, and do something like this:

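from selenium import webdriver
from selenium.webdriver.common.by import By

# Element XPath for the <p> holding both taxes (Selenium cannot return text() nodes).
xPath_tax = '//*[@id="shitcoin"]/div/p[5]'
token = "0x40619dc9f00ea34e51d96b6ec5d8a6ad75457434"  # same token as in the question

driver = webdriver.Chrome()  # assumes Chrome is installed
driver.implicitly_wait(30)   # give the page's JavaScript time to fill in the values

def tax(token):
    url = "https://honeypot.is/?address=" + token
    driver.get(url)  # load the page in a real browser so its JavaScript runs
    text = driver.find_element(By.XPATH, xPath_tax).text
    return text

buy_tax = tax(token)
print(buy_tax)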
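For just the two numbers without the percent signs, a sketch along these lines may help. It assumes the rendered text has the form "Buy Tax: N%" and "Sell Tax: N%" as in the JavaScript template, that the result is written into the element with id "shitcoin", and that Chrome is available locally. The names parse_taxes and get_taxes are made up for this example, and the sample values are invented.

```python
import re

def parse_taxes(text: str) -> tuple:
    """Pull the buy and sell tax numbers (without '%') out of the rendered text."""
    buy = re.search(r"Buy Tax:\s*(\d+(?:\.\d+)?)\s*%", text)
    sell = re.search(r"Sell Tax:\s*(\d+(?:\.\d+)?)\s*%", text)
    if buy is None or sell is None:
        raise ValueError("tax values not found in text")
    return float(buy.group(1)), float(sell.group(1))

def get_taxes(token: str, timeout: int = 30) -> tuple:
    """Open the page in a browser, wait for the script, and return (buy, sell)."""
    # Selenium is imported here so parse_taxes can be used on its own.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome is installed locally
    try:
        driver.get("https://honeypot.is/?address=" + token)
        # Wait until the page's script has written the percentages into the box.
        WebDriverWait(driver, timeout).until(
            lambda d: "Buy Tax:" in d.find_element(By.ID, "shitcoin").text
        )
        return parse_taxes(driver.find_element(By.ID, "shitcoin").text)
    finally:
        driver.quit()  # always close the browser, even on errors

# Parsing demo on made-up text; no browser needed for this part.
sample = "Buy Tax: 5%\nSell Tax: 10%"
print(parse_taxes(sample))  # (5.0, 10.0)

# Live lookup (opens a browser window):
# buy_tax, sell_tax = get_taxes("0x40619dc9f00ea34e51d96b6ec5d8a6ad75457434")
```

If the page shows a different message for a token (for example when it is flagged as a honeypot), the wait will time out with a TimeoutException, so the caller may want to catch that.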

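【Comments】:

Sure, but the problem is that when I run print(soup) I don't get the numeric values; I get 'buy_tax' and 'sell_tax' instead. Could you write complete code that returns only the numeric values of the two taxes for a given token address?
That isn't possible with BeautifulSoup, because the values simply aren't there: they are loaded dynamically. As I said, you need Selenium. And do driver.find_elements(By.XPATH, xPath_buy) @BJonas88
...Great! That's what I was looking for. But what is driver? It seems to be undefined. I tried By instead, but it returns: Traceback (most recent call last): File "c:/Users/Owner/Documents/BlockchainPy/MyScripts/scratchTest.py", line 44, in <module> buy_tax = tax(token0) File "c:/Users/Owner/Documents/BlockchainPy/MyScripts/scratchTest.py", line 41, in tax text = By.find_elements(By.XPATH, xPath_buy) AttributeError: type object 'By' has no attribute 'find_elements'
@BJonas88 My mistake, it should be find_element, without the 's'. Could you try again?
See stackoverflow.com/questions/64927909/… @BJonas88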