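【Question title】: Web scraping with Python
【Posted】: 2021-03-16 07:58:45
【Question】:

I am new to Python. I need to scrape the website "walletexplorer" (https://www.walletexplorer.com/) and generate five TSV output files from the table on the home page. Each file only needs to contain all the addresses linked to each wallet. Example output might look like this: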

exchanges.tsv
wallet       addresses
Huobi.com    19hx1B4pTDsBAYJJYYrj1t6bcVw1e9omuH
Huobi.com    16vyeuwbuNz9PKPs4VqqGLsVRJ2G6uJj39
...          ...
Zydo.com     15ovvTEgmQEprftNH1yAgkZ5uHY1YqAnvu

The code I have tried so far is below; I don't know how to continue:

import requests
from bs4 import BeautifulSoup

def get_upcoming_events(url):
    req = requests.get(url)

    soup = BeautifulSoup(req.text, 'lxml')

    events = soup.find({'class': 'serviceslist'}).findAll('href')
get_upcoming_events('https://www.walletexplorer.com/')

I would appreciate it if someone could guide me and explain how to do this.

Thanks in advance.

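【Comments】:

  • You are not even fetching the pages that contain the wallet addresses. All you have are links to the services. You would need to visit each service link, walk through all of its transaction pages, scrape the addresses, and move on to the next service.
  • @baduker Thanks, but each wallet has its own list of address pages, so there is no need to go into the transaction pages. Could you explain with some sample code? How should the multi-page address tables be handled?
  • Well, there are a lot of these addresses; some services have millions. Do you really need this data? If so, scraping it will get you throttled and/or blocked. However, you can compute the addresses yourself or request API access. Check this - walletexplorer.com/info
  • @baduker Thanks. I am requesting API access. But just for learning purposes, if I only need the last column (old/historic wallets) written to a TSV file, how would I do that? It would help if you could explain with some sample code. Thanks in advance.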
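
For the "last column only" question raised above, one possible approach is to pick the last cell of the services table before collecting links. The following is a minimal sketch, not a tested scraper: the function name `wallet_names_in_last_column` and the `SAMPLE_HOME` markup (including the wallet names `ExampleOld-1` and `ExampleOld-2`) are invented for illustration, and it assumes the home page keeps one `td` per category inside `table.serviceslist`, which is what the selector used in the answer suggests.

```python
from bs4 import BeautifulSoup


def wallet_names_in_last_column(home_html):
    """Return the wallet names linked from the last cell of the services table."""
    soup = BeautifulSoup(home_html, "html.parser")
    # Assumption: each category (exchanges, pools, ...) is one <td> of table.serviceslist.
    cells = soup.select("table.serviceslist td")
    if not cells:
        return []
    links = cells[-1].select("ul li a[href]")
    # The wallet name is the last path segment of each link.
    return [a["href"].rsplit("/", 1)[-1] for a in links]


# Hypothetical markup that only mimics the assumed structure of the home page.
SAMPLE_HOME = """
<table class="serviceslist"><tr>
  <td><h3>Exchanges:</h3><ul>
    <li><a href="/wallet/Huobi.com">Huobi.com</a></li>
  </ul></td>
  <td><h3>Old/historic:</h3><ul>
    <li><a href="/wallet/ExampleOld-1">ExampleOld-1</a></li>
    <li><a href="/wallet/ExampleOld-2">ExampleOld-2</a></li>
  </ul></td>
</tr></table>
"""

print(wallet_names_in_last_column(SAMPLE_HOME))
```

Against the real site you would pass `requests.get("https://www.walletexplorer.com/").text` instead of `SAMPLE_HOME`, and it is worth checking in the browser that the last cell really is the old/historic column.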

Tags: python web-scraping beautifulsoup python-requests


【Solution 1】:

You can try this:

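import requests
from bs4 import BeautifulSoup

source_url = "https://www.walletexplorer.com/"

# Build the address-list URL for every service linked from the home page table.
services = [
    f'{source_url}wallet/{s["href"].rsplit("/")[-1]}/addresses' for s
    in BeautifulSoup(
        requests.get(source_url).text,
        "html.parser",
    ).select("table.serviceslist td ul li a")
]

# Only the first service is fetched here; widen the slice to scrape more.
for service in services[:1]:
    s = BeautifulSoup(requests.get(service).text, "html.parser")
    # The wallet name is the second-to-last path segment of the URL.
    print(f"Wallet addresses for {service.rsplit('/')[-2]}")
    # Collect the link text from every table cell that contains a link.
    print([i.find("a").get_text() for i in s.find_all("td") if i.find("a")])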

This outputs:

Wallet addresses for Huobi.com
['19hx1B4pTDsBAYJJYYrj1t6bcVw1e9omuH', '16vyeuwbuNz9PKPs4VqqGLsVRJ2G6uJj39', '12uuCfvuxeyGhWaEU1sHTbkCN7eWMT7qFp', '1NMbcRqggWu75DG5AqZtin5kecdHoDVs3b', '1DgcQFeaCQBo41oxtzUuV3caW5TNjXvhLZ', '16K4VtKNxcaNkvNvDYiiWjyn1n49irJ8Ep', '1FDpmBztMAUvR1Q4JJeb2FnA4VUxUAAtmd', '1ko9cGfFxWoUMMZ3MsxuSjRHfLDNgyjr9', '1MnkzcBw8prYASnMiJ772ApcEsn1tFtn1A', '1CXrs7piiUdDXG1nT8jk43GWqCtPpbQbYu', '17WQqNZafGZWWa1zW4LaJ1h7S7qKKER8as', '14zzx8PWeXULMp11NmFdk3yuTabME6naBN', '1H8casG2e6keyjVGoQeh21oE2UJCtXECSL', '1JHsjazLvtpW6MC3AppTK8FtDU79mKeJDg', '18FVW8RUVnoHwBGFkRFsf9kyAHJcrcbiuS', '1F3GeJi6VesFMkaUF9o81vtJFkTdCs59bC', '1NMwkFzhkPRdK1XCH8dKxWmsCDCFFmx8Rq', '1BzudAAfaaqtNVUs1ZjFAzSZKoWKSg6NYY', '1LvWcQMQS8X5uyX7sSZSkEaUeM1ojYZ6r3', '1KXiM8NMPfAHvRdt3ZvUDrFWW1B29i2xr6', '1MTKTCg7Xh3yinQ87sjUzp73ES5caET61t', '15s6yYhzVsrM29nBi1geaJuBZFGQtMdhA6', '12aKRDdkL5Ae3zRVwbs9ywPo4Jfx44GYfd', '185ohog3RpD81YnG1p3Eb4DaS4FtU4hHc7', '1Crucw3RmmFvNdM35mt5QKC4tbVwWgXDgd', '1vdXJDiDSzeUAoM7eAcWZPhM9wEKmH8dG', '1HcQVoEFKk9rqU61UnAsKczxkeEj31hE92', '1Eb3pnx8LKsu4m7fR6Bt2bBw7wcEiH2erW', '1BDvRoPJK2evVhRRWJg3Fhyv7WPpwuiubJ', '144jMnE7sagf67e2e1vX9YPX9BhvWSoMnD', '12pjLweEBXtToCmqxQvimaEf3gxuHaCrhb', '12AH4fFy4FXYRzDhkYTtWdq3rFqFa6TJ4U', '1MwvFVwzZHSAsvvFSkdfU2EA5ieJC7YAWH', '17o41kDRLPvUGeQyu7gfymC3eRPTg6434v', '15N2pWryiFtFAyQC5c9vHTXimutSTFjDb5', '13L37EaqqSjXva9DcSgNQDWPV7Urko2wkZ', '1FAh5udQQvKxKuDARKB5qV26LcijqA7dy1', '16aJR7AYANSeJvqYgBEcd46rBQTasqs1ve', '1MRpBNxmFWB1xAwxmohFuRUL8whCsf2DBh', '1Q8Ho6YQzUmm8KKqYAgzfFGMqb6K83sxsN', '1HBnzQb6Vajf9yh2jiFkmvyfYZfFq9Ynyy', '1DWnomNDC6NRJDyrvqbLEEpXzpQRyap6jA', '1FkxMFkpsMDXtk2kCgViuLKJCyj55ZgL9v', '12o3YqvQ3u7phFd4YDitJ18BncGDTdFwtn', '1BT8GW75RwbKFeYXywqXfFB7fnd9BX6eKr', '13o5epgXwGyTZWSJMww18bry5AyQkUEMpu', '1GbiA1PKiHW8xNi9g71pMtFKsSs4ZAa8bQ', '1H66qMnVyW198rX576mHoierUJPhsxC6UA', '1AZifFwpYxJ6VGFuPzE1zSHD33FxNWCSXS', '1EA2ZCZEfxv7rW3mifz9jSK2yWeS3akRpc', '1LgCndYGW7rvUCjXPaxM7F7SnJCPA3aUjj', '1awTScPNF8zF5xieKCnVq2MoVkJms6sAf', 
'1QCph5a9Fm9e1p48iKnfmdEdZfPEAL6WWi', '13ND5KqL8FMK11udMNzfHBkqhJS4p1Vtw6', '1PMmHeQJB6VogEja98Un1h89WPvyKSKj19', '18efnKBBZqKqY6q1U3bVe1NB3HNxhsK686', '1PgPrjtVFvDC4vNwzoLhQsM5XSNHRBjQim', '1JAVtXrWbNmtat8ngVDCkV77HCtJapqxz8', '181zaA5V8ntBqjBFPEwTrZWBFyme7av9uD', '1GDLvz5gY4KR2J562gorQNhZUdfR1XVZES', '18iBEue7bK74axy4WMj8Yc5iPZZNj9sZ9B', '12k1wjjxz2t6KVmiYSxHXjoN3naE1FTwZq', '1HJV7cZ8TtbeCVfTVQP9TA16sgWJ1Qfarr', '1xXUeCGYzyGcwPa6V4pbi7gv1MDUc7Q7f', '1NHgveRqK326LeievoSrAGnknfMci9hLgs', '18CDxW6kKu5EmxwokmFe2njipVZpow88qX', '1DhWzMdZWwZXwmCYEajf4Auj3euaYtkFVJ', '1EWdvNgW1YTshfoM25E9LRmtktqFgJRoAN', '1CSydWPpafwEkYVZTbEas4bBAUEfVJ8SUp', '1ELbdBEn4zcvwCoWDPmmA31FRzsDo4mnSm', '1MXQSdXNR8P8bCT6732t8yu9fYAPTQUuxr', '1MNoBZr1GaDrxpNGxQmBhjJf6ycZdgU56X', '15bpXYz5FnTmh7cQQpUV7HiAGzHyvNxvex', '14hSMo1zBP4JE5r8soe3mmbhowrXdjHT8J', '1Csim7JNUrRgQ53W8GLK5Q2LYWxDEAhywR', '1AfypffpYUUzcaKU8XRvx6bfZsqr3sYdjP', '1M62FoKigWRCCVaXyNVhNBiH94WQFswLQW', '183xR5W1Vqw6gSKdteH5h9haWct23XGXa4', '1LYZkpA9jDHVCXrmzKwmKutNFk58JHj6rJ', '1ACeg9dJM4ogXGPyxYLHzvTj6AgM9S1d4u', '12Nv7TsECtLrWvdZjsd2M1vEJ5V8kLRQXP', '1NmC4QEjbjsCGgbDntJgdJEgXRx1yq8uqb', '1H7j2YeKwdatfmTmE6vUMotMNrnLA1JbyS', '1LWpXdrYKstHqLt1djAms1MRGhatn2YFcq', '1DWC7StAcFD8ydTvcHdxRx7GDQ7BBVqX7h', '1HS2BryuSnU4BAQLemfosBnZP7C7vom456', '1JBCzSCHgZWidMRHUqk2N7EVoPo4sNkER1', '168YVdM3qQtHURwzkJVVpt9ArLk5HtzWp8', '12ZMUTSQC3agsXWpPzyrBCkRGe7cRy2o32', '1JiyES25B4WLDsrvajHmM6PWrFjEk9vCmS', '1GhSk27WnVkhuNzPCT3SNMcSh4zowjqNcH', '1L5XCihBQrXuhR1T9oiyxXutKw3oS1i2eW', '1BgVEbXkKfPmy5zPgEQRpoDZtrZzRghn97', '19j3XsGnq8T9JtoGQuzqr5PX1d5ZWrWLFq', '1FqFJHvR91t8DTjAVD75CJ4cKTxuPPu6Cq', '169tgLDYcs68f3ZYh35k8JV5oyBZm5xLFa', '1J4Hi1emZUkTBgSJHHZfN56EAyzzVXqCEv', '1851LgKaSxjCHmmmEKXhoN8d3SNr4xU1LR', '1EcUMEdqMJkcpnwRxjaXf1a6dBCbDFqHGH', '18G2iXLSvCnFEvTmHMmu5SFVg3jnvdKtMX']
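
The snippet above prints a single page for a single service. To get closer to the TSV layout asked for in the question, a rough sketch could walk a few pages per wallet, pause between requests, and write `wallet<TAB>address` rows. Everything named here (`parse_addresses`, `fetch_page`, `collect_addresses`, `write_tsv`) is made up for this example, and the `?page=N` query parameter and the "stop at the first page without addresses" rule are assumptions about the site that you should verify before relying on them.

```python
import csv
import time

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.walletexplorer.com"


def parse_addresses(page_html):
    """Same idea as the answer: link text of every table cell that holds a link."""
    soup = BeautifulSoup(page_html, "html.parser")
    return [td.find("a").get_text() for td in soup.find_all("td") if td.find("a")]


def fetch_page(wallet, page):
    """Download one address page. Assumption: pages are selected with ?page=N."""
    url = f"{BASE_URL}/wallet/{wallet}/addresses"
    response = requests.get(url, params={"page": page}, timeout=30)
    response.raise_for_status()
    return response.text


def collect_addresses(wallet, fetch=fetch_page, max_pages=3, delay=2.0):
    """Walk pages 1..max_pages, stopping at the first page with no addresses."""
    rows = []
    for page in range(1, max_pages + 1):
        addresses = parse_addresses(fetch(wallet, page))
        if not addresses:
            break
        rows.extend((wallet, address) for address in addresses)
        if page < max_pages:
            # Pause between requests to lower the risk of being throttled or blocked.
            time.sleep(delay)
    return rows


def write_tsv(path, rows):
    """Write a header line plus one tab-separated (wallet, address) row per line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["wallet", "addresses"])
        writer.writerows(rows)
```

A call such as `write_tsv("exchanges.tsv", collect_addresses("Huobi.com"))` would then produce one file for one wallet. Keeping `max_pages` small is deliberate: as noted in the comments, some services have millions of addresses, so a full crawl is better served by the API.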

【Discussion】:
