【Question title】: Python scrape Coinmarketcap
【Posted】: 2021-08-28 14:21:02
【Question】:

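I am trying to collect market-cap data from coinmarketcap.com. I can successfully get the top 10 coins by market cap, but after the top 10 it stops working (the result comes back empty).

Here is my code. I am using Chrome.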

    import requests        
    import time
    from bs4 import BeautifulSoup

    url = 'https://coinmarketcap.com/'
    strhtml = requests.get(url)
    soup = BeautifulSoup(strhtml.text, 'lxml')

    result={}
    # Head of the selector: everything up to the table body
    baseAddr1 = ('#__next > div.bywovg-1.sXmSU > div.main-content > '
                 'div.sc-57oli2-0.comDep.cmc-body-wrapper > div > div:nth-child(1) > '
                 'div.h7vnx2-1.bFzXgL > table > tbody > ')

    # End of the selector: the link in the third column
    baseAddr3 = ' > td:nth-child(3) > div > a'

    for i in range(1, 21):
        # Pause briefly at every tenth coin
        if i % 10 == 0:
            time.sleep(3)
            print('resting...')

        # Middle of the selector; i is the coin's position in the table
        baseAddr2 = 'tr:nth-child(' + str(i) + ')'
        Addr = baseAddr1 + baseAddr2 + baseAddr3  # full selector
        #print(Addr)

        data = soup.select(Addr)
        for item in data:
            result.update({item.get_text(): item.get('href')})

    print(result)

Thanks for your help!

【Comments】:

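  • Why not use their free API? You would save yourself a lot of work. If you use it sensibly, the free plan is enough.
  • Cool! I hadn't thought of that; I was just trying to build some basic skills. Thanks!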
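
For readers who want to try the API route suggested in the comments, here is a minimal sketch. It is not from the original thread. It assumes the CoinMarketCap Pro API's `listings/latest` endpoint, the `X-CMC_PRO_API_KEY` header, and a response in which each entry of `data` carries `name` and `quote.USD.market_cap`; check these against the official API documentation before relying on them. The helper names `fetch_listings` and `market_caps` are hypothetical, and the standard library is used instead of `requests` only to keep the example self-contained.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the CoinMarketCap Pro API; verify against the official docs.
API_URL = "https://pro-api.coinmarketcap.com/v1/cryptocurrency/listings/latest"


def fetch_listings(api_key, limit=20):
    """Request the top `limit` coins by market cap and return the decoded JSON."""
    query = urllib.parse.urlencode({"start": 1, "limit": limit, "convert": "USD"})
    request = urllib.request.Request(
        f"{API_URL}?{query}",
        # Assumed header name for the API key.
        headers={"X-CMC_PRO_API_KEY": api_key, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def market_caps(payload):
    """Map coin name -> USD market cap, given a payload of the assumed shape."""
    return {
        coin["name"]: coin["quote"]["USD"]["market_cap"]
        for coin in payload.get("data", [])
    }


# Usage (needs a real key from a CoinMarketCap developer account):
#     caps = market_caps(fetch_listings("YOUR_API_KEY"))
```

Keeping the parsing in its own function means it can be checked against a saved response without spending API credits.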

Tags: python scrape coinmarketcap


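【Solution 1】:

As you scroll down the page, the site first shows and then hides each row of coin data, which is probably why a plain `requests` fetch only sees the first rows. To trigger this behavior and scrape each row as it scrolls into view, you can use selenium. For speed, the answer below uses a small snippet of JavaScript, run through selenium, to pull the results: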

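from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import pandas as pd

# Selenium 4+ takes the driver path through a Service object
# (older releases accepted it as the first positional argument).
d = webdriver.Chrome(service=Service('/path/to/chromedriver'))
d.get('https://coinmarketcap.com/')
results = d.execute_script('''
    // Scroll to the bottom of the page first
    window.scrollTo(0,document.body.scrollHeight)
    function* get_coin_data(){
        // Column headers, skipping the first cell and the last two
        var h = Array.from(document.querySelectorAll('table.h7vnx2-2.bFpGkc.cmc-table thead th'))
        var hds = h.slice(1, h.length-2).map(x => x.textContent)
        for (var i of document.querySelectorAll('table.h7vnx2-2.bFpGkc.cmc-table tbody tr')){
             // Fresh copy of the headers for each row
             var n_hds = JSON.parse(JSON.stringify(hds))
             // Bring the row into view so that its cells are rendered
             i.scrollIntoView()
             var tds = Array.from(i.querySelectorAll('td'))
             // Pair each header with the text of the matching cell
             yield Object.fromEntries(tds.slice(1, tds.length-2).map(function(x){
                  return [n_hds.shift(), x.querySelector(':is(.etpvrL, .iworPT, .cLgOOr, .kAXKAX, .hzgCfk, .hykWbK, .kZlTnE)').textContent]
             }));
         }
    }
    return [...get_coin_data()]
''')
d.quit()  # close the browser once the data has been pulled
df = pd.DataFrame(results)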

Output:

      #  24h %    7d %  ...          Name       Price      Volume(24h)
0     1  1.03%   1.05%  ...       Bitcoin  $48,678.16  $29,904,091,891
1     2  0.25%   1.20%  ...      Ethereum   $3,236.58  $15,197,663,099
2     3  0.86%  15.01%  ...       Cardano       $2.82   $6,389,958,677
3     4  1.94%   6.72%  ...  Binance Coin     $483.64   $1,850,753,287
4     5  0.03%   0.04%  ...        Tether       $1.00  $65,270,928,498
..  ...    ...     ...  ...           ...         ...              ...
95   96  2.08%   7.45%  ...      DigiByte    $0.06528      $24,887,122
96   97  2.33%  10.56%  ...       Horizen      $83.24      $57,256,134
97   98  0.06%   0.03%  ...    Pax Dollar     $0.9996      $86,915,502
98   99  0.02%   1.35%  ...      Ontology       $1.07     $123,632,824
99  100  1.34%   0.57%  ...          ICON       $1.40      $56,657,155

[100 rows x 8 columns]
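
Every value in the output above is a string (for example `$48,678.16` or `1.03%`), so sorting or arithmetic needs a conversion step first. The sketch below is not part of the original answer. The helper names `parse_money` and `parse_percent` are hypothetical, and it assumes every cell follows the two formats seen in the output.

```python
def parse_money(text):
    """Convert a string such as '$48,678.16' to a float."""
    return float(text.replace("$", "").replace(",", "").strip())


def parse_percent(text):
    """Convert a string such as '1.03%' to a float in percent units."""
    return float(text.replace("%", "").strip())
```

With the DataFrame from the answer, these could be applied per column, for example `df['Price'] = df['Price'].map(parse_money)` or `df['24h %'] = df['24h %'].map(parse_percent)`.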
