【Question title】: Scraping the Coinmarketcap front page into a data frame
【Posted】: 2021-12-23 19:43:09
【Question】:

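Hello, what I want is to get the Coinmarketcap front page into a data frame. This is what I have so far, but the data looks disorganized and I don't know how to turn it into a tidy df.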

library(jsonlite)
library(tidyverse)
library(rvest)

# Get today's market-cap listing from the page's embedded JSON.
json_data <- read_html("https://coinmarketcap.com/") %>%
  html_node("#__NEXT_DATA__") %>%
  html_text() %>%
  fromJSON()

json_data$props$initialState$cryptocurrency$listingLatest$data

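What I end up with is a long list that I can't make sense of. I know the data is in there, because the list looks like this, but I don't know how to parse it.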

[121] "quotes.2.percentChange60d"          "quotes.2.percentChange7d"           "quotes.2.percentChange90d"          "quotes.2.price"                    
[125] "quotes.2.selfReportedMarketCap"     "quotes.2.turnover"                  "quotes.2.volume24h"                 "quotes.2.volume30d"                
[129] "quotes.2.volume7d"                  "quotes.2.ytdPriceChangePercentage"  "rank"                               "selfReportedCirculatingSupply"     
[133] "slug"                               "symbol"                             "totalSupply"                        "tvl"                               

[[1]]$id
[1] "COMPRESSED_KEYS_ARR"

[[1]]$excludeProps
[1] "auditInfoList"


[[2]]
  [1] "68789.6259389221"         "65.5260009765625"         "18908943"                 "1"                        "2013-04-28T00:00:00.000Z"
  [6] "TRUE"                     "FALSE"                    "50755.7211665326"         "1"                        "1"                       
 [11] "FALSE"                    "2021-12-23T19:20:02.000Z" "48065.8375264037"         "8093"                     "21000000"                
 [16] "Bitcoin"                  "40.4175"                  "1065349214847.34"         "2021-12-23T19:21:02.000Z" "18897342.6115399"        
 [21] "18897342.6115399"         "BTC"                      "0"                        "0"                        "0"                       
 [26] "0"                        "0"                        "0"                        "1"                        "0"                       
 [31] "0.02793205"               "527841.47774037"          "21776428.8780472"         "3626419.86588612"         "72.706"                  
 [36] "40.4175"                  "1065349214847.34"         "2021-12-23T19:21:02.000Z" "232885004.198773"         "232885004.198773"        
 [41] "ETH"                      "-0.189131"                "0.653349"                 "-11.42415087"             "-16.02722155"            
 [46] "3.129837"                 "19.93155879"              "12.31613021"              "0"                        "0.02793205"              
 [51] "6504955.07684694"         "268365972.663341"         "44690876.5456617"         "72.706"                   "40.4175"                 
 [56] "1065349214847.34"         "2021-12-23T19:20:02.000Z" "959267979935.385"         "959267979935.385"         "USD"                     
 [61] "0.53649283"               "3.98091259"               "-11.42415087"             "-16.02722155"             "5.84148872"              
 [66] "19.93155879"              "50730.9149927304"         "0"                        "0.02793205"               "26794319100.1314"        
 [71] "1105416320667.99"         "184084531389.181"         "72.706"                   "40.4175"                  "1065349214847.34"        
 [76] "2021-12-23T19:21:02.000Z" "18897342.6115399"         "18897342.6115399"         "BTC"                      "0"                       
 [81] "0"                        "0"                        "0"                        "0"                        "0"                       
 [86] "1"                        "0"                        "0.02793205"               "527841.47774037"          "21776428.8780472"        
 [91] "3626419.86588612"         "72.706"                   "40.4175"                  "1065349214847.34"         "2021-12-23T19:21:02.000Z"
 [96] "232885004.198773"         "232885004.198773"         "ETH"                      "-0.189131"                "0.653349"                
[101] "-11.42415087"             "-16.02722155"             "3.129837"                 "19.93155879"              "12.31613021"             
[106] "0"                        "0.02793205"               "6504955.07684694"         "268365972.663341"         "44690876.5456617"        
[111] "72.706"                   "40.4175"                  "1065349214847.34"         "2021-12-23T19:20:02.000Z" "959267979935.385"        
[116] "959267979935.385"         "USD"                      "0.53649283"               "3.98091259"               "-11.42415087"            
[121] "-16.02722155"             "5.84148872"               "19.93155879"              "50730.9149927304"         "0"                       
[126] "0.02793205"               "26794319100.1314"         "1105416320667.99"         "184084531389.181"         "72.706"                  
[131] "1"                        "0"                        "bitcoin"                  "BTC"                      "18908943"                
[136] NA                         NA                        

[[3]]
  [1] "4891.70469755141"         "0.420897006988525"        "118860687.6865"           "2"                        "2015-08-07T00:00:00.000Z"
  [6] "TRUE"                     "FALSE"                    "4119.08504574469"         "1027"                     "1"                       
 [11] "FALSE"                    "2021-12-23T19:20:02.000Z" "3897.23447281111"         "4509"                     NA                        
 [16] "Ethereum"                 "20.6197"                  "489234090606.33"          "2021-12-23T19:21:02.000Z" "9637790.92058901"        
 [21] "9637790.92058901"         "BTC"                      "0.277187"                 "-0.842643"                "-4.49917037"

What I ultimately want is a result like the one I get when retrieving historical data:

json_data <- read_html("https://coinmarketcap.com/historical/20150621/") %>%
  html_node("#__NEXT_DATA__") %>% 
  html_text() %>% 
  fromJSON()

df_data <- json_data$props$initialState$cryptocurrency$listingHistorical$data
> head(df_data)
   id      name symbol      slug num_market_pairs               date_added     tags   max_supply circulating_supply total_supply platform.id
1   1   Bitcoin    BTC   bitcoin               NA 2013-04-28T00:00:00.000Z mineable     21000000           14298800     14298800          NA
2  52       XRP    XRP       xrp               NA 2013-08-04T00:00:00.000Z          100000000000        31908551587  99998976018          NA
3   2  Litecoin    LTC  litecoin               NA 2013-04-28T00:00:00.000Z mineable     84000000           40119404     40119404          NA
4  74  Dogecoin   DOGE  dogecoin               NA 2013-12-15T00:00:00.000Z mineable           NA        99890370337  99890370337          NA
5 463 BitShares    BTS bitshares               NA 2014-07-21T00:00:00.000Z            3600570502         2511953117   2511953117          NA
6 512   Stellar    XLM   stellar               NA 2014-08-05T00:00:00.000Z                    NA         4837354256 100804167862          NA
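The historical URL in the question ends in a date written as `YYYYMMDD`. If that pattern holds for other snapshot dates (an assumption, since only `20150621` appears here), the URL can be built from a date value. `historical_url` is a hypothetical helper name.

```r
# Hypothetical helper: build a historical snapshot URL from a date.
# Assumption: the path segment is the date formatted as YYYYMMDD.
historical_url <- function(date) {
  sprintf("https://coinmarketcap.com/historical/%s/",
          format(as.Date(date), "%Y%m%d"))
}

url_2015 <- historical_url("2015-06-21")
print(url_2015)
```

The result can be passed to `read_html()` in place of the hard-coded string. Whether a snapshot exists for any given date is not something this thread confirms.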

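【Comments】:

  • Have you considered using CoinMarketCap's API? It is quite robust: coinmarketcap.com/api/documentation/v1/#section/…
  • Or, if you need help with this approach, please include the packages required to run your code so that we can reproduce your results.
  • @sconfluentus Yes, sorry, I did not include the packages. I also noticed I was using glue unnecessarily, so I have edited the code and added the required packages.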

Tags: r web-scraping


【Solution 1】:

Use html_table:

library(jsonlite)
library(tidyverse)
library(rvest)

# Get today's market-cap table from the front page.
json_data <- read_html("https://coinmarketcap.com/") %>%
  html_nodes("table") %>%
  html_table(fill = TRUE)

It returns a table.

> json_data 
[[1]]
# A tibble: 100 x 11
   ``      `#` Name      Price   `24h %` `7d %` `Market Cap`   `Volume(24h)`    
   <lgl> <int> <chr>     <chr>   <chr>   <chr>  <chr>          <chr>            
 1 NA        1 Bitcoin1~ $50,77~ 3.61%   5.53%  $960.18B$960,~ $28,207,384,9685~
 2 NA        2 Ethereum~ $4,104~ 2.18%   1.88%  $487.89B$487,~ $17,920,397,7984~
 3 NA        3 Binance ~ $548.65 1.94%   2.67%  $91.52B$91,51~ $1,860,150,3053,~
 4 NA        4 Tether4U~ $1.00   0.04%   0.01%  $77.38B$77,38~ $68,556,169,0906~
 5 NA        5 Solana5S~ $189.82 4.75%   3.83%  $58.55B$58,55~ $2,144,421,38811~
 6 NA        6 Cardano6~ $1.47   8.69%   15.79% $49.08B$49,07~ $1,964,583,1431,~
 7 NA        7 XRP7XRP   $1.01   4.26%   23.58% $47.82B$47,81~ $4,211,885,8344,~
 8 NA        8 USD Coin~ $1.00   0.05%   0.05%  $42.57B$42,57~ $4,039,920,4424,~
 9 NA        9 Terra9LU~ $92.66  3.16%   37.30% $34.02B$34,02~ $4,141,070,96044~
10 NA       10 Avalanch~ $122.16 1.27%   17.38% $29.71B$29,70~ $1,291,116,76510~
# ... with 90 more rows, and 3 more variables: Circulating Supply <chr>,
#   Last 7 Days <lgl>,  <lgl>
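The scraped columns come back as character strings such as `$548.65` and `1.94%`, so they need converting before any arithmetic. Below is a small base R sketch that uses a hand-made data frame in place of the scraped table. `to_number` is a hypothetical helper, and it deliberately leaves out the `Market Cap` and `Volume(24h)` columns, whose cells appear to contain two values run together.

```r
# Hypothetical helper: strip "$", "," and "%" and convert to numeric.
to_number <- function(x) as.numeric(gsub("[$,%]", "", x))

# Hand-made stand-in for two rows of the scraped table.
coins <- data.frame(
  Name = c("Coin A", "Coin B"),
  Price = c("$548.65", "$1.00"),
  `24h %` = c("1.94%", "0.04%"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

coins$Price <- to_number(coins$Price)
coins[["24h %"]] <- to_number(coins[["24h %"]])
str(coins)
```

The same helper could be applied to the matching columns of `json_data[[1]]`, assuming the column names printed above.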

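【Comments】:

  • Thanks, I tried that, but the problem is that the table is loaded dynamically, so the result gets cut off. read_html("https://coinmarketcap.com/") %>% html_nodes("table") %>% html_table(fill = TRUE) %>% data.frame()
  • You can use a loop to pass the page number into the URL, e.g. https://coinmarketcap.com/?page=2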
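The paging suggestion can be sketched as below. All function names are hypothetical. The default reader needs the rvest package and network access when it is actually called, and it assumes each page has its listing in the first `table` element. The demo swaps in a fake reader so that the loop itself can be run offline.

```r
# Hypothetical helper: build the URL for one listing page.
page_url <- function(page) {
  sprintf("https://coinmarketcap.com/?page=%d", as.integer(page))
}

# Default reader: needs rvest and network access when called.
# Assumption: the listing is the first <table> on the page.
read_listing_page <- function(url) {
  page <- rvest::read_html(url)
  tbl <- rvest::html_table(rvest::html_element(page, "table"))
  as.data.frame(tbl)
}

# Read several pages and return one table per page.
scrape_pages <- function(pages, reader = read_listing_page, pause = 1) {
  tables <- lapply(pages, function(p) {
    Sys.sleep(pause)  # wait between requests
    reader(page_url(p))
  })
  names(tables) <- paste0("page_", pages)
  tables
}

# Offline demo with a fake reader that just records the URL it was given.
fake_reader <- function(url) data.frame(source = url, stringsAsFactors = FALSE)
demo <- scrape_pages(1:3, reader = fake_reader, pause = 0)
combined <- do.call(rbind, demo)
print(combined)
```

Returning a list rather than binding inside the loop leaves room to clean up column names first, since the scraped table shown earlier has blank headers. As the previous comment notes, rows that load dynamically may still be missing from each page.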