How to scrape key statistics from Yahoo! Finance with R? [duplicate]

【Question title】: How to scrape key statistics from Yahoo! Finance with R? [duplicate]
【Posted】: 2019-05-27 13:36:33
【Question】:

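Unfortunately, I am not an experienced scraper yet. However, I need to scrape key statistics for multiple stocks from Yahoo Finance with R.

I am somewhat familiar with scraping data directly from HTML using read_html(), html_nodes(), and html_text() from the rvest package. However, the MSFT key statistics page is a bit complicated, and I am not sure whether all of the statistics are delivered as XHR, JS, or Doc. My guess is that the data is stored in JSON.

If anyone knows a good way to extract and parse the data on this page with R, please answer my question. Many thanks in advance!

Alternatively, if there is a more convenient way to extract these metrics via quantmod or Quandl, please let me know; that would be a great solution!

The goal is to have the tickers/symbols as row names/row labels and the statistics as columns. An illustration of what I need can be found at this Finviz link:

https://finviz.com/screener.ashx

The reason I want to scrape Yahoo Finance is that Yahoo also provides the Enterprise and EBITDA key statistics.

Edit: I meant the key statistics page, for example: https://finance.yahoo.com/quote/MSFT/key-statistics/. The code should produce a data frame with stock tickers as rows and key statistics as columns.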

【Discussion】:

This might help: stackoverflow.com/questions/40245464/…
@NColl I did look at that thread earlier. However, the top answer is about scraping Finviz.

Tags: r web-scraping rvest quantmod quandl

【Solution 1】:

Code

library(rvest)
library(tidyverse)

# Define stock name
stock <- "MSFT"

# Extract and transform data
df <- paste0("https://finance.yahoo.com/quote/", stock, "/financials?p=", stock) %>% 
    read_html() %>% 
    html_table() %>% 
    map_df(bind_cols) %>% 
    # Transpose
    t() %>%
    as_tibble()

# Set first row as column names (unlist so a plain character vector is assigned)
colnames(df) <- as.character(unlist(df[1, ]))
# Remove first row
df <- df[-1,]
# Add stock name column
df$Stock_Name <- stock

Result

  Revenue `Total Revenue` `Cost of Revenu… `Gross Profit`
  <chr>   <chr>           <chr>            <chr>         
1 6/30/2… 110,360,000     38,353,000       72,007,000    
2 6/30/2… 96,571,000      33,850,000       62,721,000    
3 6/30/2… 91,154,000      32,780,000       58,374,000    
4 6/30/2… 93,580,000      33,038,000       60,542,000    
# ... with 25 more variables: ...

Edit:
Alternatively, for convenience, as a function:

get_yahoo <- function(stock){
  # Extract and transform data
  x <- paste0("https://finance.yahoo.com/quote/", stock, "/financials?p=", stock) %>% 
    read_html() %>% 
    html_table() %>% 
    map_df(bind_cols) %>% 
    # Transpose
    t() %>%
    as_tibble()

  # Set first row as column names (unlist so a plain character vector is assigned)
  colnames(x) <- as.character(unlist(x[1, ]))
  # Remove first row
  x <- x[-1,]
  # Add stock name column
  x$Stock_Name <- stock

  return(x)
}

Usage: get_yahoo(stock)

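【Discussion】:

Thanks a lot! However, I meant the key statistics page: finance.yahoo.com/quote/MSFT/key-statistics. The code should produce a data frame with stock tickers as rows and key statistics as columns.
Well, you only need to change the URL to get the result you want. Have you tried running it? Do you need help understanding the code?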
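
The layout requested in the question (one row per ticker, one column per statistic) can be sketched independently of how the page is fetched. The following is only a minimal sketch in base R, assuming each ticker's key statistics have already been parsed into a two-column data frame of statistic name and value. The helper name stats_to_wide, the column names name and value, the tickers AAA and BBB, and all values are hypothetical placeholders, not real Yahoo data.

```r
# Hypothetical helper: combine per-ticker name/value tables into one wide
# data frame with tickers as row names and statistics as columns.
stats_to_wide <- function(stats_list) {
  # Union of all statistic names, in order of first appearance
  all_names <- unique(unlist(lapply(stats_list, function(d) as.character(d$name))))
  # One character vector per ticker; statistics a ticker lacks become NA
  rows <- lapply(stats_list, function(d) {
    v <- setNames(as.character(d$value), as.character(d$name))
    unname(v[all_names])
  })
  wide <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  colnames(wide) <- all_names
  rownames(wide) <- names(stats_list)
  wide
}

# Placeholder input (made-up tickers and values, for illustration only)
stats_list <- list(
  AAA = data.frame(name = c("Enterprise Value", "EBITDA"),
                   value = c("100", "10"), stringsAsFactors = FALSE),
  BBB = data.frame(name = c("Enterprise Value", "PEG Ratio"),
                   value = c("200", "1.5"), stringsAsFactors = FALSE)
)
wide <- stats_to_wide(stats_list)
```

The values are kept as character strings here because scraped tables usually arrive as text; converting them to numbers would be a separate step.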

【Solution 2】:

I hope this is what you are looking for:

library(quantmod)
library(plyr)

what_metrics <- yahooQF(c("Price/Sales", 
                          "P/E Ratio",
                          "Price/EPS Estimate Next Year",
                          "PEG Ratio",
                          "Dividend Yield", 
                          "Market Capitalization"))

Symbols<-c("XOM","MSFT","JNJ","GE","CVX","WFC","PG","JPM","VZ","PFE","T","IBM","MRK","BAC","DIS","ORCL","PM","INTC","SLB")


metrics <- getQuote(paste(Symbols, sep="", collapse=";"), what=what_metrics)

To get the list of available metrics:

yahooQF()
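
If the symbols are wanted as an ordinary column rather than as row names, the result can be post-processed. This is a small sketch under the assumption that the getQuote() result is a data frame with the symbols as row names and a "Trade Time" column followed by the requested metrics; the mock data frame below stands in for a real result so the step can be run offline, and its tickers and values are placeholders.

```r
# Mock stand-in for a getQuote() result; values are placeholders, not real quotes
metrics <- data.frame(
  `Trade Time` = as.POSIXct(c("2000-01-01", "2000-01-01"), tz = "UTC"),
  `P/E Ratio` = c(1.0, 2.0),
  `Market Capitalization` = c(100, 200),
  row.names = c("AAA", "BBB"),
  check.names = FALSE
)

# Drop the timestamp column and move the symbols from row names into a column
tidy <- data.frame(
  Symbol = rownames(metrics),
  metrics[, setdiff(colnames(metrics), "Trade Time"), drop = FALSE],
  row.names = NULL,
  check.names = FALSE,
  stringsAsFactors = FALSE
)
```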

【Discussion】:

【Solution 3】:

You can use lapply to fetch prices for multiple symbols:

library(quantmod) 

Symbols<-c("XOM","MSFT","JNJ","GE","CVX","WFC","PG","JPM","VZ","PFE","T","IBM","MRK","BAC","DIS","ORCL","PM","INTC","SLB")

StartDate <- as.Date('2015-01-01')

Stocks <-  lapply(Symbols, function(sym) {
  Cl(na.omit(getSymbols(sym, from=StartDate, auto.assign=FALSE)))
})

Stocks <- do.call(merge, Stocks)

In this case, I use the Cl() function to extract the closing prices.

【Discussion】:

Thanks a lot! However, I meant the key statistics page: finance.yahoo.com/quote/MSFT/key-statistics