【Question title】: Yahoo Finance historical data downloader URL is not working
【Posted】: 2017-05-18 09:45:05
【Question description】:

I have been using the following URL to fetch historical data from Yahoo Finance. As of May 16, 2017, it no longer works.

http://real-chart.finance.yahoo.com/table.csv?s=AAL&a=04&b=01&c=2017&d=04&e=02&f=2017&g=d&ignore=.csv

It looks like they changed the URL. The new one is:

https://query1.finance.yahoo.com/v7/finance/download/AAL?period1=1494873000&period2=1494959400&interval=1d&events=history&crumb=l0aEtuOKocj

The new URL contains a session token, crumb, which is tied to a cookie. Any idea how to obtain it programmatically (in Java)?

【Comments】:

  • How are the period1 and period2 numbers generated, and what do they mean?
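
On the question above: period1 and period2 appear to be Unix timestamps (seconds since 1970-01-01 UTC), which matches how the answers below build them with calendar.timegm. A minimal sketch of the conversion in both directions; the helper names to_epoch and from_epoch are made up for illustration:

```python
import calendar
from datetime import datetime, timezone


def to_epoch(date_str):
    """Convert a 'YYYY-MM-DD' date (taken as midnight UTC) to Unix seconds."""
    parsed = datetime.strptime(date_str, "%Y-%m-%d")
    return calendar.timegm(parsed.timetuple())


def from_epoch(seconds):
    """Convert Unix seconds back to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    # The two values taken from the URL in the question.
    print(from_epoch(1494873000).isoformat())
    print(from_epoch(1494959400).isoformat())
    # Going the other way for a calendar date.
    print(to_epoch("2017-05-15"))
```

The two values in the question are exactly 86400 seconds (one day) apart, so that URL requests a single daily bar.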

Tags: cookies session-cookies finance yahoo-api yahoo-finance


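【Solution 1】:

I recently wrote a simple Python script to download the price history of a single stock.
Here is an example of how to invoke it:
python get_quote_history.py --symbol=IBM --from=2017-01-01 --to=2017-05-25 -o IBM.csv
This downloads IBM's historical prices from 2017-01-01 to 2017-05-25 and saves them to IBM.csv.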

import re
import urllib2
import calendar
import datetime
import getopt
import sys
import time

crumble_link = 'https://finance.yahoo.com/quote/{0}/history?p={0}'
crumble_regex = r'CrumbStore":{"crumb":"(.*?)"}'
cookie_regex = r'Set-Cookie: (.*?); '
quote_link = 'https://query1.finance.yahoo.com/v7/finance/download/{}?period1={}&period2={}&interval=1d&events=history&crumb={}'


def get_crumble_and_cookie(symbol):
    link = crumble_link.format(symbol)
    response = urllib2.urlopen(link)
    match = re.search(cookie_regex, str(response.info()))
    cookie_str = match.group(1)
    text = response.read()
    match = re.search(crumble_regex, text)
    crumble_str = match.group(1)
    return crumble_str, cookie_str


def download_quote(symbol, date_from, date_to):
    time_stamp_from = calendar.timegm(datetime.datetime.strptime(date_from, "%Y-%m-%d").timetuple())
    time_stamp_to = calendar.timegm(datetime.datetime.strptime(date_to, "%Y-%m-%d").timetuple())

    attempts = 0
    while attempts < 5:
        crumble_str, cookie_str = get_crumble_and_cookie(symbol)
        link = quote_link.format(symbol, time_stamp_from, time_stamp_to, crumble_str)
        #print link
        r = urllib2.Request(link, headers={'Cookie': cookie_str})

        try:
            response = urllib2.urlopen(r)
            text = response.read()
            print "{} downloaded".format(symbol)
            return text
        except urllib2.URLError:
            print "{} failed at attempt # {}".format(symbol, attempts)
            attempts += 1
            time.sleep(2*attempts)
    return ""

if __name__ == '__main__':
    print get_crumble_and_cookie('KO')
    from_arg = "from"
    to_arg = "to"
    symbol_arg = "symbol"
    output_arg = "o"
    opt_list = (from_arg+"=", to_arg+"=", symbol_arg+"=")
    try:
        options, args = getopt.getopt(sys.argv[1:],output_arg+":",opt_list)
    except getopt.GetoptError as err:
        print err
        sys.exit(2)  # options is undefined if parsing failed, so stop here

    for opt, value in options:
        if opt[2:] == from_arg:
            from_val = value
        elif opt[2:] == to_arg:
            to_val = value
        elif opt[2:] == symbol_arg:
            symbol_val = value
        elif opt[1:] == output_arg:
            output_val = value

    print "downloading {}".format(symbol_val)
    text = download_quote(symbol_val, from_val, to_val)

    with open(output_val, 'wb') as f:
        f.write(text)
    print "{} written to {}".format(symbol_val, output_val)

【Comments】:

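    【Solution 2】:

    Nice answer by Andrea Galeazzi. I added options for splits and dividends and adapted it for Python 3.

    I also changed it so that the "to" date is included in the returned results; the previous code returned data up to but not including the "to" date. Just a different behavior!

    Note that Yahoo has made minor changes to price rounding, column order, and split syntax.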

    ## Downloaded from
    ## https://stackoverflow.com/questions/44044263/yahoo-finance-historical-data-downloader-url-is-not-working
    ## Modified for Python 3
    ## Added --event=history|div|split   default = history
    ## changed so "to:date" is included in the returned results
    ## usage: download_quote(symbol, date_from, date_to, events).decode('utf-8')
    
    import re
    from urllib.request import urlopen, Request, URLError
    import calendar
    import datetime
    import getopt
    import sys
    import time
    
    crumble_link = 'https://finance.yahoo.com/quote/{0}/history?p={0}'
    crumble_regex = r'CrumbStore":{"crumb":"(.*?)"}'
    cookie_regex = r'Set-Cookie: (.*?); '
    quote_link = 'https://query1.finance.yahoo.com/v7/finance/download/{}?period1={}&period2={}&interval=1d&events={}&crumb={}'
    
    
    def get_crumble_and_cookie(symbol):
        link = crumble_link.format(symbol)
        response = urlopen(link)
        match = re.search(cookie_regex, str(response.info()))
        cookie_str = match.group(1)
        text = response.read().decode("utf-8")
        match = re.search(crumble_regex, text)
        crumble_str = match.group(1)
        return crumble_str , cookie_str
    
    
    def download_quote(symbol, date_from, date_to,events):
        time_stamp_from = calendar.timegm(datetime.datetime.strptime(date_from, "%Y-%m-%d").timetuple())
        next_day = datetime.datetime.strptime(date_to, "%Y-%m-%d") + datetime.timedelta(days=1)
        time_stamp_to = calendar.timegm(next_day.timetuple())
    
        attempts = 0
        while attempts < 5:
            crumble_str, cookie_str = get_crumble_and_cookie(symbol)
            link = quote_link.format(symbol, time_stamp_from, time_stamp_to, events,crumble_str)
            #print link
            r = Request(link, headers={'Cookie': cookie_str})
    
            try:
                response = urlopen(r)
                text = response.read()
                print ("{} downloaded".format(symbol))
                return text
            except URLError:
                print ("{} failed at attempt # {}".format(symbol, attempts))
                attempts += 1
                time.sleep(2*attempts)
        return b''
    
    if __name__ == '__main__':
        print (get_crumble_and_cookie('KO'))
        from_arg = "from"
        to_arg = "to"
        symbol_arg = "symbol"
        event_arg = "event"
        output_arg = "o"
        opt_list = (from_arg+"=", to_arg+"=", symbol_arg+"=", event_arg+"=")
        try:
            options, args = getopt.getopt(sys.argv[1:],output_arg+":",opt_list)
        except getopt.GetoptError as err:
            print (err)
            sys.exit(2)  # options is undefined if parsing failed, so stop here
    
        symbol_val = ""
        from_val = ""
        to_val = ""
        output_val = ""
        event_val = "history"
        for opt, value in options:
            if opt[2:] == from_arg:
                from_val = value
            elif opt[2:] == to_arg:
                to_val = value
            elif opt[2:] == symbol_arg:
                symbol_val = value
            elif opt[2:] == event_arg:
                event_val = value
            elif opt[1:] == output_arg:
                output_val = value
    
        print ("downloading {}".format(symbol_val))
        text = download_quote(symbol_val, from_val, to_val,event_val)
        if text:
            with open(output_val, 'wb') as f:
                f.write(text)
            print ("{} written to {}".format(symbol_val, output_val))
    

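    【Comments】:

    • Man, you made my day! Your (and Andrea's) script can even download histories for instruments that are not accessible through Yahoo's web interface!
    • Sometime in early August 2017, Yahoo changed the case of a single character in one response. In the code above, change the line cookie_regex = r'Set-Cookie: (.*?); ' to cookie_regex = r'set-Cookie: (.*?); '
    • The cookie_regex line should now be cookie_regex = r'set-cookie: (.*?); '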
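
Since the header's capitalization has changed more than once according to the comments above, one option is to match it case-insensitively instead of editing the pattern each time. A minimal sketch, assuming the header text has the same shape the script's regex expects; the name first_cookie and the sample header are made up for illustration:

```python
import re

# Same pattern as in the script above, compiled case-insensitively so that
# "Set-Cookie", "set-Cookie" and "set-cookie" all match.
COOKIE_RE = re.compile(r"set-cookie: (.*?); ", re.IGNORECASE)


def first_cookie(header_text):
    """Return the first 'name=value' cookie pair, or None if there is none."""
    match = COOKIE_RE.search(header_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up header text for illustration; not a real Yahoo response.
    sample = "content-type: text/html\nset-cookie: B=abc123; path=/\n"
    print(first_cookie(sample))
```

Returning None instead of raising lets the caller decide whether a missing cookie should trigger a retry.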
    【Solution 3】:

    Got it working; now I just need to parse the CSV. I had trouble with the syntax, so I thought I'd share.

    Dim crumb As String:    crumb = "xxxx"
    Dim cookie As String:   cookie = "yyyy"
    
    Dim urlStock As String: urlStock = "https://query1.finance.yahoo.com/v7/finance/download/SIRI?" & _
        "period1=1274158800&" & _
        "period2=1495059477&" & _
        "interval=1d&events=history&crumb=" & crumb
    
    Dim http As MSXML2.XMLHTTP:   Set http = New MSXML2.ServerXMLHTTP
    http.Open "GET", urlStock, False
    http.setRequestHeader "Cookie", cookie
    http.send
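
For the CSV parsing step mentioned above, here is a minimal Python sketch using the standard csv module. It reads columns by header name rather than by position, which should tolerate a change in column order. The function name parse_history is hypothetical, the sample values are made up, and both the Date/Close/Volume header names and the handling of "null" as a missing value are assumptions about the download format:

```python
import csv
import io


def parse_history(csv_text):
    """Parse downloaded CSV text into a list of dicts keyed by column name."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: rows without data carry "null" (or empty) fields.
        if record["Close"] in ("", "null"):
            continue
        rows.append({
            "date": record["Date"],
            "close": float(record["Close"]),
            "volume": int(record["Volume"]),
        })
    return rows


if __name__ == "__main__":
    # Made-up values for illustration only; not real quotes.
    sample = (
        "Date,Open,High,Low,Close,Adj Close,Volume\n"
        "2017-05-16,1.0,2.0,0.5,1.5,1.5,1000\n"
    )
    print(parse_history(sample))
```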
    

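    【Comments】:

    • FYI, you only need the B cookie; the rest don't matter. Also, if I were you, I'd remove the crumb/cookie pair from your post, since anyone can now authenticate as your session ;)
    • Ah, thanks. As you can see, I'm a finance guy, not a developer :)
    • To get the cookie + crumb dynamically, see: stackoverflow.com/a/44050039/2279831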
    【Solution 4】:

    You can manually save the crumb/cookie pair in Chrome, or generate it with something like this. Then just set the Cookie header in Java and pass the corresponding crumb in the URL.
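
The two steps described here (cookie in a request header, crumb in the query string) can be sketched in Python as follows; a Java client would do the equivalent with its own HTTP library. The function name build_request is hypothetical and the cookie value is a placeholder. The sketch only builds the request object and does not send anything:

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

DOWNLOAD_URL = "https://query1.finance.yahoo.com/v7/finance/download/{symbol}?{query}"


def build_request(symbol, period1, period2, crumb, cookie):
    """Build a download request with the crumb in the URL and the cookie in a header."""
    # urlencode percent-escapes the crumb, which matters if it contains
    # characters such as "/" that are not safe in a query string.
    query = urlencode({
        "period1": period1,
        "period2": period2,
        "interval": "1d",
        "events": "history",
        "crumb": crumb,
    })
    url = DOWNLOAD_URL.format(symbol=quote(symbol), query=query)
    return Request(url, headers={"Cookie": cookie})


if __name__ == "__main__":
    # Crumb copied from the URL in the question; the cookie is a placeholder.
    request = build_request("AAL", 1494873000, 1494959400, "l0aEtuOKocj", "B=placeholder")
    print(request.full_url)
    print(request.get_header("Cookie"))
```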

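    【Comments】:

    • Any idea how to get the crumb and cookie in a Cocoa app using evaluateJavaScript:completionHandler:?
    • AFAIK, once you have a valid crumb and cookie, you can reuse them indefinitely; they don't expire.
    • Very interesting thread. I wonder whether the cookie and crumb tokens can be generated from Java? Then assembling a new Java API (as requested in the original post) would be straightforward.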
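    【Solution 5】:

    I wrote a lightweight script that pulls together many of the suggestions in this thread to solve this problem: https://github.com/AndrewRPorter/yahoo-historical

    However, there are better solutions, such as https://github.com/ranaroussi/fix-yahoo-finance

    Hope these resources help!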

    【Comments】:

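      【Solution 6】:

      I developed the following solution to this problem in Excel/VBA. The key challenge is creating the crumb/cookie pair. Once created, you can reuse it for calls to Yahoo to get historical prices.

      Here is the key code for obtaining the crumb/cookie: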

      Sub GetYahooRequest(strCrumb As String, strCookie As String)
      'This routine will use a sample request to Yahoo to obtain a valid Cookie and Crumb
      
      Dim strUrl                      As String: strUrl = "https://finance.yahoo.com/lookup?s=%7B0%7D"  
      Dim objRequest                  As WinHttp.WinHttpRequest
      
      Set objRequest = New WinHttp.WinHttpRequest
      
          With objRequest
              .Open "GET", strUrl, True
              .setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
              .send
              .waitForResponse
              strCrumb = strExtractCrumb(.responseText)
              strCookie = Split(.getResponseHeader("Set-Cookie"), ";")(0)
          End With
      
      End Sub
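
The routine above calls strExtractCrumb, which is not shown in this answer. As a rough idea of what such a helper might do, here is a Python sketch that reuses the CrumbStore pattern from the Python answers earlier in the thread. It assumes the page embeds the crumb in that same JSON fragment, and that a "/" inside the crumb may arrive escaped as \u002F; the name extract_crumb and the sample page text are made up:

```python
import re

# Pattern taken from the Python scripts earlier in this thread.
CRUMB_RE = re.compile(r'CrumbStore":{"crumb":"(.*?)"}')


def extract_crumb(page_html):
    """Return the crumb embedded in the page text, or None if it is absent."""
    match = CRUMB_RE.search(page_html)
    if match is None:
        return None
    # Assumption: the crumb sits inside a JSON string, so a "/" can show up
    # as the literal six characters \u002F and has to be restored.
    return match.group(1).replace("\\u002F", "/")


if __name__ == "__main__":
    # Made-up page fragment for illustration; not a real Yahoo response.
    sample = '{"CrumbStore":{"crumb":"ab\\u002Fcd"},"other":1}'
    print(extract_crumb(sample))
```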
      

      See Yahoo Historical Price Extract on my website for a sample Excel workbook that demonstrates how to extract Yahoo historical prices.

      【Comments】:

        【Solution 7】:

        Great answer, Andrea. I have added to your code the ability to download multiple stocks. (Python 2.7)

        file1: down.py

        import os
        
        myfile = open("ticker.csv", "r")
        lines = myfile.readlines()
        
        for line in lines:
            ticker = line.strip()
            cmd = "python get_quote_history.py --symbol=%s --from=2017-01-01 --to=2017-05-25 -o %s.csv" % (ticker, ticker)
            os.system(cmd)
        

        file2: ticker.csv (one symbol per line; the example lists Apple and Microsoft, i.e. AAPL and MSFT)

        file3: get_quote_history.py
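
A variant of down.py for Python 3 could pass the arguments as a list to subprocess instead of formatting a shell string, so ticker symbols are never interpreted by a shell. This is only a sketch: it assumes get_quote_history.py is the Python 3 version from Solution 2 and sits in the current directory, and the helper names read_tickers, build_command and run_all are hypothetical:

```python
import subprocess
import sys


def read_tickers(lines):
    """Strip whitespace and drop blank lines from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def build_command(ticker, date_from, date_to):
    """Build the argument list for one get_quote_history.py run."""
    return [
        sys.executable, "get_quote_history.py",
        "--symbol={}".format(ticker),
        "--from={}".format(date_from),
        "--to={}".format(date_to),
        "-o", "{}.csv".format(ticker),
    ]


def run_all(ticker_file, date_from, date_to):
    """Run the downloader once per ticker listed in ticker_file."""
    with open(ticker_file) as handle:
        tickers = read_tickers(handle)
    for ticker in tickers:
        # check=False: a failed download for one ticker does not stop the rest.
        subprocess.run(build_command(ticker, date_from, date_to), check=False)
```

Calling run_all("ticker.csv", "2017-01-01", "2017-05-25") would then mirror the loop in down.py.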

        【Comments】:

        • Where can I find all these files? Is something stored elsewhere? The code the author gave doesn't work either.