【Question title】: Is there an API for the Google Answer Boxes?
【Posted】: 2015-10-26 04:37:34
【Question description】:

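Google answer boxes (sometimes called featured snippets, knowledge cards, or live results) are very useful. I would like to extract their information and use it in my own program. Judging by the HTML, pulling it out of the page is not simple. I have done a lot of research but cannot find anything that supports this. Does anyone know whether there is an API (or part of the Web Search API) that returns the information shown in the answer boxes?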

I saw the answer here: google api for glorious info box?, but the solution proposed there was deprecated last month.

As an example, here is the HTML for "what time is it in Japan":

<!--m--><div data-hveid="30">      
<div class="vk_c vk_gy vk_sh card-section _MZc">  
<div class="vk_bk vk_ans">6:37 AM</div> 
<div class="vk_gy vk_sh"> Tuesday, <span class="_Hq">August 4, 2015</span>  
<span class="_Hq"> (GMT+9) </span>  
</div> <span class="vk_gy vk_sh">  Time in Japan  </span> 

This is very different from the HTML for "where is Tokyo":

<!--m-->
<div class="_uX kno-fb-ctx" aria-level="3" role="heading" data-hveid="41" data-ved="0CCkQtwcoATACahUKEwiLjemg8I3HAhUTKYgKHU7jCho">
<div class="_eF" data-tts="answers" data-tts-text="Japan">Japan</div>
<div class="_Tfc">
</div></div>
<!--n-->
</li><li class="mod" data-md="61" style="clear:none">
<!--m-->
<div class="_oDd" data-hveid="42">
<span class="_Tgc _y9e">Tokyo consists of the southwestern part of the Kanto region, the <b>Izu Islands</b>, and the <b>Ogasawara Islands</b>. Tokyo is the capital of <b>Japan</b>, and the place where over 13 million people live, making it one of the most populous cities in the world.</span></div>

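Basically, I need to extract "6:37 AM" from the first snippet and "Japan" from the second. A dynamic string search would be difficult, though, because the two formats are so different.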
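The difficulty can be made concrete. Each answer-box layout needs its own rule, so one practical approach is a small list of per-layout rules tried in order.

Below is a minimal Python sketch that uses only the standard library's `html.parser`. Note the assumptions:

- The class and attribute names (`vk_ans`, `data-tts="answers"`) come from the 2015 snippets above, and Google can change them at any time.
- `AnswerExtractor`, `has_class`, `RULES` and `extract_answer` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class AnswerExtractor(HTMLParser):
    """Collect the text of the first element for which `predicate` is true."""

    def __init__(self, predicate):
        super().__init__()
        self.predicate = predicate  # called as predicate(tag, attrs_dict)
        self.depth = 0              # > 0 while inside the matched element
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.predicate(tag, dict(attrs)):
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def text(self):
        # Collapse runs of whitespace left over from the page's formatting
        return " ".join("".join(self.parts).split())


def has_class(attrs, name):
    # Attributes without a value are reported as None by html.parser
    return name in (attrs.get("class") or "").split()


# Hypothetical per-layout rules, based on the two 2015 snippets in the question
RULES = [
    lambda tag, attrs: has_class(attrs, "vk_ans"),          # "what time is it in Japan"
    lambda tag, attrs: attrs.get("data-tts") == "answers",  # "where is Tokyo"
]


def extract_answer(html):
    """Try each rule in order and return the first non-empty text, else None."""
    for rule in RULES:
        parser = AnswerExtractor(rule)
        parser.feed(html)
        parser.close()
        if parser.text():
            return parser.text()
    return None
```

Fed the two snippets above, `extract_answer` returns "6:37 AM" and "Japan" respectively. Every new answer type needs another rule, which is why scraping answer boxes is brittle.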

【Question discussion】:

  • I'm as curious as you are, but for now I'm exploring DuckDuckGo, since it has similar functionality: duckduckgo.com/api

Tags: html google-api


【Solution 1】:

I have used DuckDuckGo's Instant Answer API in the past, and it worked very well. Its responses are not as rich as Google's, but it is a good start.

https://duckduckgo.com/api

A JSON response from the API looks like this:

{
  "Abstract": "",
  "AbstractText": "",
  "AbstractSource": "",
  "AbstractURL": "",
  "Image": "",
  "Heading": "",
  "Answer": "",
  "Redirect": "",
  "AnswerType": "",
  "Definition": "",
  "DefinitionSource": "",
  "DefinitionURL": "",
  "RelatedTopics": [],
  "Results": [],
  "Type": ""
}
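Here is a minimal Python sketch that calls the Instant Answer API and picks a usable field from a response like the one above. It makes these assumptions:

- The public endpoint is `https://api.duckduckgo.com/`, queried with `format=json` and `no_html=1`.
- Related-topic entries carry a `Text` field.

The field priority in `best_answer` is my own choice, not something the API defines. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DDG_ENDPOINT = "https://api.duckduckgo.com/"


def ddg_url(query):
    # format=json requests the JSON response; no_html=1 asks for plain text (assumed parameters)
    return DDG_ENDPOINT + "?" + urlencode({"q": query, "format": "json", "no_html": 1})


def best_answer(data):
    """Return the first non-empty field, roughly from most to least direct."""
    for key in ("Answer", "AbstractText", "Definition"):
        if data.get(key):
            return data[key]
    # Fall back to the first related topic with text (entries assumed to have "Text")
    for topic in data.get("RelatedTopics", []):
        if isinstance(topic, dict) and topic.get("Text"):
            return topic["Text"]
    return None


def ask_duckduckgo(query):
    # Network call: fetch the JSON response and pick the best field
    with urlopen(ddg_url(query), timeout=10) as resp:
        return best_answer(json.load(resp))
```

Keeping the field selection in `best_answer`, separate from the HTTP call, makes it easy to adjust the priority once you see which fields your queries actually fill.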

I hope this helps!

【Discussion】:

【Solution 2】:

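I did a lot of research, and nothing like what you describe seems to be available at the moment. There is also no official way to extract information from Google Search results.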

The only alternative I can think of is to get the information through RSS (http://www.w3schools.com/xml/xml_rss.asp) and process it in your program.
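If you go the RSS route, the parsing side is straightforward with Python's standard library. This sketch assumes an RSS 2.0 feed, with `<item>` elements under `<rss><channel>`. Which feed to use, and how to fetch it, is left open because the answer does not name one. `rss_items` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def rss_items(xml_text):
    """Return (title, link, description) tuples from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    # iter() walks all descendants, so it finds <item> elements inside <channel>
    for item in root.iter("item"):
        items.append((
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("description", default=""),
        ))
    return items
```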

【Discussion】:

【Solution 3】:

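A bit late, but here is a solution that works as of 2017. It uses Python and Selenium (with headless chromedriver) to extract the "main" text from the answer box. It relies on the search page layout and the answer box position being fairly consistent across different kinds of queries (although I have not tested this exhaustively). The element coordinates may change with resolution or window size, but they are easy to adjust.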

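      from urllib.parse import quote_plus

      from selenium import webdriver
      from selenium.webdriver.chrome.options import Options

      chrome_options = Options()
      chrome_options.add_argument("--window-size=1024,768")  # Chrome expects "width,height"
      chrome_options.add_argument("--headless")
      # Use `options=`; the older `chrome_options=` keyword is deprecated and removed in recent Selenium releases
      driver = webdriver.Chrome(options=chrome_options)

      def ask_google(query):
          # Search for the query (quote_plus escapes spaces and special characters)
          driver.get("https://www.google.com/search?q=" + quote_plus(query))

          # Get the element at (350, 230), where the answer box is drawn
          # at this window size; adjust the coordinates for other layouts
          answer = driver.execute_script(
              "return document.elementFromPoint(arguments[0], arguments[1]);",
              350, 230)

          # elementFromPoint returns null (None in Python) if nothing is at that point
          return answer.text if answer is not None else None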
      

Testing this approach with your queries (or close variants) gives:

      ask_google("what is the time in Japan")
      
      "4:36 PM"
      
      ask_google("where is tokyo located in japan")
      
      "Situated on the Kanto Plain, Tokyo is one of three large cities, the other two being Yokohama and Kawasaki, located along the northwestern shore of Tokyo Bay, an inlet of the Pacific Ocean on east-central Honshu, the largest of the islands of Japan."
      

【Discussion】:

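  • This won't work if there is a line break right after `return` in the script string, because JavaScript's automatic semicolon insertion turns it into `return;`.
  • Running this in 2021 gives me an error: 'WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see sites.google.com/a/chromium.org/chromedriver/home'. Does anyone know how to fix it?
  • If you are on a Mac, you can run brew install --cask chromedriver to fix the chromedriver PATH problem.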
【Solution 4】:

SerpApi supports direct answer boxes. It appears to support time queries as well:

      $ curl "https://serpapi.com/search.json?q=time+in+japan"
      
      ...
      "answer_box": {
        "type": "local_time",
        "result": "4:37 AM"
      },
      ....
      

Some documentation: https://serpapi.com/direct-answer-box-api
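Here is a minimal Python sketch of the same request. It reads the `answer_box` → `result` field shown in the JSON excerpt above. It makes these assumptions:

- SerpApi expects an `api_key` query parameter.
- Other answer-box types may store their answer under keys other than `result`.

Check the documentation linked above before relying on either. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def serpapi_url(query, api_key):
    # Assumed: SerpApi authenticates through an api_key query parameter
    return "https://serpapi.com/search.json?" + urlencode({"q": query, "api_key": api_key})


def answer_from_serpapi(data):
    """Return answer_box["result"], or None when the response has no answer box."""
    box = data.get("answer_box") or {}
    # Only the "result" key is confirmed by the local_time example above
    return box.get("result")


def ask_serpapi(query, api_key):
    # Network call: fetch the JSON and read the answer box
    with urlopen(serpapi_url(query, api_key), timeout=30) as resp:
        return answer_from_serpapi(json.load(resp))
```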

【Discussion】:

  • $50 a month is too expensive.
【Solution 5】:

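I wrote a function that scrapes Google's search page from the client side to get the answer from Google's quick answer box. It is obviously not perfect, but it works well!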

      async function answer(q) {
        // Fetch the results page through the author's CORS proxy;
        // encodeURIComponent also escapes "&", "?" and "#" inside the query
        const html = await fetch(
          `https://cors.explosionscratc.repl.co/google.com/search?q=${encodeURIComponent(q)}`,
          {
            headers: {
              "User-Agent":
                "Mozilla/5.0 (X11; CrOS x86_64 13982.88.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.162 Safari/537.36",
            },
          }
        ).then((res) => res.text());
        // Parse into a detached document held in a local, not a window global
        const d = new DOMParser().parseFromString(html, "text/html");
        const withText = [...d.querySelectorAll("*")].filter((i) => i.innerText);
        const el =
          // Google Translate results
          d.querySelector("[id*='lrtl-translation-text']") ||
          [...d.querySelectorAll(".kp-header [data-md]")][1] ||
          // Calculator results (query the parsed page `d`, not the live `document`)
          [...d.querySelectorAll(".kCrYT")][1] ||
          withText
            .filter((i) => i.innerText.includes("Calculator Result"))
            .slice(-2)[0]
            ?.innerText.split("\n")[2] ||
          // Featured snippets
          withText
            .filter(
              (i) =>
                i.innerText.includes("Featured snippet from the web") ||
                i.innerText.includes("Description") ||
                i.innerText.includes("Calculator result")
            )
            .slice(-1)[0]
            ?.parentElement?.querySelector("div span") ||
          // Cards (like the ones at the side)
          d.querySelector(
            ".card-section, [class*='__wholepage-card'] [class*='desc']"
          ) ||
          d.querySelector(".thODed")?.querySelector("div span") ||
          [...d.querySelectorAll("[data-async-token]")].slice(-1)[0] ||
          d.querySelector("miniapps-card-header")?.parentElement ||
          d.querySelector("#tw-target");
        // The calculator branch yields a string rather than an element
        let text = (typeof el === "string" ? el : el?.innerText)?.trim();
        if (!text) return undefined;
        if (text.includes("translation") && text.includes("Google Translate")) {
          text = text.split("Verified")[0].trim();
        }
        if (
          text.includes("Calculator Result") &&
          text.includes("Your calculations and results")
        ) {
          text = text
            .split("them")[1]
            ?.split("(function()")[0]
            ?.split("=")[1]
            ?.trim();
        }
        return text;
      }
      

This fetches the Google search page and then parses the HTML to get the answer:

      await answer("When were antibiotics discovered");
      // "But it was not until 1928 that penicillin, the first true antibiotic, was discovered by Alexander Fleming, Professor of Bacteriology at St. Mary's Hospital in London."
      
      await answer("What time is it in London");
      // "4:44 PM"
      
      await answer("define awesome");
      //"extremely impressive or daunting; inspiring great admiration, apprehension, or fear."
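The core of `answer()` is an ordered fallback: try the most specific answer type first and return the first non-empty hit. Here is a rough Python analogue of that pattern, not a port of the exact selectors.

The two regex extractors (`translation`, `featured_snippet`) are hypothetical toys that only show the shape of the pattern. Against real pages, an HTML parser is far more robust than regular expressions.

```python
import re
from typing import Callable, Iterable, Optional

# An extractor takes raw HTML and returns the answer text, or None
Extractor = Callable[[str], Optional[str]]


def first_answer(html: str, extractors: Iterable[Extractor]) -> Optional[str]:
    """Return the first non-empty result, trying extractors in priority order."""
    for extract in extractors:
        try:
            result = extract(html)
        except Exception:
            # A rule broken by a layout change should not stop the remaining rules
            continue
        if result and result.strip():
            return result.strip()
    return None


# Hypothetical toy extractors, loosely modelled on two branches of answer()
def translation(html: str) -> Optional[str]:
    m = re.search(r'id="[^"]*lrtl-translation-text[^"]*"[^>]*>([^<]+)<', html)
    return m.group(1) if m else None


def featured_snippet(html: str) -> Optional[str]:
    m = re.search(r"Featured snippet from the web.*?<span[^>]*>([^<]+)</span>", html, re.S)
    return m.group(1) if m else None
```

Catching exceptions per extractor mirrors how the `?.` chains in `answer()` let one failing branch fall through to the next.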
      


【Discussion】:
