Wikipedia API -- extracting content box [duplicate]: Answers

【Question title】: Wikipedia API -- extracting content box [duplicate]
【Posted】: 2014-01-30 06:00:34
【Question description】:

I am trying to extract the information in the gray box (the summary/infobox in the right-hand column, e.g. Type and so on) for links such as http://en.wikipedia.org/wiki/DressBarn.

I am using the following request, but it only returns the summary: http://en.wikipedia.org/w/api.php?action=query&prop=extracts|info&exintro&titles=DressBarn&format=json&redirects&inprop=url&indexpageids
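For reference, the request above can be assembled in PHP with http_build_query(). This is only a sketch: buildExtractUrl() is a hypothetical helper name, the https scheme is an assumption (the question uses http), and the bare flags (exintro, redirects, indexpageids) are sent as `=1`, which the API should treat the same as a bare flag.

```php
<?php
// Sketch: rebuild the query URL from the question with http_build_query().
// buildExtractUrl() is a hypothetical helper, not part of any library.
function buildExtractUrl(string $title): string
{
    $params = [
        'action'       => 'query',
        'prop'         => 'extracts|info',
        'exintro'      => 1,       // flag: only the intro section
        'titles'       => $title,
        'format'       => 'json',
        'redirects'    => 1,       // flag: follow redirects
        'inprop'       => 'url',
        'indexpageids' => 1,       // flag: include a list of page IDs
    ];

    // http_build_query() percent-encodes the values, so "|" becomes "%7C".
    return 'https://en.wikipedia.org/w/api.php?' . http_build_query($params);
}

echo buildExtractUrl('DressBarn'), PHP_EOL;
```

As the question notes, this request returns the intro extract only; the infobox is not part of it.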

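I tried experimenting with the sandbox, but could not figure out how to extract the information contained specifically in the gray box.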

【Question discussion】:

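Take a look at dbpedia.org, e.g. live.dbpedia.org/page/DressBarn.
Possible duplicate of Getting the infobox section of wikipedia (or maybe content of infobox of wikipedia, or mediawiki api: how to get infobox from a wikipedia article, or Get all Wikipedia Infobox Templates and all Pages using them, or others...)
I have seen all of those duplicate questions, but they all have only link-only answers pointing to DBPedia. I voted to leave this one open, because I think it would be better to have at least some sample code showing exactly how to answer this specific question with DBPedia.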
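As a starting point for the kind of sample code requested in the last comment, a minimal PHP sketch might look like the following. It rests on several assumptions that are not confirmed anywhere in this thread: that DBpedia's public SPARQL endpoint is reachable at https://dbpedia.org/sparql, that the page is exposed as the resource http://dbpedia.org/resource/DressBarn, and that the response follows the standard SPARQL JSON results layout. buildDbpediaUrl() and groupBindings() are hypothetical helper names.

```php
<?php
// Sketch only: the endpoint, the resource URI, and both helper names are assumptions.

// Build a request URL that asks for every property/value pair of one resource.
// The resource name is inserted as-is, so it must not contain special characters.
function buildDbpediaUrl(string $resource): string
{
    $sparql = sprintf(
        'SELECT ?property ?value WHERE { <http://dbpedia.org/resource/%s> ?property ?value }',
        $resource
    );

    return 'https://dbpedia.org/sparql?' . http_build_query([
        'query'  => $sparql,
        'format' => 'application/sparql-results+json',
    ]);
}

// Group the bindings of a SPARQL JSON result by property URI.
function groupBindings(string $json): array
{
    $data = json_decode($json, true);
    $grouped = [];
    foreach ($data['results']['bindings'] ?? [] as $row) {
        $grouped[$row['property']['value']][] = $row['value']['value'];
    }
    return $grouped;
}

// Offline demonstration with a hand-written sample response (not real DBpedia data).
$sample = '{"head":{"vars":["property","value"]},"results":{"bindings":['
    . '{"property":{"type":"uri","value":"http://example.org/type"},'
    . '"value":{"type":"literal","value":"Example value"}}]}}';
print_r(groupBindings($sample));

// Live use (needs network access):
// $json = file_get_contents(buildDbpediaUrl('DressBarn'));
// print_r(groupBindings($json));
```

Which property URIs correspond to the infobox fields would still have to be read off the returned list.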

Tags: php json wikipedia-api

【Solution 1】:

You can use PHP Simple HTML DOM Parser.

<?php
// The folder where you uploaded simple_html_dom.php
require_once('/homepages/0/d502303335/htdocs/js/simple_html_dom.php');

// Wikipedia page to parse
$html = file_get_html('https://en.wikipedia.org/wiki/Burger_King');

foreach ($html->find('table[class=infobox vcard]') as $element) {

    // Collect the non-empty <td> cells (the infobox values)
    $left = array();
    $i = 0;
    foreach ($element->find('td') as $cell) {
        $left[$i] = $cell->plaintext;
        if (!empty($left[$i])) {
            $i = $i + 1;
        }
    }

    // Collect the non-empty <th> cells (the infobox labels)
    $right = array();
    $i = 0;
    foreach ($element->find('th') as $cell) {
        $right[$i] = $cell->plaintext;
        if (!empty($right[$i])) {
            $i = $i + 1;
        }
    }

    print_r($right);
    echo "<br><br><br>";
    print_r($left);

    // If you want to know what kind of industry Burger King is
    echo "Burger king is {$right[2]}, {$left[2]}";
}

?>
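The code above collects labels and values into two separate arrays, so a row that lacks a `<th>` or a `<td>` can shift the indexes out of step. As an alternative sketch (not the answer's method), the rows can be paired one by one with PHP's bundled DOM extension instead of Simple HTML DOM. parseInfobox() is a hypothetical helper name, and the sample markup is hand-written rather than copied from Wikipedia.

```php
<?php
// Sketch: pair each infobox label (<th>) with its value (<td>) row by row,
// using PHP's bundled DOM extension. parseInfobox() is a hypothetical helper.
function parseInfobox(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);    // real-world pages rarely parse without warnings
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match rows of any table whose class list contains the token "infobox".
    $rows = $xpath->query(
        '//table[contains(concat(" ", normalize-space(@class), " "), " infobox ")]//tr'
    );

    $info = [];
    foreach ($rows as $row) {
        $th = $xpath->query('./th', $row)->item(0);
        $td = $xpath->query('./td', $row)->item(0);
        // Keep only rows that have both a label and a value.
        if ($th !== null && $td !== null) {
            $label = trim(preg_replace('/\s+/u', ' ', $th->textContent));
            $value = trim(preg_replace('/\s+/u', ' ', $td->textContent));
            $info[$label] = $value;
        }
    }
    return $info;
}

// Offline demonstration on a hand-written fragment (not real Wikipedia markup).
$sample = '<table class="infobox vcard">'
    . '<tr><th>Type</th><td>Sample value A</td></tr>'
    . '<tr><th>Industry</th><td>Sample value B</td></tr>'
    . '<tr><td>Row without a label</td></tr>'
    . '</table>';
print_r(parseInfobox($sample));
```

For a live page, the HTML could be fetched with file_get_contents() and passed to the same function; the result is an associative array keyed by label, so a field can be looked up by name rather than by position.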

If this answer suits your needs, please accept it as the best answer and upvote it, since I put a lot of effort into it.

【Discussion】: