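【Posted】: 2015-10-06 03:32:38
【Question】:
I'm trying to extract the main table from a website and convert it to JSON, but a table that comes before the one I want is tripping up my code. Here is the code I'm using: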
<?php
$singles_chart_url = 'http://www.mediabase.com/mmrweb/allaboutcountry/Charts.asp?format=C1R';

// Get the mode from the user (empty string if not given):
$mode = isset($_GET['chart']) ? $_GET['chart'] : '';

// This is an array of elements to remove from the content before stripping it:
$newlines = array("\t", "\n", "\r", "\x20\x20", "\0", "\x0B");

switch ($mode)
{
    // They want the Singles chart, or haven't specified what they want:
    case 'singles':
    case '':
    default:
        $content = file_get_contents($singles_chart_url);
        $start_search = '<table width="100%" border="0" cellpadding="2" cellspacing="2">';
        break;
}

$content = str_replace($newlines, "", html_entity_decode($content));
$scrape_start = strpos($content, $start_search);
$scrape_end = strpos($content, '</table>', $scrape_start);
$the_table = substr($content, $scrape_start, ($scrape_end - $scrape_start));

// Now loop through the rows and get the data we need:
preg_match_all("|<tr(.*)</tr>|U", $the_table, $rows);

// Send the header so we can output JSON:
$format = isset($_REQUEST['format']) ? $_REQUEST['format'] : '';
switch ($format)
{
    case 'json':
    default:
        header('Content-type: application/json');
        $count = 0;
        foreach ($rows[0] as $row)
        {
            // Skip header rows (strpos() returns false when '<th' is absent):
            if (strpos($row, '<th') === false)
            {
                // Get the cells:
                preg_match_all("|<td(.*)</td>|U", $row, $cells);
                $cells = $cells[0];
                $position = strip_tags($cells[0]);
                $plus = strip_tags($cells[1]);
                $artist = strip_tags($cells[2]);
                $weeks = strip_tags($cells[3]);
                echo "\n\t\t" . '{';
                echo "\n\t\t\t" . '"position" : "' . $position . '", ';
                echo "\n\t\t\t" . '"plus" : "' . $plus . '", ';
                echo "\n\t\t\t" . '"artist" : "' . $artist . '", ';
                echo "\n\t\t\t" . '"noWeeks" : "' . $weeks . '" ';
                echo ($count != (count($rows[0]) - 2)) ? "\n\t\t" . '}, ' : "\n\t\t" . '}';
                $count++;
            }
        }
        echo "\n\t" . ']';
        echo "\n" . '}';
        break;
}
?>
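Separately from picking the right table, the `echo`-based JSON assembly in the code above is fragile. The comma test `$count != (count($rows[0]) - 2)` assumes exactly one `<th>` header row, so a one-row table with no header still gets `},` before the closing `]`. That is likely where the invalid trailing comma in the output below comes from. A minimal sketch, assuming each row arrives as the `$cells[0]` array of raw `<td>...</td>` strings that the regex produces, lets `json_encode` handle commas, quoting and escaping. The `chartToJson` name and the sample rows are hypothetical.

```php
<?php
// Hypothetical helper: each element of $rows is one row's list of raw
// "<td>...</td>" strings (what preg_match_all returns in $cells[0]).
function chartToJson(array $rows): string
{
    $entries = [];
    foreach ($rows as $cells) {
        // A <th>-only header row has no <td> matches, so it arrives here
        // with fewer than four cells; skip any such row.
        if (count($cells) < 4) {
            continue;
        }
        $entries[] = [
            'position' => trim(strip_tags($cells[0])),
            'plus'     => trim(strip_tags($cells[1])),
            'artist'   => trim(strip_tags($cells[2])),
            'noWeeks'  => trim(strip_tags($cells[3])),
        ];
    }
    // json_encode adds the commas between entries itself, so no
    // counter is needed; JSON_THROW_ON_ERROR (PHP 7.3+) throws instead
    // of silently returning false on bad input.
    return json_encode(
        ['retrieved' => (string) time(), 'entries' => $entries],
        JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
    );
}

// Hypothetical sample input standing in for scraped rows.
echo chartToJson([
    [], // a <th> header row: no <td> matches
    ['<td>2</td>', '<td>1</td>', '<td>KENNY CHESNEY</td>', '<td>Save It For A Rainy</td>'],
]), "\n";
```

Because `json_encode` also escapes quotes and backslashes inside the cell text, a title containing `"` can no longer break the output.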
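The website I'm trying to scrape is the chart page in `$singles_chart_url`. The goal is to get JSON for the table that starts after the LW, TW, Artist, Title, etc. headings. My code returns: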
{
"chartDate" : "",
"retrieved" : "1444101246",
"entries" :
[
{
"position" : "7 DayCharts",
"plus" : "Country Past 7 Days -by Overall Rank Return to Main Menu ",
"artist" : " ",
"noWeeks" : "",
"peak" : "",
"points" : "",
"increase" : "",
"us" : ""
},
]
}
instead of something like:
{
"chartDate" : "",
"retrieved" : "1444101246",
"entries" :
[
{
"position" : "2",
"plus" : "1",
"artist" : "KENNY CHESNEY",
"noWeeks" : "Save It For A Rainy", etc. etc.
},
]
}
What can I add to my code to retrieve that table?
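One possible approach, sketched under assumptions: `strpos($content, $start_search)` returns the first match, so presumably the navigation table uses the same opening tag and is found first. Parsing the page with PHP's built-in `DOMDocument` and `DOMXPath` lets you pick the table by its content instead. The sketch assumes the chart's header row contains a cell whose text is exactly `Artist` (from the LW, TW, Artist, Title headings mentioned above). `extractChartRows` and the sample markup are hypothetical and not taken from the real page.

```php
<?php
// Hypothetical helper: returns the chart rows as arrays, choosing the table
// by a header cell whose text is "Artist" rather than by its <table> tag.
function extractChartRows(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    // Innermost <table> containing an element whose text is "Artist";
    // not(.//table) stops an outer layout table from matching as well.
    $tables = $xpath->query("//table[.//*[normalize-space()='Artist']][not(.//table)]");
    if ($tables === false || $tables->length === 0) {
        return [];
    }

    $rows = [];
    foreach ($tables->item(0)->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->textContent);
        }
        // Header rows built from <th> cells have no <td>; skip short rows.
        if (count($cells) >= 4) {
            $rows[] = [
                'position' => $cells[0],
                'plus'     => $cells[1],
                'artist'   => $cells[2],
                'noWeeks'  => $cells[3],
            ];
        }
    }
    return $rows;
}

// Hypothetical markup: a navigation table with the same attributes,
// followed by the chart table.
$sample = '<table width="100%" border="0" cellpadding="2" cellspacing="2">'
    . '<tr><td>7 DayCharts</td><td>Return to Main Menu</td></tr></table>'
    . '<table width="100%" border="0" cellpadding="2" cellspacing="2">'
    . '<tr><th>LW</th><th>TW</th><th>Artist</th><th>Title</th></tr>'
    . '<tr><td>2</td><td>1</td><td>KENNY CHESNEY</td><td>Save It For A Rainy</td></tr>'
    . '</table>';

echo json_encode(['entries' => extractChartRows($sample)], JSON_PRETTY_PRINT), "\n";
```

Against the live page you would pass `file_get_contents($singles_chart_url)` instead of `$sample`. If the real header cell contains `&nbsp;` or different wording, the XPath predicate needs adjusting, since `normalize-space()` does not strip non-breaking spaces.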
【Comments】:
- @PaulCrovella Hey, thanks. I'm new to PHP, so I hope I can understand all of this, but I'll take a look.
Tags: php json web-scraping