Posted: 2018-03-06 13:39:44
Question:
I have to iterate over a table and build a JSON object from its contents, in this form:
var obj = {
vaccine: "...",
year: ...,
country: "...",
coverage: ...
}
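The scraped cell text is always a string (for example "93 " or "_ "), while the target object has a numeric year and coverage. Below is a minimal sketch of a record builder. It assumes that a "_" cell means "no data" and should become null, because the desired output further down writes a bare _, which is not valid JSON. makeRecord and toNumberOrNull are hypothetical names. They are not part of Cheerio or the question's code.

```javascript
// Hypothetical helper: converts scraped text to a number.
// Assumption: "_" (or an empty cell) means "no data" and maps to null.
function toNumberOrNull(text) {
  const trimmed = String(text).trim();
  if (trimmed === "" || trimmed === "_") return null;
  const n = Number(trimmed);
  return Number.isNaN(n) ? null : n;
}

// Hypothetical helper: builds one output record from scraped strings.
function makeRecord(vaccine, country, year, coverage) {
  return {
    vaccine: vaccine.trim(),
    country: country.trim(),
    year: toNumberOrNull(year),
    coverage: toNumberOrNull(coverage),
  };
}
```

For example, makeRecord(" DTP3 ", "Italy", "2015", "93 ") gives { vaccine: "DTP3", country: "Italy", year: 2015, coverage: 93 }.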
The table is:
<table class="ts">
<tr>
<td class="statheadings" colspan="100%" align="center">
<h1 class="statistics">Coverage time series for Italy (ITA)</h1>
</td>
</tr>
<tr>
<td align="center" colspan="100%"> <font color="red">
Last updated 06-Sep-2017 (data as of 05-Sep-2017)<br />Next overall update 2018<br /></font>
</td>
</tr>
<tr>
<td colspan="100%" >
<hr />
</td>
</tr>
<tr>
<th colspan="4" align="left">Vacciness</th>
<th class="year">2016</th>
<th class="year">2015</th>
<th class="year">2014</th>
<th class="year">2013</th>
<th class="year">2012</th>
<th class="year">2011</th>
<th class="year">2010</th>
<th class="year">2009</th>
<th class="year">2008</th>
<th class="year">2007</th>
<th class="year">2006</th>
<th class="year">2005</th>
<th class="year">2004</th>
<th class="year">2003</th>
<th class="year">2002</th>
<th class="year">2001</th>
<th class="year">2000</th>
<th class="year">1999</th>
<th class="year">1998</th>
<th class="year">1997</th>
<th class="year">1996</th>
<th class="year">1995</th>
<th class="year">1994</th>
<th class="year">1993</th>
<th class="year">1992</th>
<th class="year">1991</th>
<th class="year">1990</th>
<th class="year">1989</th>
<th class="year">1988</th>
<th class="year">1987</th>
<th class="year">1986</th>
<th class="year">1985</th>
<th class="year">1984</th>
<th class="year">1983</th>
<th class="year">1982</th>
<th class="year">1981</th>
</tr>
<tr class="odd">
<td colspan="4" align="left">
<a href="timeseries/tscoveragedtp3.html" title="Click for full global time series for DTP3">DTP3</a>
</td>
<td class="statistics_small" colspan="1">_ </td>
<td class="statistics_small" colspan="1">93 </td>
<td class="statistics_small" colspan="1">95 </td>
<td class="statistics_small" colspan="1">96 </td>
<td class="statistics_small" colspan="1">97 </td>
<td class="statistics_small" colspan="1">96 </td>
<td class="statistics_small" colspan="1">96 </td>
<td class="statistics_small" colspan="1">96 </td>
<td class="statistics_small" colspan="1">96 </td>
<td class="statistics_small" colspan="1">97 </td>
<td class="statistics_small" colspan="1">96 </td>
<td class="statistics_small" colspan="1">95 </td>
<td class="statistics_small" colspan="1">94 </td>
<td class="statistics_small" colspan="1">96 </td>
<td class="statistics_small" colspan="1">93 </td>
<td class="statistics_small" colspan="1">93 </td>
<td class="statistics_small" colspan="1">87 </td>
<td class="statistics_small" colspan="1">87 </td>
<td class="statistics_small" colspan="1">86 </td>
<td class="statistics_small" colspan="1">85 </td>
<td class="statistics_small" colspan="1">84 </td>
<td class="statistics_small" colspan="1">84 </td>
<td class="statistics_small" colspan="1">83 </td>
<td class="statistics_small" colspan="1">82 </td>
<td class="statistics_small" colspan="1">80 </td>
<td class="statistics_small" colspan="1">80 </td>
<td class="statistics_small" colspan="1">80 </td>
<td class="statistics_small" colspan="1">_ </td>
<td class="statistics_small" colspan="1">_ </td>
<td class="statistics_small" colspan="1">_ </td>
<td class="statistics_small" colspan="1">_ </td>
<td class="statistics_small" colspan="1">_ </td>
<td class="statistics_small" colspan="1">_ </td>
<td class="statistics_small" colspan="1">_ </td>
<td class="statistics_small" colspan="1">_ </td>
<td class="statistics_small" colspan="1">_ </td>
<tr class="even">
<td colspan="4" align="left">
<a href="timeseries/tscoveragehepb3.html" title="Click for full global time series for HepB3">HepB3</a>
</td>
<td class="statistics_small" colspan="1">
_
</td>
<td class="statistics_small" colspan="1">
93
</td>
<td class="statistics_small" colspan="1">
95
</td>
<td class="statistics_small" colspan="1">
96
</td>
<td class="statistics_small" colspan="1">
96
</td>
<td class="statistics_small" colspan="1">
96
</td>
<td class="statistics_small" colspan="1">
96
</td>
<td class="statistics_small" colspan="1">
96
</td>
<td class="statistics_small" colspan="1">
96
</td>
<td class="statistics_small" colspan="1">
97
</td>
<td class="statistics_small" colspan="1">
96
</td>
<td class="statistics_small" colspan="1">
96
</td>
<td class="statistics_small" colspan="1">
96
</td>
<td class="statistics_small" colspan="1">
95
</td>
<td class="statistics_small" colspan="1">
95
</td>
<td class="statistics_small" colspan="1">
95
</td>
<td class="statistics_small" colspan="1">
94
</td>
<td class="statistics_small" colspan="1">
94
</td>
<td class="statistics_small" colspan="1">
95
</td>
<td class="statistics_small" colspan="1">
95
</td>
<td class="statistics_small" colspan="1">
95
</td>
<td class="statistics_small" colspan="1">
95
</td>
<td class="statistics_small" colspan="1">
95
</td>
<td class="statistics_small" colspan="1">
95
</td>
<td class="statistics_small" colspan="1">
95
</td>
<td class="statistics_small" colspan="1">
50
</td>
<td class="statistics_small" colspan="1">
_
</td>
<td class="statistics_small" colspan="1">
_
</td>
<td class="statistics_small" colspan="1">
_
</td>
<td class="statistics_small" colspan="1">
_
</td>
<td class="statistics_small" colspan="1">
_
</td>
<td class="statistics_small" colspan="1">
_
</td>
<td class="statistics_small" colspan="1">
_
</td>
<td class="statistics_small" colspan="1">
_
</td>
<td class="statistics_small" colspan="1">
_
</td>
<td class="statistics_small" colspan="1">
_
</td>
<tr class="odd">
<td colspan="4" align="left">
<a href="timeseries/tscoveragedtp3.html" title="Click for full global time series for DTP3">DTP3</a>
</td>
...
As you can see, the rows alternate between the odd and even classes.
I'm scraping this table with Node.js, Express.js and Cheerio.
Here is my code:
const cheerio = require('cheerio');
const express = require('express');
var fs = require('fs');
const request = require('request');
const app = express();
// country code appended to the URL for each country
/*var countries = {
'Albania': 'ALB',
'Austria': 'AUT',
'Belgium': 'BEL',
'Bulgaria': 'BGR',
'Croatia': 'HRV',
'Cyprus': 'CYP',
'Denmark': 'DNK',
'Estonia': 'EST',
'Finland': 'FIN',
'France': 'FRA',
'Germany': 'DEU',
'Greece': 'GRC',
'Iceland': 'ISL',
'Ireland': 'IRL',
'Italy': 'ITA',
'Latvia': 'LVA',
'Netherlands': 'NLD',
'Norway': 'NOR',
'Poland': 'POL',
'Portugal': 'PRT',
'Romania': 'ROU',
'Slovakia': 'SVK',
'Slovenia': 'SVN',
'Spain': 'ESP',
'Sweden': 'SWE',
'Switzerland': 'CHE',
'United Kingdom': 'GBR'
};*/
// for test
var countries = {
'Albania': 'ALB'
};
// create variables to create json object
var jsons = [];
var json = {vaccine: "", country: "", year: "", coverage: ""};
for(country in countries) {
var url = 'http://apps.who.int/immunization_monitoring/globalsummary/coverages?c=' + countries[country];
request(url, (function(country) {
var thisCountry = country;
return function(error, res, html) {
if(error) {
console.log(error);
throw error;
}
// send html response to cheerio to create DOM
$ = cheerio.load(html);
// arrays containing all the values
var years = [];
var vaccines = [];
var coverages = [];
var i = 1;
// scraping year values
$('.ts .year').each(function(year) {
var country = thisCountry.trim();
var year = $(this).text().trim();
years.push(year);
json.country = country;
json.year = year;
});
console.log(years, years.length);
// scraping vaccine values
$('.ts .odd td a').each(function(odd) {
var vaccine = $(this).text().trim();
vaccines.push(vaccine);
json.vaccine = vaccine;
});
$('.ts .even td a').each(function(even) {
var vaccine = $(this).text().trim();
vaccines.push(vaccine);
json.vaccine = vaccine;
});
console.log(vaccines, vaccines.length);
// scraping coverage values (get all values)
$('.ts .odd .statistics_small').each(function(oddCoverage) {
var coverage = $(this).text().trim();
coverages.push(coverage);
});
$('.ts .even .statistics_small').each(function(evenCoverage) {
var coverage = $(this).text().trim();
coverages.push(coverage);
});
console.log(coverages, coverages.length);
console.log("i", i); // 1
// scraping coverage values (get only some values)
$('.ts .odd:nth-child(' + i + ')').each(function(oddCoverage) {
var coverage = $(this).text().trim();
json.coverage = coverage;
});
i++;
console.log("i", i); // 2
jsons.push(json);
// write jsons on file output.json
fs.writeFile('output.json', JSON.stringify(jsons, null, 3), function(error) {
console.log('File output.json successfully written!');
});
console.log("i", i); // 2
} // end return
})(country)); // end request
}
The code doesn't work.
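There is also a control-flow issue, separate from the ordering problem. Each request callback calls fs.writeFile on its own, so once countries holds more than one entry, output.json is rewritten after every response, and jsons fills in whatever order the responses arrive. (The request package has since been deprecated.) Below is a minimal sketch that waits for all countries before writing once. fetchHtml, parseCountry, scrapeAll and writeOutput are hypothetical names. The two callbacks are injected so the flow can be tested without the network.

```javascript
const fs = require("fs");

// Sketch: request every country, wait for all responses, then write the
// file once. fetchHtml(code) -> Promise<string> and
// parseCountry(name, html) -> array of records are hypothetical callbacks.
async function scrapeAll(countries, fetchHtml, parseCountry) {
  const perCountry = await Promise.all(
    Object.entries(countries).map(async ([name, code]) => {
      const html = await fetchHtml(code);
      return parseCountry(name, html);
    })
  );
  // Promise.all resolves in input order, whichever response arrives first,
  // so records stay grouped by country in the order of `countries`.
  return perCountry.flat();
}

// Write all records in one call, after every country has been processed.
async function writeOutput(records, path) {
  await fs.promises.writeFile(path, JSON.stringify(records, null, 3));
}
```

On Node 18 or later, fetchHtml could be `code => fetch(url + code).then(res => res.text())`, with url being the WHO address from the question. parseCountry is whatever turns one page into that country's records. Then call `scrapeAll(countries, fetchHtml, parseCountry).then(records => writeOutput(records, "output.json"))`.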
I can retrieve all the years, all the vaccines and all the coverage values, but they all end up mixed up and out of order.
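With the flat arrays the code already builds, the association may be recoverable. Both vaccines and coverages are filled odd rows first, then even rows, so the k-th vaccine should own the k-th slice of years.length coverage values. The resulting vaccine order is odd rows then even rows, not the table's order. Below is a minimal sketch that assumes each data row contains exactly one link and exactly one cell per year. regroup is a hypothetical name.

```javascript
// Hypothetical helper: rebuild vaccine -> cells rows from the question's
// flat arrays. Assumes vaccines[k] and the k-th slice of coverages come from
// the same <tr>, and that every row has exactly years.length cells.
function regroup(years, vaccines, coverages) {
  const expected = vaccines.length * years.length;
  if (coverages.length !== expected) {
    throw new Error(`expected ${expected} cells, got ${coverages.length}`);
  }
  return vaccines.map((vaccine, k) => ({
    vaccine,
    cells: coverages.slice(k * years.length, (k + 1) * years.length),
  }));
}
```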
I'm not sure how to build a JSON file with all the values in order and properly structured. My goal is a file like this:
[
{
"vaccine": "BCG",
"country": "Albania",
"year": 2016,
"coverage": 99
},
{
"vaccine": "BCG",
"country": "Albania",
"year": 2015,
"coverage": 100
},
{
"vaccine": "BCG",
"country": "Albania",
"year": 2014,
"coverage": 100
},
{
...
},
{
"vaccine": "BCG",
"country": "Albania",
"year": 1981,
"coverage": 93
},
{
"vaccine": "DTP1",
"country": "Albania",
"year": 2016,
"coverage": 99
},
{
...
},
{
"vaccine": "DTP1",
"country": "Albania",
"year": 1981,
"coverage": _
},
{
"vaccine": "TT2+",
"country": "Albania",
"year": 2016,
"coverage": _
},
{
...
},
{
"vaccine": "TT2+",
"country": "Albania",
"year": 1981,
"coverage": _
},
{
"vaccine": "BCG",
"country": "Austria",
"year": 2016,
"coverage": _
},
{
...
}
]
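The target list is a cross product: for each vaccine row, one record per year column, with column idx mapping to years[idx]. Below is a minimal sketch, assuming rows have already been grouped as { vaccine, cells } objects (a hypothetical intermediate shape) and that "_" becomes null so the output is valid JSON. toRecords is a hypothetical name.

```javascript
// Sketch: one record per (vaccine, year), in table order: vaccines top to
// bottom, years in header order (2016 down to 1981 in the question's table).
function toRecords(country, years, rows) {
  // Assumption: "_" or a missing cell means "no data".
  const num = (s) => (s == null || s === "_" || s === "" ? null : Number(s));
  const records = [];
  for (const { vaccine, cells } of rows) {
    years.forEach((year, idx) => {
      records.push({
        vaccine,
        country,
        year: Number(year),
        coverage: num(cells[idx]), // column idx lines up with years[idx]
      });
    });
  }
  return records;
}
```

Each call builds a fresh object, so no record shares state with another.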
It must contain 576 = 36 × 16 elements (36 years × 16 vaccines).
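The table has 36 year columns (2016 back to 1981). With 16 vaccine rows, a complete result therefore has 36 × 16 = 576 records. A small check can catch a ragged row before it silently shifts coverages onto the wrong years. This sketch assumes the same hypothetical { vaccine, cells } row shape, and expectedRecordCount is a hypothetical name.

```javascript
// Sketch: verify the table is rectangular before building records.
// Returns the expected record count (years x rows) or throws on a bad row.
function expectedRecordCount(years, rows) {
  for (const { vaccine, cells } of rows) {
    if (cells.length !== years.length) {
      throw new Error(`${vaccine}: ${cells.length} cells for ${years.length} years`);
    }
  }
  return years.length * rows.length;
}
```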
I tried using an index i to step through the td elements, but it doesn't work.
Right now, my output.json file contains:
[
{
"vaccine": "TT2+",
"country": "Albania",
"year": "1981",
"coverage": ""
}
]
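This single record is consistent with one shared object. json is created once outside the request, every .each callback overwrites its fields, and jsons.push(json) runs once per response. The file therefore ends up with one object holding the last values assigned (TT2+, 1981). If the push were moved inside the loops, reusing the same object would still store the same reference each time. Below is a minimal sketch contrasting the two. pushShared and pushFresh are hypothetical names.

```javascript
// Pushing a shared object stores the same reference each time, so every
// entry shows the last values assigned.
function pushShared(values) {
  const out = [];
  const shared = { year: "" };
  for (const v of values) {
    shared.year = v;
    out.push(shared);
  }
  return out;
}

// Creating a new object per iteration keeps each record independent.
function pushFresh(values) {
  const out = [];
  for (const v of values) {
    out.push({ year: v });
  }
  return out;
}
```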
Thanks!
EDIT
My goal is to build a JSON object from the table.
Comments:
-
Try it like this:
$('.ts .odd .statistics_small,.ts .even .statistics_small') -
That works, but the problem remains. The resulting JSON is always the same...
-
Sorry, it's not clear what your problem is, but I'll try to sort out the parts I understand.
-
I've edited the main post; I hope it's now clear what I want.
Tags: json node.js web-scraping cheerio