【Posted】: 2016-12-12 18:19:31
【Question】:
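I am trying to return all of the URLs listed in a website's sitemap, for example Argos. Once I have these URLs, I need to repeat the process to return any URLs that the resulting URLs may themselves contain. For example:
http://www.argos.co.uk/sitemap.xml returns: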
http://www.argos.co.uk/product.xml
http://www.argos.co.uk/product2.xml
http://www.argos.co.uk/catalogue.xml
http://www.argos.co.uk/buyers_guides.xml
http://www.argos.co.uk/features_and_articles.xml
http://www.argos.co.uk/static_pages.xml
http://www.argos.co.uk/store_pages.xml
http://www.argos.co.uk/product.xml then contains links of its own, which I also need (and the process repeats until it reaches a page that contains no further XML URLs).
What I have so far:
var urls = require('sitemap-urls'); // package that returns the URLs listed in a sitemap
var cheerio = require('cheerio');
var request = require('request');

// Fetch the top-level sitemap and list every URL found in its source
request('http://www.argos.co.uk/sitemap.xml', function (error, response, html) {
    if (error) {
        console.error(error);
        return;
    }
    var sitemap = html;
    var results = urls.extractUrls(sitemap);
    // If results were returned, loop so that sitemap equals each URL in turn
    if (results) {
        for (var i = 0; i < results.length; i++) {
            sitemap = results[i];
            console.log(sitemap);
            // Need to repeat the URL extraction process for each URL returned
        }
    }
});
There may be a simple solution that I am overlooking. Any help would be greatly appreciated, thanks.
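For reference, the "repeat until no more XML URLs" step described above might be sketched as a recursive function. This is only a rough illustration under several assumptions: the names extractLocs, fetchText and collectUrls are hypothetical and not part of sitemap-urls or request; <loc> entries are pulled out with a simple regular expression rather than a real XML parser; nested sitemaps are assumed to be recognisable by a .xml suffix (so compressed .xml.gz files are not followed); and the default fetcher assumes a runtime with a global fetch, such as Node 18 or later.

```javascript
// Sketch only: extractLocs, fetchText and collectUrls are hypothetical names.

// Pull the text of every <loc> element out of a sitemap document.
// Assumption: a regex is good enough here; XML entities are not decoded.
function extractLocs(xml) {
  const locs = [];
  const pattern = /<loc>\s*([^<\s]+)\s*<\/loc>/gi;
  let match;
  while ((match = pattern.exec(xml)) !== null) {
    locs.push(match[1]);
  }
  return locs;
}

// Default fetcher. Assumption: a global fetch exists (Node 18+).
async function fetchText(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error('HTTP ' + response.status + ' for ' + url);
  }
  return response.text();
}

// Follow nested .xml sitemaps depth-first and return every other URL found.
// The seen set stops the recursion from looping if sitemaps reference each other.
async function collectUrls(sitemapUrl, fetcher = fetchText, seen = new Set()) {
  if (seen.has(sitemapUrl)) {
    return [];
  }
  seen.add(sitemapUrl);

  const xml = await fetcher(sitemapUrl);
  const pages = [];
  for (const loc of extractLocs(xml)) {
    if (/\.xml$/i.test(loc)) {
      // Looks like another sitemap: repeat the process on it.
      const nested = await collectUrls(loc, fetcher, seen);
      for (const url of nested) {
        pages.push(url);
      }
    } else {
      pages.push(loc);
    }
  }
  return pages;
}
```

Something like collectUrls('http://www.argos.co.uk/sitemap.xml').then(function (found) { console.log(found.length); }) would then start from the top-level sitemap. The fetcher is passed in as a parameter so that the request-based download from the code above could be substituted for fetchText, and the sub-sitemaps are fetched one at a time, which is slower than fetching in parallel but gentler on the target site.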
【Discussion】:
Tags: javascript node.js xml request sitemap