【Posted】: 2020-05-13 15:22:57
【Question】:
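This is my first attempt at web scraping with the cheerio library. I am trying to pull data such as company names and addresses from Páginas Amarillas. The address sits inside a span that has no class, only an itemprop attribute. The selector string needed to reach it is very long, and I have tried several approaches. I can get data down to the element just before the one with the itemprop, but I don't know how to target the itemprop itself. The problem is with the pathAddress1 constant: the console log shows an empty array. If I remove the last part of the string ([itemprop = 'streetAddress']), I do get data back, but not exactly what I want.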
Here is the code:
const cheerio = require("cheerio");
const request = require("request-promise");
// Selectors for the Páginas Amarillas elements we want to scrape
const pathProfesionals1 =
".container .row .first-content-listado .col-lg-7.col-md-8.col-xs-12 .bloque-central .central .listado-item.item-ip .box .cabecera .row .col-xs-11.comercial-nombre a h2 span";
const pathProfesionals2 =
".container .row .first-content-listado .col-lg-7.col-md-8.col-xs-12 .bloque-central .central .listado-item.item-ig .box .cabecera .row .col-xs-11.comercial-nombre a h2 span";
const pathTelephones1 =
".container .row .first-content-listado .col-lg-7.col-md-8.col-xs-12 .bloque-central .central .listado-item.item-ip .box .pie-pastilla .row .col-xs-4 a.llama-desplegable.btn.btn-amarillo.btn-block.phone.hidden.d-none span";
const pathTelephones2 =
".container .row .first-content-listado .col-lg-7.col-md-8.col-xs-12 .bloque-central .central .listado-item.item-ig .box .pie-pastilla .row .col-xs-4 a.llama-desplegable.btn.btn-amarillo.btn-block.phone.hidden.d-none span";
const pathAddress1 = `.container .row .first-content-listado .col-lg-7.col-md-8.col-xs-12 .bloque-central .central .listado-item.item-ig .box .row a .location span *[itemprop = 'streetAddress']`;
const pathAddress2 = "";
init = async () => {
const arrCompanyName = [];
const arrTelephones = [];
const arrAddresses = [];
const { category, city } = this.state;
const $ = await request({
uri: `https://www.paginasamarillas.es/search/${category}/all-ma/${city}/all-is/malaga/all-ba/all-pu/all-nc/1?what=carpintero&where=malaga&ub=false&qc=true`,
transform: body => cheerio.load(body) // once the request completes, hand the body to cheerio for parsing
});
const profesionals1 = $(pathProfesionals1).each((i, el) =>
arrCompanyName.push($(el).text())
);
const telephones1 = $(pathTelephones1).each((i, el) =>
arrTelephones.push($(el).text())
);
const profesionals2 = $(pathProfesionals2).each((i, el) =>
arrCompanyName.push($(el).text())
);
const telephones2 = $(pathTelephones2).each((i, el) =>
arrTelephones.push($(el).text())
);
const addresses1 = $(pathAddress1).each((i, el) =>
arrAddresses.push($(el).text())
);
console.log(arrAddresses);
}
【Discussion】:
Tags: javascript reactjs cheerio