【Posted】: 2021-02-23 11:01:48
【Question】:
I want to scrape data from a website, so I tried the cheerio npm package.
The selector works fine in the Chrome DevTools console:
let commodity_array = $(
  "#tdm_base_scroll > div > div.dt_ta_09 > div.dt_ta_10"
)
  .text()
  .split("\n");
console.log(commodity_array);
But when I use the same selector in my code, it returns an empty result.
My code:
const request = require("request-promise"),
  cheerio = require("cheerio"),
  fs = require("fs"),
  json2csv = require("json2csv").Parser;
const url = "https://www.commodityonline.com/mandiprices/";
(async () => {
  let mandiData = [];
  const response = await request({
    uri: url,
    headers: {
      accept:
        "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
      "accept-encoding": "gzip, deflate, br",
      "accept-language": "en-IN,en-GB;q=0.9,en-US;q=0.8,en;q=0.7,la;q=0.6",
    },
    gzip: true,
  });
  let $ = cheerio.load(response);
  let commodity_array = $(
    "#tdm_base_scroll > div > div.dt_ta_09 > div.dt_ta_10"
  )
    .text()
    .split("\n");
  console.log(commodity_array);
})();
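The URL of the site I am scraping is: https://www.commodityonline.com/mandiprices/
I learned this scraping method from this video on the Hitesh Chaudhary YouTube channel.
Is there something wrong with the request headers?
I'm new to web scraping, so I can't tell which step I got wrong.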
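One way to narrow this down, offered as a minimal sketch rather than a confirmed diagnosis: cheerio only parses the HTML string it is given and does not run the page's scripts, so a selector that matches in the browser can come back empty if those elements are added by client-side JavaScript. Also, as far as I know, the `gzip: true` option of `request` decodes gzip and deflate but not brotli, so advertising `br` in `accept-encoding` may be worth ruling out. The helpers `findSelectorMarkers` and `looksLikeHtml` below are hypothetical names, not part of cheerio or request, and the sample strings stand in for the real response body.

```javascript
// Hypothetical helper (not from the original post): reports which of the
// id/class names used in the selector occur anywhere in the raw HTML string.
function findSelectorMarkers(html, markers) {
  const present = [];
  const missing = [];
  for (const marker of markers) {
    (html.includes(marker) ? present : missing).push(marker);
  }
  return { present, missing };
}

// Hypothetical helper: a crude check that the body was actually decoded.
// Decoded HTML normally has a doctype or <html> tag near the start.
function looksLikeHtml(body) {
  return /<!doctype html|<html[\s>]/i.test(body.slice(0, 2048));
}

// Inline sample strings (assumptions) used instead of a live request.
const staticPage =
  '<html><body><div id="tdm_base_scroll"><div><div class="dt_ta_09">' +
  '<div class="dt_ta_10">Onion</div></div></div></div></body></html>';
const scriptedPage =
  '<html><body><div id="app"></div><script src="bundle.js"></script></body></html>';
const markers = ["tdm_base_scroll", "dt_ta_09", "dt_ta_10"];

console.log(findSelectorMarkers(staticPage, markers));
console.log(findSelectorMarkers(scriptedPage, markers));
console.log(looksLikeHtml(staticPage));
```

In the script above, the same two calls could be made on `response` before `cheerio.load(response)`. If `looksLikeHtml(response)` is false, the body was probably not decompressed; if the markers are missing from otherwise valid HTML, the table is probably rendered in the browser after the page loads.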
【Comments】:
Tags: javascript node.js web-scraping cheerio