【Posted】: 2021-07-30 05:37:33
【Question】:
I have an HTML excerpt from a web page that looks like this:
<li class="PaEvOc tv5olb wbTnP gws-horizon-textlists__li-ed">
  <!-- random div/element stuff inside here -->
</li>
<li class="PaEvOc tv5olb gws-horizon-textlists__li-ed">
  <!-- random div/element stuff inside here as well -->
</li>
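I'm not sure how to copy the HTML properly, but if you look up "events near location" in Google Chrome, these are the elements I'm looking at and trying to scrape data from: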
https://i.stack.imgur.com/fv4a4.png
To start with, I'm just trying to figure out how to select these elements correctly in Puppeteer:
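const puppeteer = require('puppeteer');

(async () => {
  // --no-sandbox is commonly required in container environments such as Heroku.
  const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
  try {
    const page = await browser.newPage();
    page.once('load', () => console.log('Page loaded!'));
    await page.goto('https://www.google.com/search?q=events+near+poughkeepsie+today&client=safari&rls=en&uact=5&ibp=htl;events&rciv=evn&sa=X&fpstate=tldetail');
    console.log('Hit wait for selector');
    // waitForSelector resolves with a handle to the first matching element.
    const test = await page.waitForSelector(".PaEvOc");
    console.log('finished waiting for selector');
    // page.$ also returns only the first match, or null if nothing matches.
    const seeMoreEventsButton = await page.$(".PaEvOc");
    console.log('seeMoreEventsButton is ' + seeMoreEventsButton);
    console.log('test is ' + test);
  } finally {
    // Always release the browser, even if a step above throws.
    await browser.close();
  }
})();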
What exactly is going wrong here? Any and all help is greatly appreciated, thanks!
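One thing worth noting about the snippet above: both `page.waitForSelector` and `page.$` resolve to a handle for the first match only, and concatenating such a handle into a string typically prints a generic handle description rather than the element's contents. A minimal sketch of reading every matching `<li>` instead is shown here. The names `collectText` and `scrapeEventTexts`, the `li.PaEvOc` default, and the 15-second timeout are illustrative assumptions, not something from the original post.

```javascript
// Runs inside the page: turns the matched <li> nodes into trimmed text.
// It must not reference outer variables, because Puppeteer serializes it.
const collectText = (items) =>
  items
    .map((li) => (li.textContent || '').trim())
    .filter((text) => text.length > 0);

// Hypothetical helper: `page` is a Puppeteer Page that has already navigated.
async function scrapeEventTexts(page, selector = 'li.PaEvOc') {
  // Wait until at least one matching item exists (timeout value is arbitrary).
  await page.waitForSelector(selector, { timeout: 15000 });
  // $$eval hands ALL matches to collectText and returns its serializable result.
  return page.$$eval(selector, collectText);
}
```

Inside the original async function this could be called as `const texts = await scrapeEventTexts(page);` after `page.goto(...)`. If the wait times out, the page that loaded probably does not contain the expected markup at all, which is a separate problem from the selection itself.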
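【Comments】:
-
Run it with headless: false so you can see what's happening.
-
@pguardiario I'm running it on Heroku, which doesn't support headless: false
-
So you aren't testing it locally first?
-
Oh, I should do that, thanks!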
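The advice in this thread can be sketched roughly as follows: show the browser window when running locally, stay headless on Heroku, and log what the page actually is when the selector never appears. This is only a sketch under stated assumptions. It assumes Heroku exposes a `DYNO` environment variable on its dynos, and `buildLaunchOptions` and `waitOrReport` are hypothetical helper names.

```javascript
// Assumption: the DYNO environment variable is set when running on Heroku.
function buildLaunchOptions(env = process.env) {
  const onHeroku = Boolean(env.DYNO);
  return {
    // Visible browser locally for debugging; headless where no display exists.
    headless: onHeroku,
    args: onHeroku ? ['--no-sandbox'] : [],
  };
}

// Hypothetical helper: waits for `selector`, and on failure logs which page
// was actually loaded before rethrowing the original error.
async function waitOrReport(page, selector, timeout = 15000) {
  try {
    return await page.waitForSelector(selector, { timeout });
  } catch (err) {
    console.error('Selector not found on', page.url(), 'title:', await page.title());
    throw err;
  }
}
```

The options would be passed as `puppeteer.launch(buildLaunchOptions())`. Logging the URL and title is a stand-in for watching the browser when no window is available; it can reveal, for example, that a different page than expected was served.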
Tags: javascript html node.js web-scraping puppeteer