【Posted】: 2022-01-23 05:20:06
【Question】:
I'm new to JavaScript, so my programming skills aren't great yet. I've been working on a web scraper that returns an array of name, posts, bio, etc., like this:
let infoOfPost = await newTab(browser, page);
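So infoOfPost is an array whose values change because it is assigned inside a loop. I can see in the console that it has new bio, posts, followers values, etc. each time. But when I push these values into an object, the object only stores the initial values from the first run of the loop. On every later iteration it keeps showing the same values and does not overwrite the previous ones. I store the array in the object like this: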
accountInfoObject.displayName = infoOfPost[0];
accountInfoObject.posts = infoOfPost[1];
accountInfoObject.followers = infoOfPost[2];
accountInfoObject.following = infoOfPost[3];
accountInfoObject.fullName = infoOfPost[4];
accountInfoObject.about = infoOfPost[5];
accountsInformation.push(accountInfoObject);
await objectsCsv(accountsInformation);
What I see right now is this:
[
  {
    accountUrl: 'https://www.example.com/xyz.hij/',
    displayName: 'saharpalmer',
    posts: '368',
    followers: '2,640',
    following: '510',
    fullName: 'Sahar Intuitive Life Mentor',
    about: '30-year Experience: I help you shift your mindset????Get back on track quickly ????Fulfil your purpose & live your best life????'
  }
]
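What I want to see is all my other entries following it, separated by commas, so that it becomes an array of many objects rather than an array holding a single object. Currently I only see it once, and this single-object array keeps repeating. I also push this object into an array and write it to a CSV file, which likewise contains this same object repeated over and over, like this: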
about accountUrl displayName posts followers following fullName
30-year Experience: I help you shift your mindset??Get back on track quickly??Fulfil your purpose & live your best life??' https://www.example.com/being.darsh/ saharpalmer 368 2640 510 Sahar
30-year Experience: I help you shift your mindset??Get back on track quickly??Fulfil your purpose & live your best life??' https://www.example.com/being.darsh/ saharpalmer 368 2640 510 Sahar
The object and the array are declared as follows:
let accountsInformation = [];
let accountInfoObject = new Object();
The complete code is below. The file where we receive the array is:
let accountsInformation = [];
let accountInfoObject = new Object();
async function scrapingPosts(browser, page) {
  readCsvFile(urlsToVisit);
  for (let x = 0; x < urlsToVisit.length; x++) {
    secondaryUrl = urlsToVisit[x];
    await page.waitFor(10000);
    await page
      .goto(`${secondaryUrl}`, {
        waitUntil: "domcontentloaded",
      })
      .catch((e) => {});
    await page.waitForSelector("article >div.EZdmt:nth-child(2)", 5000);
    for (let i = 1; i < 5; i++) {
      await page.waitFor(5000);
      // this loops goes through all 3 posts of each container;
      for (let j = 1; j <= 3; j++) {
        // opening the modal means clicking on post i and j will
        // increment and we will keep moving to next post 1 by 1
        await page.click(
          `div.EZdmt > div > div > div:nth-child(${i}) > div:nth-child(${j})`);
        let url = await urlOfIds(page, urlsAddress);
        await page.waitFor(5000);
        let infoOfPost = await newTab(browser, page);
        accountInfoObject.accountUrl = url;
        accountInfoObject.displayName = infoOfPost[0];
        accountInfoObject.posts = infoOfPost[1];
        accountInfoObject.followers = infoOfPost[2];
        accountInfoObject.following = infoOfPost[3];
        accountInfoObject.fullName = infoOfPost[4];
        accountInfoObject.about = infoOfPost[5];
        await page.waitFor(10000);
        accountsInformation.push(accountInfoObject);
        console.log(accountsInformation);
        await objectsCsv(accountsInformation);
        // Modal Closes here process repeats till the loop condition is unsatisfied
        await page.click(
          "body > div._2dDPU.QPGbb.CkGkG > div.qF0y9._4EzTm.BI4qX.qJPeX.fm1AK.TxciK.yiMZG >button.wpO6b");
        await page.waitFor(20000);
      }
    }
  }
  await browser.close();
}
The file that infoOfPost comes from is:
let evalSelector;
const selectorData = [];
async function newTab(browser, page) {
  await page.keyboard.down("Control");
  await page.click("span.Jv7Aj.mArmR.MqpiF");
  await page.keyboard.up("Control");
  await page.waitForTimeout(1000);
  const newPage = (await browser.pages())[1];
  await newPage.waitForNavigation("#react-root");
  await newPage.waitFor(20000);
  evalSelector = await selectorEvaluation(newPage, titleSelector);
  selectorData.push(evalSelector);
  evalSelector = await selectorEvaluation(newPage, noPostSelector);
  selectorData.push(evalSelector);
  evalSelector = await selectorEvaluation(newPage, noOfFollowersSelector);
  selectorData.push(evalSelector);
  evalSelector = await selectorEvaluation(newPage, noOfFollowingSelector);
  selectorData.push(evalSelector);
  evalSelector = await selectorEvaluation(newPage, displayNameSelector);
  selectorData.push(evalSelector);
  evalSelector = await selectorEvaluation(newPage, aboutSelector);
  selectorData.push(evalSelector);
  console.log(selectorData);
  await newPage.waitFor(5000);
  await newPage.close();
  return selectorData;
}
module.exports = newTab;
Any help would be greatly appreciated. Thanks in advance. Cheers!!
【Comments】:
-
You need to create a new object every time you push into the array. You are reusing the same object.
-
Thanks for reaching out, @Barmar. If my array has thousands of values, how should I do that? Would I have to create thousands of objects??
-
Of course. How else would you have thousands of different values?
-
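Move the line let accountInfoObject = new Object(); into the code that saves the object, instead of running it only once at the start.
-
Put let accountInfoObject = {}; right after the let infoOfPost assignment. You need a brand-new object on every iteration of that loop.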
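The advice in the comments can be illustrated with a minimal sketch. It is not the asker's scraper: fakeNewTab is a hypothetical stand-in for newTab, and the three-field records are made-up data. The sketch also builds each result in a local array, on the assumption that the module-level selectorData in the question, which is pushed to on every call and never cleared, is why indices 0 to 5 appear to keep returning the first profile's values.

```javascript
// Hypothetical stand-in for newTab(): returns a fresh array on every call
// instead of pushing into a shared module-level array.
function fakeNewTab(n) {
  const selectorData = []; // local, so indices 0..2 always belong to this call
  selectorData.push(`user${n}`, String(n * 10), String(n * 100));
  return selectorData;
}

// Pattern from the question: one object created once and pushed repeatedly.
function collectShared(count) {
  const accountsInformation = [];
  const accountInfoObject = {}; // created once, outside the loop
  for (let n = 1; n <= count; n++) {
    const infoOfPost = fakeNewTab(n);
    accountInfoObject.displayName = infoOfPost[0];
    accountInfoObject.posts = infoOfPost[1];
    accountInfoObject.followers = infoOfPost[2];
    accountsInformation.push(accountInfoObject); // same reference every time
  }
  return accountsInformation;
}

// Pattern from the comments: a new object on every iteration.
function collectFresh(count) {
  const accountsInformation = [];
  for (let n = 1; n <= count; n++) {
    const infoOfPost = fakeNewTab(n);
    const accountInfoObject = {
      displayName: infoOfPost[0],
      posts: infoOfPost[1],
      followers: infoOfPost[2],
    };
    accountsInformation.push(accountInfoObject);
  }
  return accountsInformation;
}

const shared = collectShared(3);
const fresh = collectFresh(3);
console.log(shared.map((a) => a.displayName)); // [ 'user3', 'user3', 'user3' ]
console.log(fresh.map((a) => a.displayName)); // [ 'user1', 'user2', 'user3' ]
```

In the shared version every array slot points at the same object, so all entries show whatever was written last. Creating the object inside the loop gives each entry its own storage, and creating thousands of small objects this way is normal in JavaScript.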
Tags: javascript node.js arrays object puppeteer