【Posted】: 2020-08-19 14:20:37
【Question】:
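I am currently using the following Puppeteer AWS Lambda layer to scrape 30 URLs and to create and save screenshots in S3. At the moment I send 30 separate payloads, so 30 AWS Lambda invocations run. https://github.com/shelfio/chrome-aws-lambda-layer
Each JSON payload contains a URL and an image filename, and the payloads are sent to API Gateway as POST requests, one every 2-3 seconds. The first 6 to 9 Lambda invocations in the list seem to run fine, then they start failing with Navigation failed because browser has disconnected!, as reported in AWS CloudWatch.
So I am looking for an alternative solution: how can I edit the code below to screenshot a set of 30 URLs in one batch by processing a single JSON payload array (e.g. with a for loop)?
Here is my current code for generating a single screenshot in AWS Lambda and sending it to S3: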
// src/capture.js
// this module will be provided by the layer
const chromeLambda = require("chrome-aws-lambda");
// aws-sdk is always preinstalled in AWS Lambda in all Node.js runtimes
const S3Client = require("aws-sdk/clients/s3");
process.setMaxListeners(0); // <== Important line - fixes the MaxListeners warning
// create an S3 client
const s3 = new S3Client({ region: process.env.S3_REGION });
// default browser viewport size
const defaultViewport = {
width: 1920,
height: 1080
};
exports.handler = async event => {
  console.log("Event URL string is ", event.url);
  const url = event.url;
  // derive the S3 key from the hostname, dropping a leading "www."
  const domain = new URL(url).hostname.replace(/^www\./, "");
  // launch a headless browser
  const browser = await chromeLambda.puppeteer.launch({
    args: chromeLambda.args,
    executablePath: await chromeLambda.executablePath,
    defaultViewport
  });
  try {
    // open a new tab
    const page = await browser.newPage();
    // navigate to the page
    await page.goto(url);
    // take a screenshot
    const buffer = await page.screenshot();
    // upload the image using the domain name as filename
    const result = await s3
      .upload({
        Bucket: process.env.S3_BUCKET,
        Key: domain + ".png",
        Body: buffer,
        ContentType: "image/png",
        ACL: "public-read"
      })
      .promise();
    // return the uploaded image url
    return { url: result.Location };
  } finally {
    // always close the browser so Chromium does not linger in a reused container
    await browser.close();
  }
};
Current single JSON payload:
{"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/gavurin.com.png","url":"https://gavurin.com"}
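To make the batch idea concrete, here is a rough sketch of what a for-loop version might look like. It is only an illustration under assumptions, not a tested fix for the disconnect error: the `items` array in the event, and the helper names `keyForUrl` and `captureAll`, are hypothetical and not part of the layer. It launches one browser, visits the URLs one at a time, records a per-URL error instead of aborting the whole batch, and closes each page and the browser when done.

```javascript
// Hypothetical batch payload (assumption):
// { "items": [ { "url": "https://gavurin.com" }, { "url": "https://example.com" } ] }

// Derive the S3 key from the hostname, the same way the single-URL handler does.
function keyForUrl(url) {
  return new URL(url).hostname.replace(/^www\./, "") + ".png";
}

// Run captureOne for each item in order; a failure on one URL is recorded, not thrown.
async function captureAll(items, captureOne) {
  const results = [];
  for (const item of items) {
    try {
      results.push({ url: item.url, location: await captureOne(item) });
    } catch (err) {
      results.push({ url: item.url, error: err.message });
    }
  }
  return results;
}

async function handler(event) {
  // Required lazily so the helpers above can be exercised without the layer.
  const chromeLambda = require("chrome-aws-lambda");
  const S3Client = require("aws-sdk/clients/s3");
  const s3 = new S3Client({ region: process.env.S3_REGION });

  // One browser for the whole batch.
  const browser = await chromeLambda.puppeteer.launch({
    args: chromeLambda.args,
    executablePath: await chromeLambda.executablePath,
    defaultViewport: { width: 1920, height: 1080 }
  });
  try {
    return await captureAll(event.items, async item => {
      const page = await browser.newPage();
      try {
        await page.goto(item.url);
        const buffer = await page.screenshot();
        const result = await s3
          .upload({
            Bucket: process.env.S3_BUCKET,
            Key: keyForUrl(item.url),
            Body: buffer,
            ContentType: "image/png",
            ACL: "public-read"
          })
          .promise();
        return result.Location;
      } finally {
        await page.close(); // free the tab before moving to the next URL
      }
    });
  } finally {
    await browser.close();
  }
}

exports.handler = handler;
```

With this shape a single invocation would return an array of `{ url, location }` or `{ url, error }` entries. Loading 30 pages sequentially takes much longer than loading one, so the function timeout would likely need to be raised.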
【Discussion】:
Tags: javascript node.js amazon-web-services aws-lambda chromium