Puppeteer crawler performance optimization #14

Open
nfwyst opened this issue Apr 13, 2019 · 0 comments

nfwyst commented Apr 13, 2019

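When crawling a website, we usually care about how fast its pages load, and most of what slows loading down is static files: CSS, fonts, and images. To improve crawler performance, we need to stop the browser from loading these unnecessary files, which can be done by intercepting requests.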
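The per-request decision comes down to a membership test on the request's resource type. Below is a minimal sketch, assuming the blocked set is exactly the three types named above. `BLOCKED_RESOURCE_TYPES` and `shouldBlock` are hypothetical names for illustration, not part of Puppeteer. The strings are meant to match the values Puppeteer's `request.resourceType()` returns.

```javascript
// Hypothetical constant: resource types assumed to be unnecessary for scraping.
// A Set gives constant-time lookups, which matters little for three entries
// but keeps the check cheap if the list grows.
const BLOCKED_RESOURCE_TYPES = new Set(['image', 'stylesheet', 'font']);

// Hypothetical helper: returns true when a request of this type should be aborted.
function shouldBlock(resourceType) {
  return BLOCKED_RESOURCE_TYPES.has(resourceType);
}

console.log(shouldBlock('image'));    // true
console.log(shouldBlock('document')); // false
```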

Optimizing static file loading

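// Once interception is on, every request must be either aborted or continued
await page.setRequestInterception(true);
page.on('request', req => {
  // Block images, stylesheets and fonts, which are not needed for scraping
  if (['image', 'stylesheet', 'font'].includes(req.resourceType())) {
    return req.abort();
  }
  // Let every other request (document, script, xhr, ...) through
  return req.continue();
});

// other stuff: navigate with page.goto() and extract data as usual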

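With this in place, images, stylesheets, and fonts are not downloaded when the page issues requests, which can greatly improve the crawler's performance and speed.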
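If the same filter is used across several pages, the listener can be built once by a small factory. The sketch below is hypothetical (`createRequestFilter`, its `stats` counter, and the `fakeRequest` stub are illustration names, not Puppeteer APIs). It also counts aborted and continued requests so you can see how much traffic the filter actually cut. The fake request object stands in for Puppeteer's request so the logic can be checked without launching a browser. `'media'` in the commented usage is an assumed extra type you may also want to block.

```javascript
// Hypothetical factory: builds a 'request' listener that aborts the listed
// resource types, continues everything else, and tallies both outcomes.
function createRequestFilter(blockedTypes) {
  const blocked = new Set(blockedTypes);
  const stats = { aborted: 0, continued: 0 };

  function onRequest(req) {
    if (blocked.has(req.resourceType())) {
      stats.aborted += 1;
      return req.abort();
    }
    stats.continued += 1;
    return req.continue();
  }

  return { onRequest, stats };
}

// Usage with Puppeteer (not executed here):
//   await page.setRequestInterception(true);
//   const filter = createRequestFilter(['image', 'stylesheet', 'font', 'media']);
//   page.on('request', filter.onRequest);

// Hypothetical stub mimicking the three request methods the listener calls.
function fakeRequest(type) {
  return {
    resourceType: () => type,
    abort: () => 'aborted',
    continue: () => 'continued',
  };
}

const filter = createRequestFilter(['image', 'stylesheet', 'font']);
console.log(filter.onRequest(fakeRequest('image')));    // aborted
console.log(filter.onRequest(fakeRequest('document'))); // continued
console.log(filter.stats); // { aborted: 1, continued: 1 }
```

Keeping the blocked list as a parameter makes it easy to tune per site, for example allowing stylesheets back in when a page's content depends on them.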
