A simple Node.js scraper that extracts all links and images from a given page. 📦
npm install page-scrapper
- Super easy to use
- Removes duplicate links/images by default
- Filters out relative paths by default (configurable)
- Test cases included
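
To make the deduplication and relative-path filtering above concrete, here is a minimal sketch of how such a cleanup step could work. It is an illustration only and not the package's actual implementation: `cleanUrls` and the `sample` array are hypothetical names, and the "absolute" check is assumed to mean "starts with `http://` or `https://`".

```javascript
// Hypothetical helper for illustration; not part of page-scrapper's API.
function cleanUrls(urls, absoluteOnly = true) {
  // A Set drops repeated strings and preserves first-seen order.
  const unique = [...new Set(urls)];
  if (!absoluteOnly) return unique;
  // Assumption: a URL counts as absolute when it starts with http:// or https://.
  return unique.filter((url) => /^https?:\/\//i.test(url));
}

const sample = ['https://mockend.com', '/guide', 'https://mockend.com', 'banner.svg'];
console.log(cleanUrls(sample));        // [ 'https://mockend.com' ]
console.log(cleanUrls(sample, false)); // [ 'https://mockend.com', '/guide', 'banner.svg' ]
```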
const pageScrapper = require('page-scrapper');
(async () => {
  const data = await pageScrapper('https://jsonplaceholder.typicode.com/');
  console.log(data);
  /* =>
  {
    links: [
      'https://dev.to/typicode/what-s-new-in-husky-5-32g5',
      'https://github.com/sponsors/typicode',
      'https://blog.typicode.com',
      'https://my-json-server.typicode.com',
      'https://github.com/typicode/json-server',
      'https://github.com/typicode/lowdb',
      'https://tryretool.com/?utm_source=sponsor&utm_campaign=typicode',
      'https://mockend.com',
      'https://github.com/users/typicode/sponsorship',
      'https://github.com/typicode'
    ],
    images: [
      'https://i.imgur.com/IBItATn.png',
      'https://mockend.com/banner.svg'
    ]
  }
  */
})();
These are the currently available options:
Option | Required | Default | Description |
---|---|---|---|
absoluteOnly | No | true | Scrapes only absolute links. When set to false, relative paths are returned as well. |
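
When `absoluteOnly` is set to false, the result can contain relative paths, which usually need to be resolved against the page URL before they can be requested. The sketch below shows one way to do that with Node's built-in `URL` class. `resolveAll` and the sample paths are hypothetical and not part of page-scrapper.

```javascript
// Hypothetical helper for illustration; not part of page-scrapper's API.
function resolveAll(paths, baseUrl) {
  // new URL(path, base) resolves relative paths against the base
  // and leaves already-absolute URLs unchanged.
  return paths.map((path) => new URL(path, baseUrl).href);
}

const resolved = resolveAll(
  ['/guide', 'banner.svg', 'https://mockend.com/banner.svg'],
  'https://jsonplaceholder.typicode.com/'
);
console.log(resolved);
```

Resolving on the caller's side keeps the scraped data untouched, so the original relative paths remain available if they are needed.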
For any new feature request or bug report, please open an issue or pull request.
- meta-fetcher - Tiny URL metadata fetcher (scraper) for Node.js
MIT 2021 © Rocktim Saikia