Snatch-It grabs images from the Internet and saves them to a local drive.

- Uses Puppeteer (the Google headless Chrome Node API) to scrape a site (see the sketch after this list)
- Uses the `config` module to define Snatch-It settings, from "where to save images" to "grab images from this selector"
- Walks through all pages for as long as the next-page selector can be found on the current page
- Creates a folder for the site and one for each visited page, so the results stay easy to navigate
- Uses [Books to Scrape](http://books.toscrape.com/) as the default config (shown below)
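
To show how those pieces fit together, here is a minimal sketch of the scraping loop, not the repository's actual source: it reads the same keys found in `default.json` below, collects image URLs from the configured selector, and keeps clicking the next-page selector until it disappears. It assumes Node 18+ for the global `fetch`; file and variable names are illustrative.

```js
// sketch.js — illustrative only; the real entry point in this repo may differ.
const fs = require('fs');
const path = require('path');
const config = require('config');
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: config.get('browser.headless') });
  const page = await browser.newPage();
  await page.goto(config.get('urls.start'));

  const siteFolder = path.join(config.get('paths.storage'), config.get('paths.mainFolder'));
  const pagesLimit = config.get('extra.pagesLimit');

  for (let pageNo = 1; pageNo <= pagesLimit; pageNo++) {
    // one folder per visited page, e.g. ./data/books/page-1/
    const pageFolder = path.join(siteFolder, `${config.get('paths.prefixChapterFolder')}${pageNo}`);
    fs.mkdirSync(pageFolder, { recursive: true });

    // collect the src of every image matched by the configured selector
    const imageUrls = await page.$$eval(config.get('selectors.image'),
      imgs => imgs.map(img => img.src));

    // download each image into the page folder (global fetch needs Node 18+)
    for (const url of imageUrls) {
      const res = await fetch(url);
      const buffer = Buffer.from(await res.arrayBuffer());
      fs.writeFileSync(path.join(pageFolder, path.basename(new URL(url).pathname)), buffer);
    }

    // stop when the next-page selector can no longer be found
    const nextLink = await page.$(config.get('selectors.nextPage'));
    if (!nextLink) break;

    // follow the next-page link and continue
    await Promise.all([
      page.waitForNavigation(),
      page.click(config.get('selectors.nextPage')),
    ]);
  }

  await browser.close();
})();
```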
- Install dependencies:

  ```bash
  npm install
  ```
- Run the app (with default settings):

  ```bash
  npm start
  ```
- Create a custom config in the `config` folder (a `default.json` is provided as an example):
  ```json
  {
    "browser": {
      "headless": false
    },
    "paths": {
      "storage": "./data/",
      "mainFolder": "books/",
      "prefixChapterFolder": "page-"
    },
    "urls": {
      "start": "http://books.toscrape.com/"
    },
    "selectors": {
      "image": ".product_pod img",
      "nextPage": "ul.pager li.next a"
    },
    "extra": {
      "pagesLimit": 100
    }
  }
  ```
  then run the app:

  ```bash
  NODE_ENV=<your-config-name> npm start
  ```
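
For example, the `config` module merges `config/<NODE_ENV>.json` over `default.json`, so a custom config only needs the keys you want to change. The file name, URL, and selectors below are placeholders for illustration:

```json
{
  "urls": {
    "start": "https://example.com/gallery/"
  },
  "selectors": {
    "image": "article img",
    "nextPage": "a.next"
  },
  "paths": {
    "mainFolder": "gallery/"
  }
}
```

Saved as `config/gallery.json`, it would be picked up with:

```bash
NODE_ENV=gallery npm start
```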