A simple module that lets you poll websites at regular intervals and extract whatever information you want from each response. Strictly speaking, it is not a crawler; if you are looking for one, there are some quite popular alternatives out there, such as node-crawler.
npm install yan-crawler
var Crawler = require('yan-crawler').Crawler;
var crawler = Crawler.getInstance();
var amazonTemplate = {
    name: 'Amazon',
    url: 'https://www.amazon.com/',
    interval: 3000,
    callback: function(body, $) {
        // $ is cheerio - https://github.com/cheeriojs/cheerio
        console.log('Grabbed Amazon.');
    }
};

var IMDBTemplate = {
    name: 'IMDB',
    interval: 2000,
    url: 'http://www.imdb.com',
    callback: function(body, $) {
        console.log('Grabbed IMDB.');
    }
};
crawler.addEntry(amazonTemplate);
crawler.addEntry(IMDBTemplate);
crawler.start();
The code above will make requests to www.amazon.com every 3000ms and to www.imdb.com every 2000ms, calling their respective callbacks when it gets the results.
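Because the callback receives the raw response body together with a cheerio instance, you can extract specific pieces of the page instead of just logging. Below is a minimal sketch along those lines; the entry name, URL, and interval are placeholders, and the extraction logic is only an example of what you might do with $:

var Crawler = require('yan-crawler').Crawler;
var crawler = Crawler.getInstance();

// Hypothetical entry: pulls the <title> text and all link hrefs
// from the fetched page using the cheerio instance passed to the callback.
var exampleTemplate = {
    name: 'Example',
    url: 'https://example.com/',
    interval: 5000,
    callback: function(body, $) {
        var title = $('title').text();
        var links = $('a')
            .map(function() { return $(this).attr('href'); })
            .get();
        console.log('Title:', title);
        console.log('Found ' + links.length + ' links.');
    }
};

crawler.addEntry(exampleTemplate);
crawler.start();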
MIT