Java Implementation for a web crawler

This project contains a library and a sample SQLite database (crawler.db) in which the crawled records are stored. Clone the project, then edit the main method in UrlCrawler.java to set the URL you wish to crawl and the word to search for.

You could manage the project with a dependency manager such as Maven, but this is a deliberately basic implementation for demonstration purposes.

Requirements:

1: SQLite3

How to run the project:

1: Make sure SQLite3 is installed on your computer. Create the visited table by opening a terminal and running the following, adjusting the path to wherever crawler.db lives in your clone:

sqlite3 /Users/WebCrawler/src/crawler/crawler.db
sqlite> CREATE TABLE IF NOT EXISTS visited (
          RecordId INTEGER PRIMARY KEY NOT NULL,
          URL TEXT NOT NULL
        );

This creates the empty visited table.

2: Edit the main() method in UrlCrawler.java and pass in the URL you want to crawl, along with the word to search for. Also provide the baseUrl of that URL.
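
The README does not spell out what form baseUrl should take. A minimal sketch, assuming it means the scheme and host (plus any explicit port) of the start URL: the class BaseUrlSketch, the method baseUrlOf, and the example.com values are all hypothetical and not part of the project.

```java
import java.net.URI;

// Hypothetical helper, not part of the project: derives a base URL
// (scheme://host[:port]) from the page URL you plan to crawl.
class BaseUrlSketch {

    // Returns scheme://host, plus :port when the URL names one explicitly.
    static String baseUrlOf(String url) {
        URI uri = URI.create(url);
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("Not an absolute URL: " + url);
        }
        String base = uri.getScheme() + "://" + uri.getHost();
        // getPort() returns -1 when the URL has no explicit port.
        return uri.getPort() == -1 ? base : base + ":" + uri.getPort();
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with your own URL and search word.
        String startUrl = "https://example.com/docs/index.html";
        String searchWord = "crawler";

        System.out.println("URL to crawl: " + startUrl);
        System.out.println("Word to find: " + searchWord);
        System.out.println("baseUrl:      " + baseUrlOf(startUrl));
    }
}
```

Under this assumption, a start URL of https://example.com/docs/index.html gives a baseUrl of https://example.com. If UrlCrawler.java expects a different form (for example, with a trailing slash), follow the code rather than this sketch.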

3: Hit Run and watch the console for the crawled links. You can also view them in the visited table with a simple SELECT query.
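
In the sqlite3 shell, SELECT * FROM visited; lists the stored links. The same check can be sketched from Java over JDBC. This is only an illustration: the class VisitedDump and its methods are hypothetical, and it assumes a SQLite JDBC driver that accepts jdbc:sqlite:<path> URLs is on the classpath (the README does not say which driver the project uses).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the project: prints the rows of the visited table.
class VisitedDump {

    // Column names match the CREATE TABLE statement from step 1.
    static final String SELECT_VISITED =
            "SELECT RecordId, URL FROM visited ORDER BY RecordId";

    // Assumption: the driver on the classpath understands jdbc:sqlite:<path> URLs.
    static String jdbcUrl(String dbPath) {
        return "jdbc:sqlite:" + dbPath;
    }

    // Reads every row of the visited table through an already-open connection.
    static List<String> listVisited(Connection conn) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(SELECT_VISITED)) {
            while (rs.next()) {
                rows.add(rs.getInt("RecordId") + "\t" + rs.getString("URL"));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws SQLException {
        // Pass the path to crawler.db as the first argument; defaults to the working directory.
        String dbPath = args.length > 0 ? args[0] : "crawler.db";
        try (Connection conn = DriverManager.getConnection(jdbcUrl(dbPath))) {
            listVisited(conn).forEach(System.out::println);
        }
    }
}
```

Run it after a crawl with the path to crawler.db as the argument. If the table is empty, nothing is printed.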

Repository: yugmamoradia/WebCrawler
