Skip to content

evah/HealthyWebCrawler-T14

 
 

Repository files navigation

Healthy Crawler is a tool to gather all apps information from Xiaomi appstore. User could use our website to do searching according to filters.

Screenshots Screenshots

##Motivation By showing the apps' information in from the search box provided by our website, user can finally understand the status of app and get the better search result the app store provided.

##Usage Install lastest verison of Node.JS, ElasticSearch, MongoDB
npm install
Run elasticsearch and MongoDB supervisor ./bin/www
webpack --watch
Test page with url localhost:3000

##Components This project consists of two components:

A crawler to dig the app information(title, category, icon, rating, etc) from appstore.
A webpage including back-end and front-end to provide the search-box to let user can search apps according the filters.

##Crawler

crawler
 |__ spider
       |__ init__.py
       |__ toplist_spider.py
 |__ init__.py
 |__ items.py
 |__ pipelines.py
 |__ settings.py

toplist_spider.py defines how to crawl the webpages.
items.py defines the data fields which stored in MongoDB.
pipelines.py defines the store process in MongoDB

Run scrapy crawl toplist

##Search Platform We build a website to implement the searching. Node.JS for back-end and React.JS and Redux for front-end.

src
 |__ actions
 |__ components
 |__ containers
     |__ app_list.js
     |__ search_box.js
 |__ reducers

The structure of front-end follows the design rule of Redux.

At the back-end, I use elasticsearch to do the searching.

Apps.search({
  query: {
    bool: {
      should: [
            { match: { title: name}},
            { match: { category: cate}}
      ]
    }        
  }
},
function(err, results) {
  if (err) {
      console.log(err);
  }
  res.send(200, JSON.stringify(results.hits.hits));
});

##To Do List [ ] Token in Chinese searching in elasticsearch
[ ] Frond-end design
[ ] Optimize the scrapy in distrbuted systems and multiple IPs

##Team Members |Eva| |tsi| |Xing|

##Project Information

  • category: full stack
  • team: Healthy Web Crawler
  • description: A crawler can gather all information of apps from app store and a search platform to do searching.
  • stack: React, Node.js, MongoDB, ElasticSearch, Python

About

This is my team project to crawl xiaomi app store contents

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published