Running Crawler and Ingestion Service #15

madeedwin-itb · 2014-08-14T04:06:32Z

Is there an explanation in the .pdf of how to set up and run the crawler and ingestion service?

Is it possible for the crawler and ingestion service to act automatically when a link is submitted?

In the project there is a crawler folder, and when I build citeseerx, some services appear in the dist folder. What am I supposed to do with them?

fanchyna · 2014-12-03T15:13:51Z

Sorry for the late reply.
CiteSeer currently ingests documents in batch mode, but that is not covered in the public documentation yet. We will add this part soon. We are also developing a seamless crawl-extraction-ingestion pipeline. The crawler directory contains the deprecated crawler code and the CDI (crawl document importer). They shouldn't generate anything in the dist/ folder. The code_base/ is partially used to set up the crawl website and the crawl API; the CDI is used for importing documents into the crawl database and repository.
