Running Crawler and Ingestion Service #15

Open
madeedwin-itb opened this issue Aug 14, 2014 · 1 comment

@madeedwin-itb

Does the PDF documentation explain how to set up and run the crawler and ingestion service?

Is it possible for the crawler and ingestion service to act automatically when a link is submitted?

The project contains a crawler folder, and when I build citeseerx, the dist folder contains some services. What am I supposed to do with them?

@fanchyna
Contributor

fanchyna commented Dec 3, 2014

Sorry for the late reply.
CiteSeer currently ingests documents in batch mode, but that is not covered in the public documentation yet. We will add this part soon. We are also developing a seamless crawl-extraction-ingestion pipeline. The crawler directory contains the deprecated crawler code and the CDI (crawl document importer). They shouldn't generate anything in the dist/ folder. The code_base/ directory is partially used to set up the crawl website and the crawl API; the CDI is used for importing documents into the crawl database and repository.
