Sorry for the late reply.
CiteSeer currently ingests documents in batch mode, but that is not yet described in the public documentation. We will add this part soon. We are also developing a seamless crawl-extraction-ingestion pipeline. The crawler directory contains the deprecated crawler code and the CDI (crawl document importer). They should not generate anything in the dist/ folder. The code_base/ directory is partially used to set up the crawl website and the crawl API; the CDI is used for importing documents into the crawl database and repository.
Does the .pdf explain how to set up and run the crawler and ingestion service?
Is it possible for the crawler and ingestion service to act automatically when a link is submitted?
The project has a crawler folder, and when I build CiteSeerX, some services appear in the dist folder. What am I supposed to do with them?