On the crawl web site, the crawl history page shows the proportion of documents that are "In System", "Crawled" and "Fail to Convert". However, "In System" only means that a document has been extracted, not necessarily that it has been ingested; the document may still be in the waiting list. Because of the significant speed difference between ingestion and extraction, the waiting list can be long. Therefore, we need a fourth parameter reflecting the actual number of papers ingested. This can be done in three steps:
(1) add a new flag value to the "state" field of the citeseerx_crawl.main_crawl_document table to indicate ingested papers;
(2) update view.py, adding "ingested_count" and calculating it in some way (either dynamically from the production database, or from the crawling, or from a database dump);
(3) update the template, adding "ingested_count" to the displayed graph.
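
A rough sketch of how step (2) might compute the counts once the new flag exists. Everything named here is an assumption for illustration: the numeric state codes, the `summarize_states` and `proportions` helpers, and the key names other than "ingested_count" are hypothetical and not taken from the existing view.py. The function only takes an iterable of "state" values, so it does not matter whether they come from the production database, the crawl database, or a dump.

```python
from collections import Counter

# Hypothetical state codes: the real values stored in
# citeseerx_crawl.main_crawl_document.state may differ.
STATE_FAIL_TO_CONVERT = -1
STATE_CRAWLED = 0
STATE_IN_SYSTEM = 1      # extracted, possibly still in the waiting list
STATE_INGESTED = 2       # the new flag proposed in step (1)


def summarize_states(states):
    """Count documents per state from an iterable of state values."""
    counts = Counter(states)
    return {
        "crawled_count": counts[STATE_CRAWLED],
        "in_system_count": counts[STATE_IN_SYSTEM],
        "ingested_count": counts[STATE_INGESTED],
        "fail_to_convert_count": counts[STATE_FAIL_TO_CONVERT],
    }


def proportions(summary):
    """Turn the counts into fractions of the total, for the graph."""
    total = sum(summary.values())
    if total == 0:
        return {key: 0.0 for key in summary}
    return {key: value / total for key, value in summary.items()}


if __name__ == "__main__":
    sample_states = [0, 1, 1, 2, -1, 2, 2]
    summary = summarize_states(sample_states)
    print(summary)
    print(proportions(summary))
```

The resulting dictionary could then be passed to the template in step (3) so that "ingested_count" is plotted alongside the three existing series.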