forked from martinsbalodis/web-scraper-chrome-extension
Distinct
FireAwayH edited this page Aug 21, 2018 · 4 revisions
Captured data can contain duplicates for several reasons, so we need a way to make the data unique.
This feature is set to true by default.
If you want to disable it to save scraping time, enter anything other than true on the Scrape page.
To use the feature, make sure the value is set to true, and then start scraping.
Data duplication can happen in the following situations:
- Chrome crashes.
You can't tell which URLs have not been scraped yet, so the only option is to restart the Sitemap that stopped. Data will be captured again when you restart a Sitemap.
- Mistakes in selectors or pages.
Sometimes the scraper gets duplicate data because of errors in your Selector or in the page you are scraping.
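
The idea of making captured data unique can be illustrated with a minimal sketch. This is not the extension's actual implementation; it assumes each scraped record is a flat dictionary of JSON-serializable values, and the names `distinct_records` and `scraped` are hypothetical.

```python
import json


def distinct_records(records):
    """Return records with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for record in records:
        # Serialize with sorted keys so that key order does not affect equality.
        key = json.dumps(record, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical data: the second record repeats the first, as could happen
# after restarting a Sitemap that stopped partway through.
scraped = [
    {"title": "Item A", "price": "10"},
    {"price": "10", "title": "Item A"},
    {"title": "Item B", "price": "12"},
]
unique = distinct_records(scraped)
```

A check like this costs extra time on every record, which is likely why the feature can be switched off when scraping speed matters more than uniqueness.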