I have been doing some tests in a situation where multiple crawlers are set up, each with a listener for a crawl event. When the HttpCrawlerConfigs are added to the HttpCollector, the listeners are effectively duplicated, so my program's logic gets called multiple times.
Simplified example:
HttpCollectorConfig collectorConfig = new HttpCollectorConfig();
List<HttpCrawlerConfig> httpCrawlerConfigs = new ArrayList<>();
for (int i = 0; i < urlsList.length; i++) {
    var httpCrawlerConfig = new HttpCrawlerConfig();
    httpCrawlerConfig.setEventListeners(new CrawlEventListener());
    httpCrawlerConfigs.add(httpCrawlerConfig);
}
collectorConfig.setCrawlerConfigs(
        httpCrawlerConfigs.toArray(new HttpCrawlerConfig[0]));
var collector = new HttpCollector(collectorConfig);
// From the debugging I did, it seems the problem happens when the collector
// scans the crawler configs here and duplicates the listeners in the
// event manager.
collector.start();
As a workaround, I set the listeners only on the first HttpCrawlerConfig, but I think it should be possible to use a separate listener for each crawler.
Regards,
Fabian
Technically, the listeners are not duplicated. Rather, each listener is invoked for ALL events fired by the collector that match the type of its "accept" method argument, and that includes events from other crawlers.
This is by design, as there may be legitimate cases where a crawler wants to know what is happening in another crawler. I understand it is not the most intuitive behaviour, though.
Since it is possible to configure event listeners at both the collector-level and the crawler-level, it would make sense to imply an event hierarchy there and provide isolation from other crawlers when registered only for a specific crawler.
Since there are valid use cases for both approaches, I think we need to make it more flexible and offer an easy way to adjust the listening scope and maybe change the default behaviour to the most intuitive one.
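In the meantime, a listener can scope itself by checking which crawler an event came from. The following is only a sketch: the `Event` record and the `Consumer`-based listener here are simplified stand-ins for the actual Norconex types, not the real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ScopedListenerDemo {

    // Simplified stand-in for a collector event; the real event type
    // and the way to obtain its source crawler will differ.
    record Event(String crawlerId, String name) {}

    // Wraps a listener so it only reacts to events from one crawler,
    // even when the event manager dispatches every event to it.
    static Consumer<Event> scopedTo(String crawlerId, Consumer<Event> delegate) {
        return event -> {
            if (crawlerId.equals(event.crawlerId())) {
                delegate.accept(event);
            }
        };
    }

    public static void main(String[] args) {
        List<String> handled = new ArrayList<>();
        Consumer<Event> listener = scopedTo("crawler-1",
                e -> handled.add(e.name()));

        // Simulate the event manager sending all crawlers' events
        // to the same listener.
        listener.accept(new Event("crawler-1", "CRAWLER_RUN_BEGIN"));
        listener.accept(new Event("crawler-2", "CRAWLER_RUN_BEGIN"));

        System.out.println(handled); // only crawler-1's event survives
    }
}
```

The same pattern works with whatever the real listener interface is: filter on the event's source at the top of the handler and ignore everything else.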