Crawler skip_uri_patterns is incomplete #35

rmacklin · 2013-10-24T20:45:36Z

I was using tarantula and noticed that it was trying to crawl a "tel:" link, ultimately failing because the path of the parsed URI was nil. I looked into it and saw that a simple fix would be to add tel to the skip_uri_patterns list in Crawler's initialize function. However, the crawler would have the same issue with other URI schemes that aren't listed in skip_uri_patterns, so it seems like a more general approach may be better. Do you think it would make more sense to skip URIs that start with any scheme name, or is there a reason you specifically chose to only skip the javascript, mailto, and http schemes?
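To make the "skip any scheme" idea concrete, here is a minimal sketch. `SCHEME_PATTERN` and `scheme_link?` are hypothetical names chosen for illustration and are not part of tarantula. The pattern assumes the scheme grammar from RFC 3986 (a letter followed by letters, digits, `+`, `-`, or `.`, then a colon).

```ruby
# Hypothetical helper, not tarantula's API: detects links that begin with
# an explicit URI scheme such as tel:, mailto:, javascript:, or http:.
# Scheme grammar assumed from RFC 3986: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
SCHEME_PATTERN = /\A[a-z][a-z0-9+.\-]*:/i

# Returns true when href starts with a scheme, false for relative
# paths, fragments, empty strings, and nil.
def scheme_link?(href)
  SCHEME_PATTERN.match?(href.to_s.strip)
end

# Example: keep only the links a crawler could follow as relative paths.
links = ['/users/1', 'tel:+15551234567', 'mailto:a@example.com', '#top']
crawlable = links.reject { |href| scheme_link?(href) }
```

A single pattern like this could in principle stand in for the per-scheme entries, since `javascript:`, `mailto:`, and `http:` links all match it too. Whether skipping every scheme is actually the desired behavior is the open question above.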
