
Fatal error: Maximum call stack size exceeded #31

Open
nvaken opened this issue Jan 14, 2016 · 5 comments

Comments

@nvaken

nvaken commented Jan 14, 2016

Probably because the site I'm crawling has a pretty high number of resources. Still, I wonder whether this is preventable. Am I overlooking an option here?

@jejernig

Bumping for the same issue. I'm crawling a SharePoint site with tons of links.

@infomongo

Same issue for me. It started when the site being tested added a large (9 MB) video, so in my case I don't think the problem is the number of resources but their size.

If there is no fix/workaround, I'm gonna have to stop using the link checker.

@infomongo

The config options are here: https://github.com/cgiffard/node-simplecrawler#configuration
But none of them let me fix or work around my issue.

It seems like one of these should do it, but I can't get them to work:
crawler.maxResourceSize=16777216 - The maximum resource size that will be downloaded, in bytes. Defaults to 16MB.
I tried setting maxResourceSize to values from 2 MB to 32 MB and saw no difference in behavior.
Similarly, downloadUnsupported: false has no effect.

There doesn't seem to be a config option to ignore certain file types, unless fetch conditions can be used for that, and it's not clear whether that's possible.

Probably going to stop using this :(

@infomongo

I was able to fix this by using a fetch condition to ignore the movie that was causing the problem. My Gruntfile (CoffeeScript) looks like this:

linkChecker:
  build:
    site: 'localhost'
    options:
      initialPath: '/site-dir.html'
      maxConcurrency: 20
      initialPort: 8000
      supportedMimeTypes: [/text\/html/, /text\/css/]
      callback: (crawler) =>
        # Refuse to fetch any URL whose path ends in .mp4 (case-insensitive),
        # so the large video is never downloaded
        crawler.addFetchCondition((url) =>
          return !url.path.match(/\.mp4$/i)
        )
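The same fetch-condition idea can be pulled out into a small predicate that is easy to test on its own. This is a sketch, not grunt-link-checker's or simplecrawler's documented API: the name `shouldFetch`, the list of video extensions, and the assumption that the object passed to the fetch condition has a `path` string (as `url.path` is used in the CoffeeScript config) are all illustrative. The exact fetch-condition callback signature depends on the simplecrawler version that grunt-link-checker pulls in, so check that before wiring it up.

```javascript
// Hypothetical helper: returns true if a URL should be fetched.
// Assumes the argument exposes a `path` string (e.g. "/media/intro.mp4?v=2").
// The extension list is an example only; adjust it to the files that
// actually cause trouble on your site.
const SKIPPED_EXTENSIONS = /\.(mp4|mov|webm|avi)$/i;

function shouldFetch(parsedUrl) {
  // Drop any query string or fragment so "/intro.mp4?v=2" is still matched.
  const pathname = String(parsedUrl.path || "").split(/[?#]/)[0];
  // Fetch everything except paths ending in one of the skipped extensions.
  return !SKIPPED_EXTENSIONS.test(pathname);
}

// Assumed wiring inside the grunt-link-checker `callback` option:
//   callback: function (crawler) {
//     crawler.addFetchCondition(shouldFetch);
//   }

module.exports = { shouldFetch };
```

Stripping the query string matters because a path like `/intro.mp4?v=2` would otherwise slip past an anchored `\.mp4$` check.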

@nvaken
Author

nvaken commented May 13, 2016

Not sure, as I can't check this right now, but I'm fairly confident my original error isn't caused by one big resource. The projects I'm checking don't have resources larger than about 5 MB, and even that would be an anomaly. Your fix seems to do the job for specific big resources (which is good to have in here! 👍), though it probably won't fix my original issue.

So, that being said, I'm still looking for answers. 😊
