
Florida scraper #24

Open: wants to merge 5 commits into master


eabh (Contributor) commented Mar 22, 2014

Here is a scraper for Florida. I hope it helps.
Ed

ajb (Contributor) commented Mar 24, 2014

Hey Ed, this is amazing, what a terrible site to parse, heh.

The scraper seems to be choking on two issues:

  1. It tries to call split, trim, and similar methods on undefined values.
  2. When --limit is not set, it calls .split(1,1), which returns an empty array.

You mind taking a look? :)
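The first failure mode is the usual one when scraping loosely structured pages: a chunk of text the parser expected is missing, so the variable is undefined and the next .split() throws. A minimal sketch of a guard, assuming a hypothetical helper named splitLines (not the scraper's actual code) that turns a block of scraped text into trimmed, non-empty lines:

```javascript
// Hypothetical helper: split a scraped text block into trimmed lines.
// Anything that is not a string (e.g. undefined when a block is missing
// from the page) yields an empty list instead of throwing a TypeError.
function splitLines(block) {
  if (typeof block !== "string") {
    return [];
  }
  return block
    .split("\n")
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line.length > 0; });
}

console.log(splitLines("  Bid No. 123 \n\n Opening Date \n")); // [ 'Bid No. 123', 'Opening Date' ]
console.log(splitLines(undefined)); // []
```

Returning an empty list keeps the calling loop simple: a missing block just contributes no lines, rather than aborting the whole run.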

ajb added the wip label Mar 24, 2014
In case opts.limit is not provided, specify a default value 9999
eabh (Contributor, PR author) commented Mar 24, 2014

Hey Adam, thanks for pointing out that --limit might not be supplied. I added a default value for --limit and tested it by passing in an empty object for opts. However, I have not been able to reproduce the issues you described. I tried running with various values for --limit and also with no limit, and it did not fail. What data did you use?
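The fix described here (fall back to 9999 when opts.limit is absent, then test with an empty opts object) could look roughly like the sketch below. The function name applyLimit and the use of Array.prototype.slice to cap the results are assumptions for illustration; only the 9999 default comes from this thread.

```javascript
// Default used when --limit is not supplied (value taken from the thread).
const DEFAULT_LIMIT = 9999;

// Hypothetical helper: cap the number of scraped items at opts.limit.
function applyLimit(items, opts) {
  opts = opts || {};
  // CLI options often arrive as strings, so coerce to a number.
  const limit = opts.limit != null ? Number(opts.limit) : DEFAULT_LIMIT;
  return items.slice(0, limit);
}

const rfps = ["a", "b", "c"];
console.log(applyLimit(rfps, {}));           // [ 'a', 'b', 'c' ]
console.log(applyLimit(rfps, { limit: 2 })); // [ 'a', 'b' ]
```

Testing with {} (and with no opts at all) exercises the default path, which is the case the original report was about.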

If opts.limit not supplied use 9999
ajb (Contributor) commented Apr 5, 2014

Sorry for the delay; I'll try to be more responsive so we can get this solved. Here's what's happening:

```
rocktop:openrfps-scrapers adamb$ bin/openrfps run scrapers/fl/rfps.coffee

TypeError: Object 300 has no method 'trim'
  at Request._callback (/Users/adamb/repos/dobtco/openrfps-scrapers/scrapers/fl/rfps.coffee:133:9, <js>:128:58)
  at Request.self.callback (/Users/adamb/repos/dobtco/openrfps-scrapers/node_modules/request/request.js:121:22)
  at Request.EventEmitter.emit (events.js:98:17)
  at Request.<anonymous> (/Users/adamb/repos/dobtco/openrfps-scrapers/node_modules/request/request.js:978:14)
  at Request.EventEmitter.emit (events.js:117:20)
  at IncomingMessage.<anonymous> (/Users/adamb/repos/dobtco/openrfps-scrapers/node_modules/request/request.js:929:12)
  at IncomingMessage.EventEmitter.emit (events.js:117:20)
  at _stream_readable.js:920:16
  at process._tickCallback (node.js:415:13)
```
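The error message is worth reading closely: "Object 300 has no method 'trim'" is the older V8 wording for calling .trim() on a value that is not a string, here most likely the number 300. Numbers have no trim method, so this is a type problem rather than a missing-element problem. A minimal sketch of a defensive cleanup step, using a hypothetical helper cleanField (not part of the scraper), coerces the value to a string before trimming and returns null for missing values:

```javascript
// Hypothetical helper: normalize a scraped field to a trimmed string.
// undefined/null means the field was missing, so return null.
// Any other value (e.g. the number 300) is converted with String() first,
// because only strings have a trim() method.
function cleanField(value) {
  if (value === undefined || value === null) {
    return null;
  }
  return String(value).trim();
}

cleanField(300);              // returns "300"
cleanField("  Tallahassee "); // returns "Tallahassee"
cleanField(undefined);        // returns null
```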

eabh (Contributor, PR author) commented Apr 12, 2014

The page is essentially a mass of text, and there are no hard-and-fast rules for when a particular element is included or excluded. I still haven't found an example that makes it fail, but I reworked the code at the spot you identified so that if an element's label is not found, it no longer tries to trim the nonexistent text.
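For a page that is mostly free text, the label-driven approach described here can be sketched as: find the label, take the text up to the end of that line, and return null when the label is absent instead of trimming undefined. The function name extractAfterLabel, the newline-delimited layout, and the sample listing text are all assumptions for illustration; the real Florida page may be structured differently.

```javascript
// Hypothetical label-based extraction for a free-text listing.
// Returns the trimmed text following `label` on the same line,
// or null when the label does not appear on the page.
function extractAfterLabel(text, label) {
  const start = text.indexOf(label);
  if (start === -1) {
    return null; // label missing: skip the field rather than trim undefined
  }
  const rest = text.slice(start + label.length);
  const end = rest.indexOf("\n");
  const value = end === -1 ? rest : rest.slice(0, end);
  return value.trim();
}

// Made-up sample text, not real scraped data.
const listing = "Bid Number: FL-001\nOpening Date: 04/15/2014\n";
extractAfterLabel(listing, "Bid Number:");    // returns "FL-001"
extractAfterLabel(listing, "Contact Email:"); // returns null
```

Returning null (rather than an empty string) lets the caller tell "field absent" apart from "field present but blank".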
