Off to a good start writing a fairly complicated crawler #226
dogweather started this conversation in Show and tell · 1 comment · 1 reply
Hi everyone,

The crawl target is a pretty dreadful government website built from awful dynamic JS and MS Word documents. :-)

I'm pretty excited to be giving Crawly a real try. I've already got a lot of Scrapy code, but I'm OK with switching if this Elixir alternative really pans out, or possibly with using both systems for different tasks, since they're so similar from the programmer's point of view.

It's been an easy transition, since Crawly follows many of Scrapy's conventions. I write my spiders a little unconventionally: TDD with tests, a large parser module, and a small spider.
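For anyone curious what that split can look like in Crawly, here is a minimal sketch. The module names, CSS selectors, and URLs (`MyCrawler.GlossaryParser`, `.glossary-entry`, `glossary.example.gov`) are made up for illustration; it assumes Crawly's `Crawly.Spider` behaviour (`base_url/0`, `init/0`, `parse_item/1`) and Floki for HTML parsing.

```elixir
# Hypothetical layout for the "large parser module, small spider" split.
# The parser is pure (HTML string in, plain maps out), so it can be
# unit-tested against saved fixtures without running a crawl.
defmodule MyCrawler.GlossaryParser do
  def parse(html) do
    html
    |> Floki.parse_document!()
    |> Floki.find(".glossary-entry")
    |> Enum.map(fn entry ->
      %{
        term: entry |> Floki.find(".term") |> Floki.text() |> String.trim(),
        definition: entry |> Floki.find(".definition") |> Floki.text() |> String.trim()
      }
    end)
  end
end

# The spider stays small: fetch, delegate to the parser, wrap the result.
defmodule MyCrawler.GlossarySpider do
  use Crawly.Spider

  @impl Crawly.Spider
  def base_url(), do: "https://glossary.example.gov"

  @impl Crawly.Spider
  def init(), do: [start_urls: ["https://glossary.example.gov/terms"]]

  @impl Crawly.Spider
  def parse_item(response) do
    %Crawly.ParsedItem{
      items: MyCrawler.GlossaryParser.parse(response.body),
      requests: []
    }
  end
end
```

Because the parser never touches the network, the TDD loop is an ordinary ExUnit test:

```elixir
defmodule MyCrawler.GlossaryParserTest do
  use ExUnit.Case, async: true

  test "extracts term/definition pairs from an HTML fixture" do
    html = ~S(<div class="glossary-entry">
      <span class="term">Writ</span>
      <span class="definition">A formal written court order.</span>
    </div>)

    assert MyCrawler.GlossaryParser.parse(html) ==
             [%{term: "Writ", definition: "A formal written court order."}]
  end
end
```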
Now, in my Python Scrapy code, I rely on strict type checking to validate the output, e.g. glossary models for parsing online glossaries. I don't yet know how I'll do that in a natural Elixir way.
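On the type-checking question, one lightweight and idiomatic option is a struct with `@enforce_keys` plus a validating constructor. The `GlossaryEntry` module below is a hypothetical sketch meant to mirror what a Scrapy item model provides:

```elixir
# Hypothetical item model: @enforce_keys catches missing fields at
# construction time, and new/1 does the runtime type checks.
defmodule MyCrawler.GlossaryEntry do
  @enforce_keys [:term, :definition]
  defstruct [:term, :definition, source_url: nil]

  @type t :: %__MODULE__{
          term: String.t(),
          definition: String.t(),
          source_url: String.t() | nil
        }

  @spec new(map()) :: {:ok, t()} | {:error, String.t()}
  def new(%{term: term, definition: definition} = attrs)
      when is_binary(term) and is_binary(definition) do
    {:ok,
     %__MODULE__{term: term, definition: definition, source_url: attrs[:source_url]}}
  end

  def new(other), do: {:error, "invalid glossary entry: #{inspect(other)}"}
end
```

For richer coercion and error reporting, Ecto's embedded schemas and changesets also work as a pure validation layer with no database involved, and Dialyzer can check the `@spec` annotations statically.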
Reply:

Maybe it's a task for an item pipeline? E.g., if you could create a pipeline responsible for ensuring the data types, it would solve the problem.
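To make that suggestion concrete, here is a sketch of a custom pipeline, assuming Crawly's `Crawly.Pipeline` behaviour, where `run/3` returns the (possibly transformed) item with the state, or `false` as the item to drop it. It reuses the hypothetical `GlossaryEntry` model sketched above:

```elixir
# Hypothetical validation pipeline: drops any scraped item that fails
# the GlossaryEntry type checks, so only well-typed data is output.
defmodule MyCrawler.Pipelines.ValidateTypes do
  @behaviour Crawly.Pipeline
  require Logger

  @impl Crawly.Pipeline
  def run(item, state, _opts \\ []) do
    case MyCrawler.GlossaryEntry.new(item) do
      {:ok, entry} ->
        {Map.from_struct(entry), state}

      {:error, reason} ->
        Logger.warning("Dropping item: #{reason}")
        {false, state}
    end
  end
end
```

The pipeline is then registered in the application config alongside Crawly's built-ins:

```elixir
# config/config.exs
config :crawly,
  pipelines: [
    MyCrawler.Pipelines.ValidateTypes,
    Crawly.Pipelines.JSONEncoder
  ]
```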