Modularize main script for scraping different years websites #16
Comments
Hello @josix, I have some suggestions about this issue.
Pseudo code:
Pseudo structure:
About global variables like |
I'd like to pick up the issue if you're willing to assign it to me. Thanks!
It looks GREAT! I would appreciate it if you could help with this.
I guess we also need to reformat the code or add a linter so that the scripts in the project conform to PEP 8. Introducing some reformat tools like |
Sure. I'll add those packages and resolve this issue with #22.
If you're interested in following the convention from mail-handler, maybe you can give https://github.com/Lee-W/cookiecutter-python-template a try. It comes with all the tools you mention.
Thanks! I'll adopt some coding style and testing packages from it.
In addition to that, please try cruft instead of using cookiecutter directly. It's a tool that makes it easier for us to pull in updates from the template.
I think we could keep this issue simple and limit it to the modularity of the codebase. I'll create two more issues: one for improving coding style by introducing a linter/reformatter, and one for adding more tests to check the reliability of the code.
+1 for this. This issue can be split into smaller issues.
Refactor crawler function according to issue #16
The script main.py is used for scraping the PyConTW 2016-2020 websites. It currently relies on many global variables and shared functions, which makes it hard to maintain and hard to adapt for scraping future versions of the official website. It would be great to give the script more structure, for example by moving each year's parsing details into its own handler or module and keeping the common crawling process in a base class. Any ideas about this enhancement are welcome.
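To make the idea concrete, here is a minimal sketch of the "shared crawling process in a base class, year-specific parsing in handlers" layout. It is not the project's actual code: the class names, the `parse_titles` hook, the URL pattern, and the HTML patterns are all hypothetical and only illustrate how the global state could move into objects.

```python
import re
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class BaseCrawler(ABC):
    """Shared crawling flow; subclasses supply only year-specific parsing."""

    # Hypothetical URL pattern, not the real site layout.
    base_url = "https://example.invalid/{year}/talks"
    year: int

    def __init__(self, fetch: Callable[[str], str]):
        # The fetch function is injected, so no module-level globals are needed
        # and tests can pass a fake instead of hitting the network.
        self._fetch = fetch

    def crawl(self) -> List[Dict[str, str]]:
        # Common process: build the URL, download the page, delegate parsing.
        html = self._fetch(self.base_url.format(year=self.year))
        return [{"year": str(self.year), "title": t} for t in self.parse_titles(html)]

    @abstractmethod
    def parse_titles(self, html: str) -> List[str]:
        """Return the talk titles found in one year's page markup."""


class Crawler2019(BaseCrawler):
    year = 2019

    def parse_titles(self, html: str) -> List[str]:
        # Assumed markup for illustration only.
        return re.findall(r'<h2 class="talk">(.*?)</h2>', html)


class Crawler2020(BaseCrawler):
    year = 2020

    def parse_titles(self, html: str) -> List[str]:
        # Assumed markup for illustration only.
        return re.findall(r'<a class="talk-title"[^>]*>(.*?)</a>', html)


# Registry mapping a year to its handler; supporting a new year means adding one class.
CRAWLERS: Dict[int, Type[BaseCrawler]] = {c.year: c for c in (Crawler2019, Crawler2020)}


def crawl_year(year: int, fetch: Callable[[str], str]) -> List[Dict[str, str]]:
    """Pick the handler for the given year and run the shared crawl flow."""
    return CRAWLERS[year](fetch).crawl()
```

With this shape, main.py would only need to loop over the registered years and call `crawl_year`, passing in whatever download function the project already uses. A real implementation would probably use a proper HTML parser rather than regular expressions.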