Updates for requirements.txt and special characters #3
This PR contains two commits.

The first commit updates `requirements.txt` to what's necessary to get a clean run with `scraper.py`. It also makes a minor update to `.gitignore` for venvs (hope that's okay).

The second commit refines the handling of some special characters:
`\u0435` is the Cyrillic small letter "е" (U+0435), which looks identical to a Latin "e". Example: `"I park\u0435d my car right between the Methodist"`. `lyrics.json` currently has 410 of these.

`\u200b` is a zero-width space, and it's weirdly hanging out in two song titles (two instances in `lyrics.json`):

- `"l\u200bong story short"`
- `"r\u200bight where you left me"`
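A minimal sketch of this kind of cleanup, assuming both fixes are plain character substitutions applied to each scraped string. The names `SPECIAL_CHAR_FIXES` and `clean_special_chars` are hypothetical and not necessarily how the commit implements it:

```python
# Hypothetical sketch: names are illustrative, not the actual code in scraper.py.

# Translation table: each problematic character maps to its replacement,
# and a value of None tells str.translate to delete the character.
SPECIAL_CHAR_FIXES = str.maketrans({
    "\u0435": "e",   # Cyrillic small letter "е" (U+0435) -> Latin "e"
    "\u200b": None,  # zero-width space (U+200B) -> removed
})


def clean_special_chars(text: str) -> str:
    """Replace the Cyrillic homoglyph and drop zero-width spaces."""
    return text.translate(SPECIAL_CHAR_FIXES)


print(clean_special_chars("I park\u0435d my car right between the Methodist"))
print(clean_special_chars("l\u200bong story short"))
```

Mapping every U+0435 to "e" assumes the dataset contains no genuinely Cyrillic lyrics; if it ever did, a narrower rule (for example, only replacing the character inside otherwise-Latin words) would be safer.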
I also added a few small bits of code here and there for consistency and extra safeguarding.
Lastly, I did not include the resulting output data files: the diff was rather large, and I got some interesting console messages that changed on every run (probably due to timeouts from Genius?). I'm also not sure whether you run any post-processing or manual sanity checks on that output, but I presume that if you merge this PR, you can easily run the scraper yourself to produce the new files.
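For a quick sanity check on regenerated output, a sketch like the following could count any remaining occurrences of the two characters. It assumes the output JSON parses into nested dicts, lists, and strings; `count_special_chars` and `TARGET_CHARS` are hypothetical names:

```python
from collections import Counter
from typing import Any, Optional

# Characters targeted by the second commit: U+0435 homoglyph and U+200B zero-width space.
TARGET_CHARS = ("\u0435", "\u200b")


def count_special_chars(node: Any, counts: Optional[Counter] = None) -> Counter:
    """Recursively count target characters in every string key and value."""
    if counts is None:
        counts = Counter()
    if isinstance(node, str):
        for ch in TARGET_CHARS:
            occurrences = node.count(ch)
            if occurrences:
                counts[ch] += occurrences
    elif isinstance(node, dict):
        for key, value in node.items():
            count_special_chars(key, counts)
            count_special_chars(value, counts)
    elif isinstance(node, list):
        for item in node:
            count_special_chars(item, counts)
    return counts


sample = {"songs": [{"title": "l\u200bong story short", "lyrics": "I park\u0435d my car"}]}
print(count_special_chars(sample))
```

Passing it the result of `json.load` on a freshly generated `lyrics.json` would be expected to report no occurrences of either character if the fixes took effect.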