Describe the bug
build() does not appear to be reliable. Given the same input, it sometimes returns many article URLs and, a few seconds later, far fewer or none.
When I scrape cnn.com or edition.cnn.com, which are used in the official examples, it consistently returns different results on subsequent calls.
The first call returns hundreds of article URLs, the second only 5, the third only 2, and every call after that returns zero.
After waiting a day and starting from scratch, the same pattern repeats.
To Reproduce
Steps to reproduce the behavior, please post any code you used and the website you tried to parse/process:
import newspaper

cnn_paper = newspaper.Source('https://cnn.com')
print(cnn_paper.size())  # no articles yet: the source has not been built
cnn_paper.build()
print(cnn_paper.article_urls())  # article URLs found by this build
print(cnn_paper.size())  # number of articles found by this build
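To make the run-to-run difference easier to see, the snippet above could be wrapped in a small harness that builds a fresh source several times and records how many URLs each build returns. This is only a sketch: `count_urls_per_run` and `make_fake_factory` are hypothetical helper names, not part of newspaper, and the fake source merely imitates the shrinking counts described in this report so the harness can be tried without network access.

```python
from typing import Callable, List


def count_urls_per_run(make_source: Callable[[], object], runs: int = 3) -> List[int]:
    """Build a fresh source `runs` times and return the URL count of each build."""
    counts = []
    for _ in range(runs):
        source = make_source()   # new object for every run
        source.build()           # same call as in the report
        counts.append(len(source.article_urls()))
    return counts


def make_fake_factory(sizes: List[int]):
    """Return a stand-in source class (hypothetical, for offline use only).

    The n-th build yields sizes[n] URLs; once the list is exhausted,
    the last entry is reused.
    """
    state = {"calls": 0}

    class FakeSource:
        def __init__(self):
            self._urls = []

        def build(self):
            index = min(state["calls"], len(sizes) - 1)
            state["calls"] += 1
            self._urls = [
                f"https://example.com/article/{i}" for i in range(sizes[index])
            ]

        def article_urls(self):
            return list(self._urls)

    return FakeSource


# Against the real library, the call would look like this (requires network):
#   import newspaper
#   print(count_urls_per_run(lambda: newspaper.Source('https://cnn.com')))
```

If the counts from the real library drop from run to run in the same process, posting that list together with the library version should make the report easier to reproduce.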
Expected behavior
build() should always return a list of the article URLs found on the cnn.com site.
Screenshots
System information
OS: [Windows / Linux / Macos]
Python version [e.g. 3.6, 3.9]
Library version [e.g. 0.9.0]
Additional context
Add any other context about the problem here.