URL parsing compatibility #626

Open
Jacey0 opened this issue Mar 21, 2024 · 1 comment
Labels: help wanted (Extra attention is needed)

Jacey0 commented Mar 21, 2024

Hi,

I'm having trouble setting up the environment for this. I'm using a conda environment on Windows and get the same problem with Python 3.9, 3.10 and 3.11. I also made sure to pip install from the requirements.txt here before running pip install newspaper4k.

The first error I hit is:

  File "c:\Users...\scrape_from_urls.py", line 1, in <module>
    import newspaper
  File "C:\Users...\site-packages\newspaper\__init__.py", line 17, in <module>
    from .api import (
  File "C:\Users...\site-packages\newspaper\api.py", line 11, in <module>
    from newspaper.article import Article
  File "C:\Users...\site-packages\newspaper\article.py", line 28, in <module>
    from .extractors import ContentExtractor
  File "C:\Users...\site-packages\newspaper\extractors\__init__.py", line 8, in <module>
    from newspaper.extractors.content_extractor import ContentExtractor
  File "C:\Users...\site-packages\newspaper\extractors\content_extractor.py", line 8, in <module>
    from newspaper.extractors.articlebody_extractor import ArticleBodyExtractor
  File "C:\Users...\site-packages\newspaper\extractors\articlebody_extractor.py", line 8, in <module>
    import newspaper.extractors.defines as defines
  File "C:\Users...\site-packages\newspaper\extractors\defines.py", line 2, in <module>
    from typing_extensions import TypedDict, NotRequired
ModuleNotFoundError: No module named 'typing_extensions'

No biggie: after pip install typing-extensions the import works. But then I hit a second error when I call newspaper.article with any URL:

  File "c:\Users...\scrape_from_urls.py", line 7, in <module>
    article = newspaper.article(url)
  File "C:\Users...\site-packages\newspaper\__init__.py", line 61, in article
    a = Article(url, language=language, **kwargs)
  File "C:\Users...\site-packages\newspaper\article.py", line 195, in __init__
    scheme = urls.get_scheme(url)
  File "C:\Users...\site-packages\newspaper\urls.py", line 370, in get_scheme
    return urlparse(abs_url, **kwargs).scheme
  File "c:\Users...\lib\urllib\parse.py", line 399, in urlparse
    url, scheme, _coerce_result = _coerce_args(url, scheme)
  File "c:\Users...\lib\urllib\parse.py", line 136, in _coerce_args
    return _decode_args(args) + (_encode_result,)
  File "c:\Users...\lib\urllib\parse.py", line 120, in _decode_args
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "c:\Users...\lib\urllib\parse.py", line 120, in <genexpr>
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
AttributeError: 'builtin_function_or_method' object has no attribute 'decode'

I also tried newspaper3k and got a similar AttributeError, so I'm wondering if I should be using a different urllib version (urllib3==1.26.18).

Would be great if these could be added to the requirements.txt. Thank you.
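
A note on the second traceback, offered as a guess since scrape_from_urls.py is not shown in the issue: the failing frames are in lib\urllib\parse.py, which is the standard-library urllib.parse module and is separate from the urllib3 package. urlparse raises this exact AttributeError when it is handed a built-in function or method object instead of a string, for example a method referenced without parentheses. The sketch below reproduces that under this assumption; the URL, the variable names, and the get_scheme_checked helper are all hypothetical and are not part of newspaper4k.

```python
from urllib.parse import urlparse


def get_scheme_checked(url):
    """Hypothetical helper (not part of newspaper4k): reject non-string URLs early."""
    if not isinstance(url, str):
        raise TypeError(f"expected a str URL, got {type(url).__name__}")
    return urlparse(url).scheme


# Made-up input for illustration, e.g. one line read from a file of URLs.
line = "https://example.com/story\n"

# Assumed mistake: passing the method object itself (missing parentheses)
# instead of the string it would return.
try:
    urlparse(line.strip)
except AttributeError as exc:
    message = str(exc)
    print(message)

# Calling the method first gives urlparse a real string.
print(get_scheme_checked(line.strip()))
```

If this is what is happening, printing type(url) just before the newspaper.article(url) call should show something other than str.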

Jacey0 added the help wanted label Mar 21, 2024
changchiyou (Contributor) commented

I encountered the error ModuleNotFoundError: No module named 'typing_extensions' while using M1 / Miniconda 3.10. However, I was able to resolve it by executing pip install typing_extensions. Following this, I did not encounter the error AttributeError: 'builtin_function_or_method' object has no attribute 'decode'.
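
Since both reports hit the missing typing_extensions import, a quick way to check an environment before running a scraper is to ask the import system which modules it can find. This is a minimal sketch using only the standard library; the missing_modules helper and the list of names passed to it are illustrative assumptions, not an official list of newspaper4k dependencies.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names taken from the tracebacks in this thread.
missing = missing_modules(["typing_extensions", "newspaper"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed modules can be found.")
```

Running it with the same interpreter as the script (for example from inside the activated conda environment) helps confirm that pip installed into the environment that is actually in use.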
