Is this project still being maintained? #878
ref: #813 The owner of this project has not responded to any inquiries about its status since June 2020. The project likely needs to be forked and updated, because the last published update by @codelucas was on Jun 13, 2017.
The owner gave an interview on a podcast in September in which, if memory serves, he expressed interest in continuing to maintain the library but said he was having trouble keeping up with it. The interview:
If this project were forked and updated, what updates would you suggest? @johnbumgarner
Has anyone tried to reach out to the developer yet? I may reach out to offer support.
Yes. Reference: #813
Based on some of the past issues, the extraction component of this module would require the most changes, followed most likely by the NLP component.
Shall we fork it once and for all and work on it? A lot of time seems to have passed since the last conversation about this.
Yes, please.
@AlviseSembenico Most likely, because the module's creator won't respond to emails about the status of the code base. The question is how much to keep and how much to redesign from scratch. The rule-based extraction is still useful, but it might be better to rebuild it around some type of machine learning technique that can "guess at a page's structure and tags." I have started researching that, but I'm not an expert in ML or modeling. I have also been exploring the issues with the current version by reading through the pull requests and the open and closed issues.
@johnbumgarner I would love to contribute to that.
@johnbumgarner That is a good point. Let's bear in mind that a "fast" version should remain available, since some use cases require speed and may run on low-powered computers. I have an ML background, so I can do research. Have you already checked whether there is a project going in that direction?
@AlviseSembenico They recently released a paper with a non-transformer-based model (https://github.com/fhamborg/NewsMTSC). It would be great to see a version of that library powered by Hugging Face!
@RaedShabbir I have worked with news-please, and it is a great project. However, it uses Newspaper and other heuristics under the hood, so it is not a radical change of paradigm.
Hello! I recently stumbled upon this repo.
news-please depends on newspaper3k, so it cannot be considered more reliable. However, news-please is an active project, so we are better off getting in touch with its maintainer. newspaper3k could potentially be made an optional dependency and replaced by another extractor.