Is this project still being maintained? #878

lodenrogue · 2021-03-23T02:35:04Z

No description provided.

johnbumgarner · 2021-03-23T12:57:54Z

The owner of this project hasn't responded to any inquiries about its status since June 2020. The project likely needs to be forked and updated, because the last published update by @codelucas was on Jun 13, 2017.

ghost · 2021-03-31T01:55:00Z

The owner did an interview on a podcast in September in which he expressed interest in continuing to maintain the library, but said he was having trouble keeping up with it (if my memory serves me).

The interview:
https://www.pythonpodcast.com/newspaper-data-extraction-episode-280/

planktonrobo · 2021-04-09T14:30:36Z

If this project were to be forked & updated, what suggestions do you have for updates needed? @johnbumgarner

ghost · 2021-04-09T17:54:51Z

Has anyone tried to reach out to the developer yet? I may reach out offering support.
The biggest thing this project needs in my opinion is more transparent and direct access to the cached articles. If there are methods to access the cache, I have not found them yet.

johnbumgarner · 2021-04-09T22:50:52Z

Yes. Reference: #813

> Has anyone tried to reach out to the developer yet? I may reach out offering support.
> The biggest thing this project needs in my opinion is more transparent and direct access to the cached articles. If there are methods to access the cache, I have not found them yet.

johnbumgarner · 2021-04-10T17:15:36Z

> If this project were to be forked & updated, what suggestions do you have for updates needed? @johnbumgarner

Based on some of the past issues, the extraction piece of this module would require the most changes. After that, likely the NLP piece of the code.

AlviseSembenico · 2021-04-23T10:27:27Z

Shall we fork it once and for all and work on it? It seems a lot of time has passed since the last conversation about this.

lodenrogue · 2021-04-24T14:26:49Z

Yes please

johnbumgarner · 2021-04-24T15:24:27Z

> Shall we fork it once and for all and work on it? It seems a lot of time has passed since the last conversation about this.

@AlviseSembenico most likely, because the module's creator won't respond to emails about the status of the code base. The question is how much to keep and how much to redesign from scratch. The rule-based extraction is still useful, but it might be better to rebuild it to use some type of machine learning technique that can "guess at a page's structure and tags." I have started doing research into that, but I'm not an expert on ML or modeling.

I have also been exploring all the issues with the current version by reading all the pull requests and open/closed issues.

RaedShabbir · 2021-05-05T05:49:16Z

@johnbumgarner Would love to contribute on that

AlviseSembenico · 2021-05-05T10:53:05Z

@johnbumgarner That is a good point. Let's bear in mind that a "fast" version should remain available, since some use cases require speed and may run on low-powered machines. I have an ML background, so I can do some research. Have you already checked whether there is a project going in that direction?

RaedShabbir · 2021-05-05T15:32:29Z

@AlviseSembenico
The best I've found is https://github.com/fhamborg/news-please

They recently released a paper with a non-transformer-based model: https://github.com/fhamborg/NewsMTSC

It would be great to see a version of that library powered by Hugging Face!

AlviseSembenico · 2021-05-07T12:22:37Z

@RaedShabbir I have worked with news-please. It is a great project; however, it uses Newspaper and other heuristics under the hood, so it is not a radical change of paradigm.

edvilme · 2021-05-22T20:54:33Z

Hello! I recently stumbled upon this repo.
Despite not being maintained anymore, how reliable would you say this project is? And is news-please any more reliable?
If not, has anyone made an updated fork?

mxdev88 · 2023-06-06T19:53:02Z

> And is news-please any more reliable?

news-please depends on newspaper3k, so it cannot be considered more reliable. news-please is, however, an active project. We are better off getting in touch with the news-please maintainer. newspaper3k could potentially be made an optional dependency and replaced by another extractor.
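
To make the optional-dependency idea concrete, here is a minimal sketch of a pluggable extractor. It is not how news-please is actually structured: the names `BACKENDS`, `extract`, `extract_with_stdlib` and `extract_with_newspaper` are hypothetical, and the stdlib fallback is deliberately naive (it only collects `<title>` and `<p>` text). The newspaper3k backend assumes `Article.download(input_html=...)` followed by `Article.parse()`, and is imported lazily so the package is only needed when that backend is selected.

```python
import importlib.util
from html.parser import HTMLParser
from typing import Callable, Dict


class _TitleAndParagraphs(HTMLParser):
    """Naive stdlib fallback: collects <title> and <p> text only."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._tag = None  # "title" or "p" while inside one of those tags

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._tag = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        if self._tag == "title":
            self.title += data
        elif self._tag == "p":
            self.paragraphs[-1] += data


def extract_with_stdlib(html: str, url: str = "") -> Dict[str, str]:
    """Hypothetical fallback backend with no third-party dependencies."""
    parser = _TitleAndParagraphs()
    parser.feed(html)
    parser.close()
    text = "\n".join(p.strip() for p in parser.paragraphs if p.strip())
    return {"title": parser.title.strip(), "text": text}


def extract_with_newspaper(html: str, url: str = "") -> Dict[str, str]:
    """Hypothetical backend wrapping newspaper3k (assumed API, see lead-in)."""
    # Lazy import: newspaper3k is only required if this backend is used.
    from newspaper import Article

    article = Article(url)
    article.download(input_html=html)  # use the given HTML, no network fetch
    article.parse()
    return {"title": article.title, "text": article.text}


# Registry of interchangeable extractors; another library could be added here.
BACKENDS: Dict[str, Callable[[str, str], Dict[str, str]]] = {
    "stdlib": extract_with_stdlib,
    "newspaper": extract_with_newspaper,
}


def extract(html: str, url: str = "", backend: str = "newspaper") -> Dict[str, str]:
    """Run the requested backend, falling back if newspaper3k is not installed."""
    if backend == "newspaper" and importlib.util.find_spec("newspaper") is None:
        backend = "stdlib"
    return BACKENDS[backend](html, url)


if __name__ == "__main__":
    page = "<html><head><title>Hello</title></head><body><p>First.</p></body></html>"
    print(extract(page, backend="stdlib"))
```

With this shape, newspaper3k becomes one entry in a registry rather than a hard requirement, and a different extractor can be swapped in without touching the calling code.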

palfrey mentioned this issue Jun 6, 2023

Project status #970

Open

This was referenced Oct 24, 2023

Is this project still being maintained? AndyTheFactory/newspaper4k#511

Closed

Project status AndyTheFactory/newspaper4k#578

Closed
