codebase revamp & bug fixes
santhoshse7en committed Nov 3, 2024
1 parent 005264a commit 66a2b0d
Showing 13 changed files with 700 additions and 482 deletions.
83 changes: 34 additions & 49 deletions CODE_OF_CONDUCT.md
@@ -2,75 +2,60 @@

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.
We, as contributors and maintainers, pledge to foster an open and welcoming environment in our project and community.
We are committed to ensuring that participation is a harassment-free experience for everyone, regardless of age,
body size, disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education,
socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:
We strive to create a positive environment through behaviors such as:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
* Using inclusive and welcoming language
* Respecting differing viewpoints and experiences
* Accepting constructive criticism graciously
* Prioritizing the community's best interests
* Showing empathy towards fellow community members

Examples of unacceptable behavior by participants include:
Unacceptable behaviors include:

* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
* Using sexualized language or imagery, and unwelcome sexual advances
* Trolling, derogatory comments, or personal/political attacks
* Harassment, whether public or private
* Sharing others' private information without explicit permission
* Any conduct that could be considered inappropriate in a professional context

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers are responsible for defining acceptable behavior standards and are expected to take fair and
appropriate action in response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
Maintainers have the authority to remove, edit, or reject comments, commits, code, wiki edits, issues, and
contributions that do not align with this Code of Conduct. They can also temporarily or permanently ban contributors
for behaviors deemed inappropriate, threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.
This Code of Conduct applies within project spaces and public spaces when individuals represent the project or
community. Representation includes using an official project email address, posting on official social media accounts,
or acting as appointed representatives at events. The definition of representation may be further clarified by
project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at [email protected]. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project
team at [email protected]. All complaints will be reviewed and investigated, resulting in an appropriate
response to the circumstances. The project team will maintain confidentiality regarding the identity of the reporter.
Specific enforcement policies may be outlined separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
Project maintainers who fail to follow or enforce the Code of Conduct in good faith may face temporary or permanent
consequences as determined by other members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html.

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq
For answers to common questions about this code of conduct, see https://www.contributor-covenant.org/faq.
29 changes: 14 additions & 15 deletions LICENSE
@@ -2,20 +2,19 @@ MIT License

Copyright (c) [2019] [M Santhosh Kumar]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
documentation files (the "Software"), to deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit
persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of
the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN ACTION OF
CONTRACT, TORT, OR OTHERWISE, ARISING FROM, OUT OF, OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
IN THE SOFTWARE.


The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
103 changes: 52 additions & 51 deletions README.md
@@ -6,16 +6,15 @@

<img align="right" height="128px" width="128px" src="https://raw.githubusercontent.com/fhamborg/news-please/master/misc/logo/logo-256.png" />

news-fetch is an open-source, easy-to-use news crawler that extracts structured information from almost any news website. It can follow recursively internal hyperlinks and read RSS feeds to fetch both most recent and also old, archived articles. You only need to provide the root URL of the news website to crawl it completely. News-fetch combines the power of multiple state-of-the-art libraries and tools, such as [news-please](https://github.com/fhamborg/news-please) - [Felix Hamborg](https://www.linkedin.com/in/felixhamborg/) and [Newspaper3K](https://github.com/codelucas/newspaper/) - [Lucas (欧阳象) Ou-Yang](https://www.linkedin.com/in/lucasouyang/). This package consists of both features provided by Felix's work and Lucas' work.

I built this to reduce most of NaN or '' or [] or 'None' values while scraping for some news websites. Platform-independent and written in Python 3. Programmers and developers can very easily use this package to access the news data to their programs.
**news-fetch** is an open-source, easy-to-use news crawler that extracts structured information from almost any news website. It can recursively follow internal hyperlinks and read RSS feeds to fetch both recent and archived articles. You only need to provide the root URL of the news website to crawl it completely. News-fetch combines the power of multiple state-of-the-art libraries and tools, including [news-please](https://github.com/fhamborg/news-please) by [Felix Hamborg](https://www.linkedin.com/in/felixhamborg/) and [Newspaper3K](https://github.com/codelucas/newspaper/) by [Lucas (欧阳象) Ou-Yang](https://www.linkedin.com/in/lucasouyang/). This package leverages features from both of these works.

I built this tool to minimize NaN or empty values when scraping data from various news websites. It's platform-independent and written in Python 3, making it easy for programmers and developers to access news data for their applications.

| Source | Link |
| --- | --- |
| PyPI: | https://pypi.org/project/news-fetch/ |
| Repository: | https://santhoshse7en.github.io/news-fetch/ |
| Documentation: | https://santhoshse7en.github.io/news-fetch_doc/ (**Not Yet Created!**) |
| -------------- | ---------------------------------------------------------------------- |
| PyPI: | [https://pypi.org/project/news-fetch/](https://pypi.org/project/news-fetch/) |
| Repository: | [https://santhoshse7en.github.io/news-fetch/](https://santhoshse7en.github.io/news-fetch/) |
| Documentation: | [https://santhoshse7en.github.io/news-fetch_doc/](https://santhoshse7en.github.io/news-fetch_doc/) (**Not Yet Created!**) |

## Dependencies

@@ -27,69 +26,71 @@ I built this to reduce most of NaN or '' or [] or 'None' values while scraping f
- [chromedriver-binary](https://pypi.org/project/chromedriver-binary/)
- [pandas](https://pypi.org/project/pandas/)

## Extracted information
news-fetch extracts the following attributes from news articles. Also, have a look at an [examplary JSON file](https://github.com/santhoshse7en/news-fetch/blob/master/newsfetch/example/sample.json) extracted by news-please.
* headline
* name(s) of author(s)
* publication date
* publication
* category
* source_domain
* article
* summary
* keyword
* url
* language

## Dependencies Installation

Use the package manager [pip](https://pip.pypa.io/en/stable/) to install following
```bash
pip install -r requirements.txt
```
## Extracted Information

## Usage
news-fetch extracts the following attributes from news articles. You can also check out an [example JSON file](https://github.com/santhoshse7en/news-fetch/blob/master/newsfetch/example/sample.json) generated by news-please.

Download it by clicking the green download button here on [Github](https://github.com/santhoshse7en/news-fetch/archive/master.zip). To extract URLs from a targeted website, call the google_search function. You only need to parse the keyword and newspaper link argument.
- Headline
- Author(s)
- Publication date
- Publication
- Category
- Source domain
- Article content
- Summary
- Keywords
- URL
- Language

```python
>>> from newsfetch.google import google_search
>>> google = google_search('Alcoholics Anonymous', 'https://timesofindia.indiatimes.com/')
```
## Dependency Installation

Use the `URLs` attribute to get the links of all the news articles scraped.
Use the package manager [pip](https://pip.pypa.io/en/stable/) to install the required dependencies:

```python
>>> google.urls
```bash
pip install -r requirements.txt
```

**Directory of google search results urls**

![google](https://user-images.githubusercontent.com/47944792/88402193-68a56d00-cde8-11ea-8f26-9f7bf19359b2.PNG)
## Usage

You can download it by clicking the green download button on [Github](https://github.com/santhoshse7en/news-fetch/archive/master.zip).

To scrape all the news details, call the newspaper function
To scrape all the news details, use the `newspaper` function:

```python
>>> from newsfetch.news import newspaper
>>> news = newspaper('https://www.bbc.co.uk/news/world-48810070')
```
from newsfetch.news import Newspaper

**Directory of news**
news = Newspaper(url='https://www.thehindu.com/news/cities/Madurai/aa-plays-a-pivotal-role-in-helping-people-escape-from-the-grip-of-alcoholism/article67716206.ece')
print(news.headline)
# Output: 'AA plays a pivotal role in helping people escape from the grip of alcoholism'
```

![newsdir](https://user-images.githubusercontent.com/47944792/60564817-c058dc80-9d7e-11e9-9b3e-d0b5a903d972.PNG)
To extract URLs from a targeted website, call the `GoogleSearchNewsURLExtractor` by providing the keyword and newspaper link as arguments:

```python
>>> news.headline

'g20 summit: trump and xi agree to restart us china trade talks'
from newsfetch.google import GoogleSearchNewsURLExtractor

google = GoogleSearchNewsURLExtractor(keyword='Alcoholics Anonymous', news_domain='https://timesofindia.indiatimes.com/')
print(google.urls)
"""
['https://timesofindia.indiatimes.com/city/pune/pune-takes-a-stand-against-alcoholism-experts-collaborate-with-alcoholics-anonymous/articleshow/114438466.cms',
'https://timesofindia.indiatimes.com/city/mumbai/we-have-lost-jobs-homes-alcoholics-anonymous/articleshow/96824383.cms',
'https://timesofindia.indiatimes.com/city/gurgaon/gurgaons-alcoholics-open-up-about-their-road-to-recovery/articleshow/45080744.cms',
'https://timesofindia.indiatimes.com/city/goa/alcoholism-is-illness-not-issue-of-weak-willpower-say-experts/articleshow/105320008.cms',
'https://timesofindia.indiatimes.com/city/bhopal/alcoholism-is-an-illness-bhopal-aa-silver-jubilee-celebration/articleshow/106849014.cms',
'https://timesofindia.indiatimes.com/city/ahmedabad/alcoholics-anonymous-switches-to-online-sessions/articleshow/76144639.cms',
'https://timesofindia.indiatimes.com/city/kochi/keralites-trying-to-kick-alcoholism-alcoholics-anonymous/articleshow/13977818.cms',
'https://timesofindia.indiatimes.com/city/chandigarh/alcoholics-anonymous-turned-their-lives-around/articleshow/18239.cms',
'https://timesofindia.indiatimes.com/city/mumbai/like-air-india-flyer-alcoholics-anonymous-members-reap-whirlwind-of-job-loss-broken-homes/articleshow/96820403.cms',
'https://timesofindia.indiatimes.com/city/nagpur/alcoholics-anonymous-meet-promotes-one-day-at-a-time/articleshow/50538092.cms']
"""
```
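
The two calls above can be chained: take the URLs returned by the search step and load a headline for each one. The following is a minimal sketch of that idea and is not part of this commit. The helper names `collect_headlines` and `newsfetch_headline` are hypothetical, and the only newsfetch API it assumes is what the updated README shows, namely `Newspaper(url=...)` and its `headline` attribute. Skipping empty headlines mirrors the README's stated goal of minimizing empty values.

```python
from typing import Callable, Dict, Iterable


def collect_headlines(
    urls: Iterable[str],
    load_headline: Callable[[str], str],
) -> Dict[str, str]:
    """Map each URL to its headline, skipping failures and empty results.

    `load_headline` is injected so the loop can be exercised without
    network access or the newsfetch package installed.
    """
    headlines: Dict[str, str] = {}
    for url in urls:
        try:
            headline = load_headline(url)
        except Exception:
            # A single unreachable or unparsable article should not
            # abort the whole batch.
            continue
        if headline:  # drop None and empty strings
            headlines[url] = headline
    return headlines


def newsfetch_headline(url: str) -> str:
    """Hypothetical adapter around the README's Newspaper example."""
    # Imported lazily so this module loads even without newsfetch.
    from newsfetch.news import Newspaper

    return Newspaper(url=url).headline
```

With newsfetch installed, `collect_headlines(google.urls, newsfetch_headline)` would return a dictionary keyed by article URL. Passing the loader as an argument keeps the batching logic independent of the scraping backend, which also makes it easy to substitute a stub in tests.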

## Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.
Make sure to update tests as appropriate.

## License
[MIT](https://choosealicense.com/licenses/mit/)
This project is licensed under the [MIT](https://choosealicense.com/licenses/mit/) License.

1 change: 1 addition & 0 deletions newsfetch/example/__init__.py
@@ -0,0 +1 @@
