Welcome to Extractus
We develop and share open-source tools for collecting media content.
- feed-extractor: extract and normalize RSS/Atom/JSON feeds
- article-extractor: extract the main article from a given URL
- oembed-extractor: extract oEmbed data from supported providers
You can use one of them, or a combination, to build news sites, create automated content systems for marketing campaigns, gather datasets for NLP projects, and so on.
Here is an example based on our news engine.
If you have an idea, want more features, or run into a problem while using them, please create an issue.
In the future, we would like to add more dedicated tools for extracting links, tweets, audio, video, products, and crypto/stock prices.
We do not have much time: this is a self-training, non-profit side project. Contributions and collaborators are always welcome 🙂