Module for automatic text summarization of HTML documents.

Automatic text summarizer

Here are some other summarizers:

Installation

Currently, sumy can be installed only from the git repository (make sure you have Python installed):

$ wget https://github.com/miso-belica/sumy/archive/master.zip # download the sources
$ unzip master.zip # extract the downloaded file
$ cd sumy-master/
$ [sudo] python setup.py install # install the package

Or simply run:

$ [sudo] pip install git+https://github.com/miso-belica/sumy.git

Usage

Sumy includes a command-line utility for quick summarization of documents:

$ sumy luhn --url=http://www.zdrojak.cz/clanky/automaticke-zabezpeceni/
$ sumy edmundson --length=3% --url=http://cs.wikipedia.org/wiki/Bitva_u_Lipan
$ sumy --help # for more info
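
If you want to drive the command-line utility from a script, the calls above can be assembled programmatically. The following is only a minimal sketch: `build_sumy_command` is a hypothetical helper, not part of sumy, and it relies solely on the `--url` and `--length` options shown above.

```python
import shlex


def build_sumy_command(method, url, length=None):
    """Return the argument list for a `sumy` call.

    Hypothetical helper for illustration; it is not part of sumy.
    """
    argv = ["sumy", method]
    if length is not None:
        # The length is passed through unchanged, e.g. "3%" as in the example above.
        argv.append("--length={}".format(length))
    argv.append("--url={}".format(url))
    return argv


command = build_sumy_command(
    "edmundson", "http://cs.wikipedia.org/wiki/Bitva_u_Lipan", length="3%"
)
# Show the command as it would be typed in a shell.
print(" ".join(shlex.quote(part) for part in command))
```

With sumy installed, the resulting list can be handed to `subprocess.call(command)` to run the summarizer.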

To run the available evaluation methods against a given summarization method, use the commands below:

$ sumy_eval lsa reference_summary.txt --url=http://www.zdrojak.cz/clanky/automaticke-zabezpeceni/
$ sumy_eval edmundson reference_summary.txt --url=http://cs.wikipedia.org/wiki/Bitva_u_Lipan
$ sumy_eval --help # for more info
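
This README does not list which metrics `sumy_eval` reports. As a rough illustration of what comparing a generated summary with a reference summary can look like, here is a sketch of sentence-level precision, recall and F-score. The function name and the sample sentences are assumptions made for this example; this is not sumy's evaluation code.

```python
def precision_recall_f1(candidate_sentences, reference_sentences):
    """Compare two summaries by exact sentence overlap (illustrative only)."""
    candidate = set(candidate_sentences)
    reference = set(reference_sentences)
    overlap = len(candidate & reference)
    if overlap == 0:
        # Covers empty inputs as well as summaries with nothing in common.
        return 0.0, 0.0, 0.0
    precision = overlap / len(candidate)  # share of produced sentences that are correct
    recall = overlap / len(reference)     # share of reference sentences that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


reference = ["A cat sat.", "It was warm.", "The end."]
candidate = ["A cat sat.", "The end.", "Dogs bark.", "Rain fell."]
p, r, f = precision_recall_f1(candidate, reference)
print("precision={:.2f} recall={:.2f} f1={:.2f}".format(p, r, f))
```

Exact sentence matching is the simplest possible choice; real evaluations often compare words or n-grams instead.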

Python API

You can also use sumy as a library in your project:

# -*- coding: utf8 -*-

from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals

from sumy.parsers.html import HtmlParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer
from sumy.nlp.stemmers.czech import stem_word
from sumy.utils import get_stop_words


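if __name__ == "__main__":
    # Download the page and parse it into a document, tokenizing the text as Czech.
    url = "http://www.zsstritezuct.estranky.cz/clanky/predmety/cteni/jak-naucit-dite-spravne-cist.html"
    parser = HtmlParser.from_url(url, Tokenizer("czech"))

    # Create an LSA summarizer that uses the Czech stemmer and Czech stop words.
    summarizer = LsaSummarizer(stem_word)
    summarizer.stop_words = get_stop_words("czech")

    # Print the sentences selected for a summary of length 20.
    for sentence in summarizer(parser.document, 20):
        print(sentence)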
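
The summarizer above is called with a document and a length and yields the selected sentences. To show that call shape without network access or sumy installed, here is a self-contained toy summarizer. It scores sentences by average word frequency, which is a much simpler technique than LSA and is not how sumy works. The function name `summarize_by_frequency`, the sample text and the stop-word set are all assumptions made for this sketch.

```python
import re
from collections import Counter


def summarize_by_frequency(text, sentences_count, stop_words=frozenset()):
    """Toy extractive summarizer; NOT sumy's LSA implementation."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Lowercased words per sentence, with stop words removed.
    tokenized = [
        [w for w in re.findall(r"\w+", s.lower()) if w not in stop_words]
        for s in sentences
    ]
    frequencies = Counter(word for words in tokenized for word in words)

    def score(index):
        # Average frequency of the sentence's words across the whole text.
        words = tokenized[index]
        return sum(frequencies[w] for w in words) / len(words) if words else 0.0

    # Pick the highest-scoring sentences, then restore document order.
    best = sorted(range(len(sentences)), key=score, reverse=True)[:sentences_count]
    return [sentences[i] for i in sorted(best)]


text = "Cats sleep a lot. Cats and dogs are pets. The weather is nice. Dogs chase cats."
stop_words = frozenset(["a", "and", "are", "the", "is"])
for sentence in summarize_by_frequency(text, 2, stop_words):
    print(sentence)
```

As in the sumy example, the caller supplies the stop words and the number of sentences, and the result is returned in the order the sentences appear in the source text.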

Tests

Run the tests with:

$ nosetests-2.6 && nosetests-3.2 && nosetests-2.7 && nosetests-3.3
