ts-textrank

ts-textrank is a TypeScript implementation of the TextRank algorithm.

Install

Using npm:

$ npm install ts-textrank

Using yarn:

$ yarn add ts-textrank

Usage

Create a config object
Create a summarizer with your config
Call summarizer.summarize to extract the most relevant sentences from an input text

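//SORT_BY is used below, so it is imported here too
//(this assumes it is exported from the package root like the other names).
import { SorensenDiceSimilarity, DefaultTextParser, ConsoleLogger, RelativeSummarizerConfig, Summarizer, NullLogger, Sentence, SORT_BY } from "ts-textrank";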

//Only one similarity function implemented at this moment.
//More could come in future versions.
const sim = new SorensenDiceSimilarity()

//Only one text parser available at this moment
const parser = new DefaultTextParser()

//Do you want logging?
const logger = new ConsoleLogger()

//You can implement LoggerInterface for different behavior,
//or if you don't want logging, use this:
//const logger = new NullLogger()

//Set the summary length as a fraction of the full text length (.25 = 25%)
const ratio = .25

//Damping factor. See "How it works" for more info.
const d = .85

//How do you want summary sentences to be sorted?
//Get sentences in the order that they appear in text:
const sorting = SORT_BY.OCCURRENCE
//Or sort them by relevance:
//const sorting = SORT_BY.SCORE
const config = new RelativeSummarizerConfig(ratio, sim, parser, d, sorting)

//Or, if you want a fixed number of sentences:
//const number = 5
//const config = new AbsoluteSummarizerConfig(number, sim, parser, d, sorting)

const summarizer = new Summarizer(config, logger)

//Language is used for stopword removal.
//See https://github.com/fergiemcdowall/stopword for supported languages
const lang = "en"

const text = "...Text to summarize..."
//summary will be an array of sentences summarizing text
const summary = summarizer.summarize(text, lang)

How it works

The TextRank algorithm was introduced by Rada Mihalcea and Paul Tarau in their 2004 paper "TextRank: Bringing Order into Texts". It applies the same principle that Google's PageRank uses to discover relevant web pages.

The idea is to split a text into sentences, and then calculate a score for each sentence based on its similarity to the other sentences. TextRank treats words shared by two sentences as a link between them (like hyperlinks between web pages). It then weights that link by how many words the sentences have in common. ts-textrank uses Sorensen-Dice similarity for this.
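As an illustration of the link weight, here is a minimal sketch of the Sorensen-Dice coefficient over two word lists. The function name diceSimilarity and the choice to compare sets of unique words are assumptions made for this example; this is not the library's SorensenDiceSimilarity class, which may tokenize or count words differently.

```typescript
// Hypothetical helper for illustration only, not part of the ts-textrank API.
// Dice coefficient: 2 * |A ∩ B| / (|A| + |B|), computed over unique words.
function diceSimilarity(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);

  // Two empty sentences share nothing; avoid dividing by zero.
  if (setA.size + setB.size === 0) {
    return 0;
  }

  // Count the words that appear in both sentences.
  let shared = 0;
  setA.forEach((word) => {
    if (setB.has(word)) {
      shared++;
    }
  });

  return (2 * shared) / (setA.size + setB.size);
}

// Two of three words are shared: 2 * 2 / (3 + 3)
const weight = diceSimilarity(["cat", "sat", "mat"], ["cat", "lay", "mat"]);
```

The result is always between 0 (no words in common) and 1 (identical word sets), which makes it convenient as an edge weight.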

The sentences with the highest scores are those that share the most words with the rest of the text, so they can be used as a summary of the whole text.
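The scoring step can be sketched as an iterative, weighted PageRank over the sentence graph. The code below is a simplified illustration and not the library's implementation: the function name rankSentences, the fixed iteration count, and the plain number[][] weight matrix are all assumptions made for this example.

```typescript
// Hypothetical sketch for illustration only, not part of the ts-textrank API.
// weights[i][j] is the similarity between sentence i and sentence j.
function rankSentences(weights: number[][], d = 0.85, iterations = 50): number[] {
  const n = weights.length;

  // Total weight of the links leaving each sentence (self-links ignored).
  const outSum = weights.map((row, i) =>
    row.reduce((sum, w, j) => (i === j ? sum : sum + w), 0)
  );

  // Start every sentence with the same score.
  let scores: number[] = weights.map(() => 1);

  for (let it = 0; it < iterations; it++) {
    const next: number[] = [];
    for (let i = 0; i < n; i++) {
      // Each neighbour j passes on a share of its score,
      // proportional to the weight of its link to sentence i.
      let incoming = 0;
      for (let j = 0; j < n; j++) {
        if (j === i || outSum[j] === 0) continue;
        incoming += (weights[j][i] / outSum[j]) * scores[j];
      }
      next.push(1 - d + d * incoming);
    }
    scores = next;
  }
  return scores;
}

// Sentence 0 is linked to both other sentences, so it should rank highest.
const scores = rankSentences([
  [0, 1, 1],
  [1, 0, 0],
  [1, 0, 0],
]);
```

A fixed number of iterations keeps the sketch short; a real implementation would more likely stop once the scores change by less than some small threshold.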

Damping factor

The original PageRank algorithm includes a damping factor d, which represents the probability that a user keeps following links from the current page; with probability 1 - d the user jumps to a random page instead. The TextRank authors kept it and set it to .85, but it can be changed if that gives better results in specific cases.
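For reference, this is where d enters the weighted score given in the TextRank paper. Here WS(V_i) is the score of sentence i, w_{ji} is the weight of the link from sentence j to sentence i, and In/Out are the sets of sentences linking to and from a sentence. How ts-textrank evaluates this internally is not shown here.

```latex
WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)}
          \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)
```

With d = .85, every sentence receives a baseline of .15, and the remaining .85 is distributed according to its links. A lower d flattens the differences between sentences; a higher d makes the score depend more on the link structure.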
