forked from aaaton/golem

A lemmatizer implemented in Go


charlesgiroux/golem

 
 


GoLem

This project is a dictionary-based lemmatizer written in pure Go, without external dependencies.

What?

A lemmatizer is a tool that finds the base form of words.

Language  Input       Output
English   aligning    align
Swedish   sprungit    springa
French    abattaient  abattre

It's based on the dictionaries found on lexiconista.com, which are available under the Open Database License. This project would not be feasible without them.
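To give a rough idea of what "dictionary-based" means here, the sketch below stores the three example pairs from the table above in a plain Go map and looks words up case-insensitively, returning unknown words unchanged. The `lemmaTable` map and the `lookupLemma` function are hypothetical names used only for this illustration. This is not golem's actual implementation, and its internal data structures may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// lemmaTable is a hypothetical in-memory dictionary mapping inflected
// forms to their base forms. It only holds the example pairs from the
// table, mixed across languages, purely for illustration.
var lemmaTable = map[string]string{
	"aligning":   "align",
	"sprungit":   "springa",
	"abattaient": "abattre",
}

// lookupLemma lowercases the word, looks it up in lemmaTable and
// returns the base form. Words not in the table are returned unchanged.
func lookupLemma(word string) string {
	if lemma, ok := lemmaTable[strings.ToLower(word)]; ok {
		return lemma
	}
	return word
}

func main() {
	for _, w := range []string{"Aligning", "sprungit", "abattaient", "golem"} {
		fmt.Printf("%s -> %s\n", w, lookupLemma(w))
	}
}
```

The lookup itself is a single map access; the quality of the output depends almost entirely on how complete the dictionary is.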

Languages

At the moment I have added English, Swedish, French, Spanish and German. Adding another language should take little more than getting the dictionary for that language, and dictionaries for several more languages are already available on lexiconista. Please let me know if there is a language you would like to see here, or fork the project and create a pull request.

Basic usage

package main

import (
	"github.com/aaaton/golem"
)

func main() {
	// Both "en" and "english" give an English lemmatizer
	lemmatizer, err := golem.New("english")
	if err != nil {
		panic(err)
	}
	word := lemmatizer.Lemma("Abducting")
	if word != "abduct" {
		panic("The output is not what is expected!")
	}
}
