etymolog

Martin Užák
12 April 2020

I find the meanings of words and their origins fascinating. etymolog is a tool to store, retrieve and work with etymological facts. For example, you can look up the Sanskrit word tr and its derivations (./details.py sa tr):

 => sa:tr = en:stars, to cross over
    + sa:rAtrI {rA+tr} = en:night, sa:naktA [that which gives (rA) the stars (tr)]
   -> pt:trazer   
   -> sa:str = en:star, strewn, scattered, spread  
     -> sa:tara = en:crossing, sa:Kali  
       -> en:Tartary   
     -> en:astral   
     -> en:star   
     -> en:transit

Or you can produce a dictionary for a language that shows what each of its words derives from (./dict.py sk):

* báť sa 
    -> sa:bhaya
* beda 
    -> sa:bheda
* biely 
    -> slavic:bel -> sa:bhalu
* brat 
    -> sa:bhrAtr
...

Pill File Grammar

Etymological data is stored in .pill files, which are processed line by line. Each line contains one statement, which can consist of several expressions. A statement cannot span multiple lines.

Whitespace at the beginning and end of a line is stripped. Whitespace between words is collapsed to a single space.
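A minimal Python sketch of this whitespace rule, assuming every whitespace character (spaces and tabs alike) counts. normalize_line is a hypothetical helper, not the function lexer.py actually uses:

```python
def normalize_line(line: str) -> str:
    """Strip leading/trailing whitespace and collapse inner runs to a single space."""
    # str.split() without arguments splits on any run of whitespace and
    # discards empty strings, so joining with " " does both steps at once.
    return " ".join(line.split())


print(normalize_line("   sa:Pas  ->\tsa:Pasa   "))  # sa:Pas -> sa:Pasa
```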

Words

A word is the basic unit within a pill. It consists of any number of alphabetic characters. Example:

ship

Each word is assigned to a language, written as an abbreviation of the language name. The default language is English (en); it is used when no language is set explicitly for a word and can be changed in the config. So the previous example is the same as:

en:ship
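The language prefix can be resolved as sketched below. This is an illustration only: split_language and DEFAULT_LANGUAGE are hypothetical names, and the hard-coded "en" merely stands in for whatever the config sets:

```python
DEFAULT_LANGUAGE = "en"  # assumption: stands in for the configurable default


def split_language(token: str, default: str = DEFAULT_LANGUAGE) -> tuple[str, str]:
    """Split 'lang:word' into (lang, word); words without a prefix get the default."""
    lang, sep, word = token.partition(":")
    if sep and lang:
        return lang, word
    return default, token


print(split_language("ship"))     # ('en', 'ship')
print(split_language("en:ship"))  # ('en', 'ship')
print(split_language("sa:tr"))    # ('sa', 'tr')
```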

Composite words

Several consecutive words are fused into a single word:

big ship (out of wood)

This creates one word, big ship (out of wood), in the default language. Only the first word can have a language set explicitly:

en:big ship

Giving a composite word two or more language prefixes is invalid, though:

en:big de:Schiff     // invalid
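A sketch of how composite words could be fused and validated, assuming that any colon in a later part counts as a forbidden language prefix. parse_composite is a hypothetical helper, not etymolog's parser:

```python
DEFAULT_LANGUAGE = "en"  # assumption: stands in for the configurable default


def parse_composite(text: str, default: str = DEFAULT_LANGUAGE) -> tuple[str, str]:
    """Fuse consecutive words into one; only the first may carry a 'lang:' prefix."""
    parts = text.split()
    if not parts:
        raise ValueError("empty word")
    lang, first = default, parts[0]
    if ":" in first:
        lang, first = first.split(":", 1)
    for part in parts[1:]:
        if ":" in part:  # e.g. the de:Schiff in "en:big de:Schiff"
            raise ValueError(f"language prefix only allowed on the first word: {part!r}")
    return lang, " ".join([first, *parts[1:]])


print(parse_composite("big ship (out of wood)"))  # ('en', 'big ship (out of wood)')
```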

Words are stored case-sensitively, exactly as you define them, but they are always looked up case-insensitively. So looking up Ship and ship yields the same result.
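One way to get "stored as written, looked up case-insensitively" is to index each word under a case-folded key while keeping the original spelling as the value. The Lexicon class below is a hypothetical sketch; keeping the first spelling when a word is redefined in a different case is an assumption:

```python
from typing import Optional


class Lexicon:
    """Keeps each word as written but indexes it case-insensitively (hypothetical)."""

    def __init__(self) -> None:
        # (language, casefolded word) -> word exactly as it was first defined
        self._index: dict[tuple[str, str], str] = {}

    def add(self, lang: str, word: str) -> str:
        # setdefault keeps the first spelling if the same word is defined again
        return self._index.setdefault((lang, word.casefold()), word)

    def lookup(self, lang: str, word: str) -> Optional[str]:
        return self._index.get((lang, word.casefold()))


lex = Lexicon()
lex.add("en", "Ship")
print(lex.lookup("en", "ship"))  # Ship
print(lex.lookup("en", "SHIP"))  # Ship
```

str.casefold() is used rather than str.lower() because it also folds characters such as the German ß.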

Groups

Several logically distinct words can appear in the same statement as a group, separated by commas (,):

boat, ship, steam trawler

This creates three words. Groups are mainly useful in relationships.
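Splitting a group into its members can be sketched as follows. split_group is hypothetical, and this naive version would also split on commas inside [comments], which a real parser has to skip:

```python
def split_group(expression: str) -> list[str]:
    """Split a comma-separated group into member words with normalized spacing."""
    return [" ".join(m.split()) for m in expression.split(",") if m.strip()]


print(split_group("boat, ship, steam trawler"))  # ['boat', 'ship', 'steam trawler']
```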

Relationships

The value of etymolog lies in the ability to define relationships among words. The unidirectional Derive relationship is especially useful, as it records the origin of words.

Derivation

Derivation (->) indicates that one word gave rise to another:

en:pyre -> en:fire

You can have several relationships on one line:

sa:Pas -> sa:Pasa -> lat:pax -> pacify

This says that the word pacify is derived from Latin pax, which comes from the Sanskrit word Pasa, which in turn stems from Pas.

Relationships are processed from left to right. When one relationship (sa:Pas -> sa:Pasa) is followed by another, its right-most word (sa:Pasa) becomes the left side of the next relationship (sa:Pasa -> lat:pax).
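The left-to-right chaining rule can be sketched with a regular expression that splits a statement on the relationship symbols. chain_pairs is a hypothetical helper; it ignores groups and comments, so an = or ~ inside a [comment] would be misread:

```python
import re

# Relationship symbols; the capturing group makes re.split keep them as tokens.
_OPERATOR = re.compile(r"\s*(->|=|~)\s*")


def chain_pairs(statement: str) -> list[tuple[str, str, str]]:
    """Turn 'a -> b = c' into [('a', '->', 'b'), ('b', '=', 'c')]."""
    tokens = _OPERATOR.split(statement.strip())
    words, ops = tokens[0::2], tokens[1::2]  # tokens alternate word, op, word, ...
    # each relationship reuses the right-hand word of the previous one as its left side
    return [(words[i], op, words[i + 1]) for i, op in enumerate(ops)]


for pair in chain_pairs("sa:Pas -> sa:Pasa -> lat:pax -> pacify"):
    print(pair)
# ('sa:Pas', '->', 'sa:Pasa')
# ('sa:Pasa', '->', 'lat:pax')
# ('lat:pax', '->', 'pacify')
```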

You can have any number of relationships on one line, and you can combine different kinds of relationships freely.

Equality

Equality (=) defines how a word translates from one language into another:

en:ship = de:Schiff

This is where groups come in handy. If a group is used on the right side of a relationship, the relationship is created for each of its members:

sa:Pas -> sa:Pasa = rope, cord, tie net, chain, trap, noose, snare

This first defines the derivation of sa:Pas into sa:Pasa. It then creates seven Equality relationships that translate sa:Pasa into words of the default language.

If a group is used on the left side of a relationship, only its right-most member is used. For example:

sa:dvipa -> hindi:Doab, ir:Dobar, cornish:Dofer, celtic:Dubron -> Dubrovnik

This expands into:

sa:dvipa -> hindi:Doab
sa:dvipa -> ir:Dobar
sa:dvipa -> cornish:Dofer
sa:dvipa -> celtic:Dubron
celtic:Dubron -> en:Dubrovnik
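Both group rules (every member on the right side, only the right-most member on the left side) fit into a small expansion function. The sketch below is hypothetical and does not apply the default language, so Dubrovnik stays without its en: prefix:

```python
import re

_OPERATOR = re.compile(r"\s*(->|=|~)\s*")


def expand(statement: str) -> list[tuple[str, str, str]]:
    """Expand a statement into binary relationships using the group rules above."""
    tokens = _OPERATOR.split(statement.strip())
    groups = [[m.strip() for m in t.split(",") if m.strip()] for t in tokens[0::2]]
    result = []
    for i, op in enumerate(tokens[1::2]):
        left = groups[i][-1]         # a group on the left contributes its right-most member
        for right in groups[i + 1]:  # a group on the right yields one relationship per member
            result.append((left, op, right))
    return result


for rel in expand("sa:dvipa -> hindi:Doab, ir:Dobar, cornish:Dofer, celtic:Dubron -> Dubrovnik"):
    print(*rel)
```

This prints the same five relationships as the listing above. Applied to the sa:Pas example it yields eight relationships: one derivation plus the seven Equality relationships.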

Generic Relation

If words are linked but the link is neither a direct translation (Equality) nor a derivation (Derive), use the generic relation ~:

en:pyre ~ sk:pýriť sa, de:Feuer

This expresses that the two words on the right are somehow related to pyre, without saying why or how.
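The three relationship kinds could be modelled as an enum keyed by their symbols. Relation and Graph are hypothetical names, not the classes in model.py. The sketch stores every edge in the direction it was written, because the text does not say whether = and ~ are treated as symmetric:

```python
from collections import defaultdict
from enum import Enum


class Relation(Enum):
    DERIVE = "->"   # unidirectional: the left word gave rise to the right one
    EQUALS = "="    # translation from one language into another
    RELATED = "~"   # generic link: says neither why nor how


class Graph:
    """Hypothetical store of relationships, grouped by relation type."""

    def __init__(self) -> None:
        self._edges: dict[Relation, set[tuple[str, str]]] = defaultdict(set)

    def add(self, left: str, symbol: str, right: str) -> None:
        # Relation("~") looks the member up by value; unknown symbols raise ValueError
        self._edges[Relation(symbol)].add((left, right))

    def targets(self, word: str, relation: Relation) -> set[str]:
        """Words on the right-hand side of `relation` edges that start at `word`."""
        return {right for left, right in self._edges[relation] if left == word}


g = Graph()
g.add("en:pyre", "~", "sk:pýriť sa")
g.add("en:pyre", "~", "de:Feuer")
g.add("en:pyre", "->", "en:fire")
print(sorted(g.targets("en:pyre", Relation.RELATED)))  # ['de:Feuer', 'sk:pýriť sa']
```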

Comments

If you have an idea about how or why words are related, or about the meaning of a word, you can add a comment in square brackets. Put the comment immediately after a word or after a relationship symbol:

boat [means of transport on water]
en:pyre ~[person turning red in the face] sk:pýriť sa

If you want multiple comments, use multiple statements or put the comments one after another:

boat [similar to a ship]
boat [means of transport on water]

a [b] [c]               // is valid as well
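Bracketed comments can be pulled out of a statement with a regular expression, as in the hypothetical sketch below. It does not support nested brackets and does not record which word or symbol each comment was attached to:

```python
import re

# A comment is bracketed text; nested brackets are not supported in this sketch.
_COMMENT = re.compile(r"\[([^\]]*)\]")


def split_comments(fragment: str) -> tuple[str, list[str]]:
    """Return the fragment with its [comments] removed, plus the comment texts."""
    comments = _COMMENT.findall(fragment)
    text = " ".join(_COMMENT.sub(" ", fragment).split())
    return text, comments


print(split_comments("a [b] [c]"))
# ('a', ['b', 'c'])
print(split_comments("en:pyre ~[person turning red in the face] sk:pýriť sa"))
# ('en:pyre ~ sk:pýriť sa', ['person turning red in the face'])
```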

Unions

To express that a word is composed of two or more other words, use a union ({first_word + ... + last_word}):

sa:visa {vi+sa}

This creates the word sa:visa together with a corresponding union object.

A union can have any number of components, e.g.:

sa:svetAsvatara {sveta+asva+tara}

By default, the components are in the language of the union, but you can set the language of any of them explicitly:

slavic:Dažbog {sa:dadati+bog}

Finally, you can combine comments with unions:

slavic:Dažbog {sa:dadati+bog} [solar deity]
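A union statement can be parsed as sketched below. UnionWord and parse_union are hypothetical names. The sketch ignores any trailing [comment] and does not check that the components concatenate to the word, since they need not (compare sa:svetAsvatara with sveta+asva+tara):

```python
import re
from dataclasses import dataclass

# word, then {component+component+...}, then anything (e.g. a [comment]) ignored here
_UNION = re.compile(r"^(?P<word>[^{]+?)\s*\{(?P<parts>[^}]*)\}.*$")


def _split_language(token: str, default: str) -> tuple[str, str]:
    lang, sep, word = token.partition(":")
    return (lang, word) if sep and lang else (default, token)


@dataclass
class UnionWord:
    lang: str
    word: str
    components: list[tuple[str, str]]  # (language, word) pairs


def parse_union(statement: str, default_lang: str = "en") -> UnionWord:
    m = _UNION.match(statement.strip())
    if m is None:
        raise ValueError(f"not a union statement: {statement!r}")
    lang, word = _split_language(m["word"].strip(), default_lang)
    # components default to the union's own language, not the global default
    parts = [_split_language(p.strip(), lang) for p in m["parts"].split("+") if p.strip()]
    return UnionWord(lang, word, parts)


print(parse_union("slavic:Dažbog {sa:dadati+bog} [solar deity]"))
# UnionWord(lang='slavic', word='Dažbog', components=[('sa', 'dadati'), ('slavic', 'bog')])
```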

Tools

The grammar described above is implemented in lexer.py and parser.py; for testing, both can be run with a file as an argument.

Use parser.py::load_db() to load the DB. The parser instantiates the objects and creates the relationships in memory according to model.py. Once this is done, you can work with the object model, either with your own tools or with the bundled ones:

dump.py

To produce statistics about your DB, use dump.py.

details.py

details.py shows the details of a word in a language.
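The derivation part of the details.py output at the top can be reproduced by a depth-first walk over derivation edges. This is a hypothetical sketch, not details.py itself: it assumes a plain dict from a word to the words derived from it, and leaves out translations and unions:

```python
from typing import Optional


def render_tree(word: str, derived: dict[str, list[str]],
                depth: int = 0, seen: Optional[set[str]] = None) -> list[str]:
    """Render the derivation tree below `word`, one '->' line per derived word."""
    seen = set() if seen is None else seen
    lines = [("=> " if depth == 0 else "  " * depth + "-> ") + word]
    if word in seen:  # do not expand a word twice (guards against cycles)
        return lines
    seen.add(word)
    for child in derived.get(word, []):
        lines.extend(render_tree(child, derived, depth + 1, seen))
    return lines


derived = {  # derivation edges taken from the sa:tr example at the top
    "sa:tr": ["pt:trazer", "sa:str"],
    "sa:str": ["sa:tara", "en:astral", "en:star", "en:transit"],
    "sa:tara": ["en:Tartary"],
}
print("\n".join(render_tree("sa:tr", derived)))
```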

dict.py

As shown above, dict.py lists all words of a given language along with their derivations.
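A dictionary entry like the ones for biely and brat can be built by following derivations backwards. The sketch is hypothetical and assumes each word has at most one recorded origin, stored as a plain dict; the real model may allow several:

```python
def origin_chain(word: str, origin: dict[str, str]) -> list[str]:
    """Follow derivations backwards from `word` to its earliest recorded ancestor."""
    chain, seen = [], {word}
    while word in origin and origin[word] not in seen:  # stop on cycles
        word = origin[word]
        seen.add(word)
        chain.append(word)
    return chain


def dict_entry(word: str, origin: dict[str, str]) -> str:
    headword = word.split(":", 1)[-1]  # drop the language prefix, e.g. "sk:"
    chain = origin_chain(word, origin)
    if not chain:
        return f"* {headword}"
    return f"* {headword}\n    -> " + " -> ".join(chain)


origin = {"sk:biely": "slavic:bel", "slavic:bel": "sa:bhalu", "sk:brat": "sa:bhrAtr"}
for w in ("sk:biely", "sk:brat"):
    print(dict_entry(w, origin))
```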
