
Tatoeba/tatomecab

Tatomecab

A wrapper around mecab for the Tatoeba project.

Tatomecab is a set of tools that add furigana to Japanese sentences.

tatomecab.py

A library that wraps mecab and adds a few features, such as parsing the markers set by Warifuri. It can also be run from the command line for quick testing, much like mecab itself:

$ echo 振り仮名をつけろう | ./tatomecab.py
振	ふ
り	None
仮	が
名	な
を	None
つけろ	None
う	None
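The output above can be consumed from another script. The following is a minimal sketch, assuming each line holds a surface form and a reading separated by a tab, and that a missing reading is printed as the literal string `None` (both inferred from the sample above, not from the tatomecab source). The function name `parse_tatomecab_output` is hypothetical.

```python
def parse_tatomecab_output(text):
    """Turn `surface<TAB>reading` lines into (surface, reading) pairs.

    Assumption: a missing reading is printed as the literal string "None".
    """
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        surface, _, reading = line.partition("\t")
        pairs.append((surface, None if reading in ("", "None") else reading))
    return pairs


# Sample copied from the command-line output shown above.
sample = "振\tふ\nり\tNone\n仮\tが\n名\tな\nを\tNone\nつけろ\tNone\nう\tNone\n"
pairs = parse_tatomecab_output(sample)
for surface, reading in pairs:
    print(surface, reading)
```

In practice the text would come from piping a sentence into `./tatomecab.py`, for example through `subprocess.run(..., capture_output=True, text=True)`.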

webserver.py

Exposes the tatomecab library as a web service.

$ curl http://127.0.0.1:8842/furigana -G --data-urlencode str=振り仮名をつけろう
# Actual URL is http://127.0.0.1:8842/furigana?str=%E6%8C%AF%E3%82%8A%E4%BB%AE%E5%90%8D%E3%82%92%E3%81%A4%E3%81%91%E3%82%8D%E3%81%86
<?xml version="1.0" encoding="UTF-8"?>
<root>
<parse>
<token>
  <reading furigana="ふ"><![CDATA[振]]></reading>
  <![CDATA[り]]>
  <reading furigana="が"><![CDATA[仮]]></reading>
  <reading furigana="な"><![CDATA[名]]></reading>
</token>
<token><![CDATA[を]]></token>
<token><![CDATA[つけろ]]></token>
<token><![CDATA[う]]></token>
</parse>
</root>
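A client can turn this response into (text, furigana) pairs with the standard library. The sketch below is illustrative only: the function names `fetch_furigana` and `parse_furigana_xml` are hypothetical, the endpoint and the `str` parameter are taken from the curl example above, and the sample document mirrors the response structure shown above with readings taken from the command-line example.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<root>
<parse>
<token>
  <reading furigana="ふ"><![CDATA[振]]></reading>
  <![CDATA[り]]>
  <reading furigana="が"><![CDATA[仮]]></reading>
  <reading furigana="な"><![CDATA[名]]></reading>
</token>
<token><![CDATA[を]]></token>
<token><![CDATA[つけろ]]></token>
<token><![CDATA[う]]></token>
</parse>
</root>
"""


def fetch_furigana(sentence, base_url="http://127.0.0.1:8842/furigana"):
    """Request the XML for a sentence (needs a running webserver.py)."""
    query = urllib.parse.urlencode({"str": sentence})
    with urllib.request.urlopen(f"{base_url}?{query}") as response:
        return response.read()


def parse_furigana_xml(xml_bytes):
    """Return one list of (text, furigana) pairs per <token> element."""
    root = ET.fromstring(xml_bytes)
    tokens = []
    for token in root.iter("token"):
        parts = []
        # Plain text placed directly inside <token>, before any <reading>.
        if token.text and token.text.strip():
            parts.append((token.text.strip(), None))
        for reading in token.findall("reading"):
            parts.append((reading.text or "", reading.get("furigana")))
            # Plain text that follows this <reading> within the same token.
            if reading.tail and reading.tail.strip():
                parts.append((reading.tail.strip(), None))
        tokens.append(parts)
    return tokens


tokens = parse_furigana_xml(SAMPLE.encode("utf-8"))
for parts in tokens:
    print(parts)
```

With the server running, `parse_furigana_xml(fetch_furigana("振り仮名をつけろう"))` would replace the hard-coded sample. Note that the sketch strips surrounding whitespace from plain text, which is a simplification.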

Warifuri

Warifuri is a script that edits the mecab dictionary, inserting markers in the reading field so that each furigana is mapped to the character or characters it belongs to. This enables proper mono ruby and group ruby.
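To illustrate the difference between mono ruby and group ruby, here is a small sketch that renders (text, furigana) pairs as HTML `<ruby>` markup. It does not show Warifuri's marker format, which is not described here; the function name `to_ruby_html` and the pair representation are assumptions made for the example.

```python
from html import escape


def to_ruby_html(parts):
    """Render (base, reading) pairs as HTML; a reading of None means plain text."""
    out = []
    for base, reading in parts:
        if reading:
            out.append(f"<ruby>{escape(base)}<rt>{escape(reading)}</rt></ruby>")
        else:
            out.append(escape(base))
    return "".join(out)


# Mono ruby: each kanji carries its own reading.
mono = to_ruby_html([("振", "ふ"), ("り", None), ("仮", "が"), ("名", "な")])

# Group ruby: one reading spans several characters because it cannot be
# split per kanji (今日 read as きょう).
group = to_ruby_html([("今日", "きょう")])

print(mono)
print(group)
```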

About

A wrapper around mecab for the Tatoeba project (https://tatoeba.org/)
