This transcoder converts text between Beta Code and UTF-8 in both directions. Its primary purpose is to support work with Ancient Greek datasets that were encoded in Beta Code.
This project owes a great deal to github.com/matgrioni/betacode, which is an analogous transcoder written in Python. Portions of this transcoder are translations of that project into Go, and the goals of our projects are very nearly the same. Thanks, Matias!
go get github.com/jllovet/betacode-utf8-transcoder
The most frequent uses you'll have for this package are converting text between Beta Code and UTF-8 using BetaToUni and UniToBeta.
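// BetaToUni(beta string) (uni string, err error)
package main

import (
	"fmt"
	"log"

	// Explicit alias, since the identifier used below differs from the last path element.
	transcoder "github.com/jllovet/betacode-utf8-transcoder"
)

func main() {
	b := `a)/lfa`
	// Convert Beta Code to its Unicode (UTF-8) representation.
	u, err := transcoder.BetaToUni(b)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(b, "becomes", u)
}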
> go run main.go
> a)/lfa becomes ἄλφα
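// UniToBeta(uni string) (beta string, err error)
package main

import (
	"fmt"
	"log"

	// Explicit alias, since the identifier used below differs from the last path element.
	transcoder "github.com/jllovet/betacode-utf8-transcoder"
)

func main() {
	u := `ἄλφα`
	// Convert Unicode (UTF-8) Greek text back to Beta Code.
	b, err := transcoder.UniToBeta(u)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(u, "becomes", b)
}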
> go run main.go
> ἄλφα becomes a)/lfa
If this project helped you with a project of yours, I'd love it if you sent a coffee my way to fuel enhancements and similar projects in the future.
- Joel on Software - The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
- What is Transcoding?
- Wikipedia - Beta Code
- TLGU - BCM.pdf
- TLGU - quickbeta.pdf
- YouTube - Unicode Normalization for NLP in Python - James Briggs
- Medium - What on Earth is Unicode Normalization? - James Briggs
- YouTube - Practical Serialization In Go: Unicode Normalization - Ardan Labs
- The Go Blog - Strings, bytes, runes and characters in Go
  - Go source code is always UTF-8.
  - A string holds arbitrary bytes.
  - A string literal, absent byte-level escapes, always holds valid UTF-8 sequences.
  - Those sequences represent Unicode code points, called runes.
  - No guarantee is made in Go that characters in strings are normalized.
- YouTube - Data Structures in Golang - The trie data structure - Junmin Lee
- YouTube - HackerRank - Data Structures: Tries
- YouTube - Jacob Sorber - The Trie Data Structure (Prefix Tree)
- YouTube - Implement Trie | Leetcode #208 - Tech Dose
- YouTube - Tech Dose - Trie Playlist
- Wikipedia - Trie
- Medium - Vaidehi Joshi - Trying to Understand Tries
- Stack Overflow - Ukkonen's suffix tree algorithm in plain English
- Suffix Trees and Their Applications - Bálint Márk Vásárhelyi
- Wikipedia - Longest Common Prefix Array
- Suffix Arrays - A Programming Contest Approach - Adrian Vladu and Cosmin Negruşeri
- Simple Linear Work Suffix Array Construction - Juha Kärkkäinen and Peter Sanders