Developing new phonemizers

Set up an IDE (recommended: Visual Studio or VS Code)
Compile OpenUtau from source
Learn C# from the official Microsoft tutorials
Read the Phonemizer API
For languages with multi-syllable words, read the SyllableBasedPhonemizer API

Heavily commented example implementations, from simplest to most complex:

Phonemizer API

The main method to implement is:

public abstract Phoneme[] Process(Note[] notes, Note? prevNeighbour, Note? nextNeighbour);

notes: A group of notes. The first note contains the lyric. The rest are extender notes whose lyric starts with +.
prevNeighbour and nextNeighbour: The neighbouring notes before and after this group (null if absent). Useful for creating diphones, if applicable, e.g. the proper leading diphone in VCV.
returns: An array of phonemes, positioned relative to the first note.
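To make the parameters concrete, here is a minimal sketch of a VCV-style Process(). SketchNote, SketchPhoneme and SketchVcvPhonemizer are hypothetical stand-ins, not OpenUtau's real Note and Phoneme types, and the alias conventions ("- ka" after a rest, "a ka" after a note ending in "a", "a -" as an ending) plus the 120-tick ending length are assumptions for illustration. It shows prevNeighbour choosing the leading diphone and every position being measured from the first note, extender notes included.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for OpenUtau's Note and Phoneme, reduced to the fields
// this sketch needs. Positions and durations are in ticks.
public record SketchNote(string Lyric, int Position, int Duration);
public record SketchPhoneme(string Symbol, int Position);

public static class SketchVcvPhonemizer {
    // Hypothetical length reserved for the ending alias, in ticks.
    const int EndingTicks = 120;

    // Same shape as Process(): called once per note group.
    public static SketchPhoneme[] Process(SketchNote[] notes, SketchNote? prevNeighbour, SketchNote? nextNeighbour) {
        var first = notes[0];
        var phonemes = new List<SketchPhoneme>();

        // Leading diphone: "- ka" after a rest, otherwise "<previous vowel> ka".
        string lead = prevNeighbour is null ? "-" : LastVowel(prevNeighbour.Lyric);
        phonemes.Add(new SketchPhoneme($"{lead} {first.Lyric}", 0));

        // Positions are relative to the first note, so the group's end is measured
        // from first.Position and includes any extender notes.
        if (nextNeighbour is null) {
            var last = notes[notes.Length - 1];
            int groupEnd = last.Position + last.Duration - first.Position;
            int endingPos = groupEnd - Math.Min(EndingTicks, groupEnd / 2);
            phonemes.Add(new SketchPhoneme($"{LastVowel(first.Lyric)} -", endingPos));
        }
        return phonemes.ToArray();
    }

    // Hypothetical: treats the last character of a romaji lyric as its vowel.
    static string LastVowel(string lyric) =>
        lyric.Length > 0 ? lyric[lyric.Length - 1].ToString() : "-";
}
```

For a "ka" note at tick 480 (480 long) followed by a "+~" extender at tick 960 (240 long) with no neighbours, this yields "- ka" at 0 and "a -" at 600, i.e. 120 ticks before the group ends.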

Tips:

To load singer-specific resources, implement the loading in SetSinger() and use singer.Location to look for files.
If the phonemizer uses an expensive resource, load it lazily the first time the phonemizer is created. Use your best judgement to decide its lifetime.
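The two tips can be combined roughly as follows. This is a sketch under assumptions: SketchSinger is a hypothetical stand-in exposing only Location, and the replace.txt file and its "key = value" format are made up for illustration. A static Lazy<T> builds the shared, singer-independent resource once, on first use, while SetSinger() reloads the per-singer data each time the singer changes.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical stand-in for the singer object; only Location is modelled.
public record SketchSinger(string Location);

public class LazyResourceSketch {
    // Counts how often the expensive resource was built (for illustration only).
    public static int LoadCount { get; private set; }

    // Expensive, singer-independent resource: built once on first access and
    // shared by every instance. Lazy<T> is thread-safe by default.
    static readonly Lazy<HashSet<string>> vowels = new(() => {
        LoadCount++;
        return new HashSet<string> { "a", "i", "u", "e", "o" };
    });

    // Singer-specific resource, rebuilt whenever the singer changes.
    public Dictionary<string, string> Replacements { get; private set; } = new();

    public void SetSinger(SketchSinger singer) {
        Replacements = new Dictionary<string, string>();
        // Hypothetical file name and format, looked up under the singer's folder.
        string path = Path.Combine(singer.Location, "replace.txt");
        if (!File.Exists(path)) return;
        foreach (var line in File.ReadAllLines(path, Encoding.UTF8)) {
            var parts = line.Split('=', 2);
            if (parts.Length == 2) Replacements[parts[0].Trim()] = parts[1].Trim();
        }
    }

    public bool IsVowel(string symbol) => vowels.Value.Contains(symbol);
}
```

Keeping the Lazy<T> static ties the resource's lifetime to the process; a per-instance field would instead free it along with the phonemizer, which may suit very large resources better.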

The API is implemented in OpenUtau.Core/Api/Phonemizer.cs

Phonemizer Development Guidelines

Naming rule

Usually a phonemizer's name is <language> <type> for classic phonemizers, and <renderer> <language> for machine-learning phonemizers.

Renderer is "vogen" or "nnsvs".
Language is the spoken language that the phonemizer sings in, such as "English", "Japanese".
Type is the voicebank type supported by the phonemizer, such as "CVVC", "VCV".

A phonemizer's tag is the abbreviation of the phonemizer's name. For example, the tag of "English Arpasing Phonemizer" is "EN ARPA".

The language should be abbreviated in programmer's style, such as EN and JA (as in EN-US and JA-JP). See Microsoft's official documentation for the language code of each language.

Required Features

A complete Phonemizer should:

Produce phonemes from the lyric and, if they exist, the previous and next notes.
Distribute phonemes to positions relative to the first note of each group of notes.
(For Classic phonemizers) support multi-pitch and multi-color voicebanks.
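Multi-pitch and multi-color support usually comes down to picking the alias variant that matches the note's tone and voice color. The sketch below assumes a hypothetical Subbank record (color, MIDI tone range, alias suffix, in the spirit of prefix.map subbanks) and a plain set of oto aliases; in OpenUtau itself, phonemizers typically delegate this lookup to the singer (for example via its mapped-oto lookup) rather than reimplementing it.

```csharp
using System.Collections.Generic;

// Hypothetical subbank description: a voice color plus a MIDI tone range
// that maps to an alias suffix.
public record Subbank(string Color, int ToneMin, int ToneMax, string Suffix);

public static class SubbankLookup {
    // Returns the alias with the first matching subbank's suffix, provided that
    // decorated alias exists in the voicebank; otherwise returns the bare alias.
    public static string Map(string alias, int tone, string color,
                             IReadOnlyList<Subbank> subbanks, ISet<string> otoAliases) {
        foreach (var sb in subbanks) {
            bool matches = sb.Color == color && tone >= sb.ToneMin && tone <= sb.ToneMax;
            if (matches && otoAliases.Contains(alias + sb.Suffix)) {
                return alias + sb.Suffix;
            }
        }
        return alias;
    }
}
```

Checking the decorated alias against the oto set before returning it lets a partially recorded subbank fall back gracefully instead of producing a missing sample.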

Optional Features

Depending on the characteristics of the language, a phonemizer doesn't necessarily have to implement all of the following features. However, implementing them keeps the user experience consistent across phonemizers.

These features can be quickly implemented by inheriting a phonemizer template, such as SyllableBasedPhonemizer.

Polysyllabic word support

For polysyllabic languages such as English, users should be able to enter the whole word as the lyric of the first syllable's note, then use +~ or +* on the following notes to extend the current syllable, and + to move on to the next syllable.
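The extender rules can be sketched as a small function that decides which syllable each note in a group sings. SyllableDistribution.Assign is hypothetical, as is the choice that surplus + notes keep singing the last syllable; SyllableBasedPhonemizer handles this for you in practice.

```csharp
using System;
using System.Collections.Generic;

public static class SyllableDistribution {
    // For a note group (the word's note first, then extender notes), returns the
    // index of the syllable each note sings. "+~" and "+*" extend the current
    // syllable; any other extender lyric, such as "+", moves on to the next one.
    public static int[] Assign(IReadOnlyList<string> lyrics, int syllableCount) {
        if (syllableCount < 1) throw new ArgumentOutOfRangeException(nameof(syllableCount));
        var result = new int[lyrics.Count];
        int current = 0;
        for (int i = 1; i < lyrics.Count; i++) {
            bool extends = lyrics[i].StartsWith("+~", StringComparison.Ordinal)
                        || lyrics[i].StartsWith("+*", StringComparison.Ordinal);
            if (!extends) {
                // Hypothetical choice: surplus "+" notes keep singing the last syllable.
                current = Math.Min(current + 1, syllableCount - 1);
            }
            result[i] = current;
        }
        return result;
    }
}
```

For example, a two-syllable "hello" entered as hello, +~, + gives [0, 0, 1]: the first syllable is held across two notes and the second syllable starts on the third.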

Phonetic hint

Users can manually enter a space-separated phoneme sequence (a.k.a. phonetic hint) in square brackets, so a lyric can be written as read, read[r iy d] or [r iy d]. When both a phonetic hint and a word are given, the phonetic hint takes precedence.
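A minimal way to honour the precedence rule is to check for a bracketed hint before consulting any dictionary or G2p. PhoneticHint.Resolve and its lookupWord callback are hypothetical names for this sketch; the regex accepts an optional word followed by one bracketed, space-separated phoneme list.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneticHint {
    // Matches "word[ph on emes]" or "[ph on emes]"; the word part may be empty.
    static readonly Regex HintPattern = new(@"^(?<word>[^\[\]]*)\[(?<hint>[^\[\]]*)\]$");

    // Returns the bracketed hint if present (it takes precedence over the word),
    // otherwise whatever the word lookup (dictionary or G2p) returns.
    public static string[] Resolve(string lyric, Func<string, string[]> lookupWord) {
        var m = HintPattern.Match(lyric.Trim());
        if (m.Success) {
            return m.Groups["hint"].Value.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        }
        return lookupWord(lyric.Trim());
    }
}
```

With a lookup that maps read to [r eh d], the lyric read resolves to r eh d, while read[r iy d] and [r iy d] both resolve to r iy d.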

G2p

G2p (grapheme-to-phoneme) converts lyrics written in natural language into phoneme sequences. OpenUtau has built-in G2ps for several languages. They are implemented with machine-learning models, cover most of the words in their language, and can predict the pronunciation of words they have never seen. Using a shared G2p keeps the pronunciation of the same lyrics consistent across phonemizers.

For languages with large vocabularies whose words cannot be converted into phoneme sequences by simple rules, such as English, French, and Russian, please use OpenUtau's built-in G2p.

The following G2ps are included in OpenUtau:

English: ArpabetG2p
French: FrenchG2p
German: GermanG2p
Italian: ItalianG2p
Portuguese: PortugueseG2p
Russian: RussianG2p
Spanish: SpanishG2p

Custom pronunciation dictionary

On top of G2p, since some voicebanks use custom phonemes, a phonemizer should also support custom dictionaries. Custom dictionaries can be loaded using G2pDictionary.

No matter which encoding the voicebank uses, yaml dictionaries should always use UTF-8 encoding.

Here is an example of custom dictionary:

%YAML 1.2
---
symbols:
  - {symbol: aa, type: vowel}
  - {symbol: ae, type: vowel}
  - {symbol: ah, type: vowel}
  - {symbol: ao, type: vowel}
  - {symbol: aw, type: vowel}
  - {symbol: ay, type: vowel}
  - {symbol: b, type: stop}
  - {symbol: ch, type: affricate}
  - {symbol: d, type: stop}
  - {symbol: dh, type: fricative}
  - {symbol: eh, type: vowel}
  - {symbol: er, type: vowel}
  - {symbol: ey, type: vowel}
  - {symbol: f, type: fricative}
  - {symbol: g, type: stop}
  - {symbol: hh, type: aspirate}
  - {symbol: ih, type: vowel}
  - {symbol: iy, type: vowel}
  - {symbol: jh, type: affricate}
  - {symbol: k, type: stop}
  - {symbol: l, type: liquid}
  - {symbol: m, type: nasal}
  - {symbol: n, type: nasal}
  - {symbol: ng, type: nasal}
  - {symbol: ow, type: vowel}
  - {symbol: oy, type: vowel}
  - {symbol: p, type: stop}
  - {symbol: r, type: liquid}
  - {symbol: s, type: fricative}
  - {symbol: sh, type: fricative}
  - {symbol: t, type: stop}
  - {symbol: th, type: fricative}
  - {symbol: uh, type: vowel}
  - {symbol: uw, type: vowel}
  - {symbol: v, type: fricative}
  - {symbol: w, type: semivowel}
  - {symbol: y, type: semivowel}
  - {symbol: z, type: fricative}
  - {symbol: zh, type: fricative}
entries:
  - grapheme: openutau
    phonemes: [ow, p, eh, n, w, uw, t, ah, w, uw]

OpenUtau always reads and writes yaml files using YAML 1.2 syntax. If you are developing third-party tools that parse or generate these files, use ruamel.yaml rather than PyYAML, which only supports YAML 1.1.
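The intended layering, where a voicebank's custom dictionary overrides the built-in G2p, can be sketched as a simple fallback chain. Everything here is hypothetical and self-contained (ISketchG2p, DictionaryG2p, FallbackG2p, DictionaryFiles); in OpenUtau you would load the yaml with G2pDictionary and use the real built-in G2ps. The sketch assumes the custom dictionary is consulted first and returns null for unknown words, and it reads dictionary text as UTF-8 regardless of the voicebank's own encoding.

```csharp
#nullable enable
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical minimal G2p interface: returns null for unknown words.
public interface ISketchG2p {
    string[]? Query(string grapheme);
}

// Word-to-phonemes entries, e.g. parsed from a voicebank's yaml dictionary.
public class DictionaryG2p : ISketchG2p {
    readonly Dictionary<string, string[]> entries;
    public DictionaryG2p(Dictionary<string, string[]> entries) => this.entries = entries;
    public string[]? Query(string grapheme) =>
        entries.TryGetValue(grapheme.ToLowerInvariant(), out var phonemes) ? phonemes : null;
}

// Tries each G2p in order, so an earlier (custom) source overrides later ones.
public class FallbackG2p : ISketchG2p {
    readonly ISketchG2p[] chain;
    public FallbackG2p(params ISketchG2p[] chain) => this.chain = chain;
    public string[]? Query(string grapheme) {
        foreach (var g2p in chain) {
            var result = g2p.Query(grapheme);
            if (result != null) return result;
        }
        return null;
    }
}

public static class DictionaryFiles {
    // Dictionary files are always read as UTF-8, whatever encoding the
    // voicebank's other files use.
    public static string ReadDictionaryText(string path) => File.ReadAllText(path, Encoding.UTF8);
}
```

Usage would look like new FallbackG2p(customDictionary, builtInG2p): words listed in the voicebank's dictionary, such as openutau in the example above, use the custom phonemes, and every other word falls through to the built-in model.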

Appendix: Phoneme set of builtin G2Ps

ArpabetG2p (English)

vowels: aa, ae, ah, ao, aw, ay, eh, er, ey, ih, iy, ow, oy, uh, uw
consonants: b, ch, d, dh, f, g, hh, jh, k, l, m, n, ng, p, r, s, sh, t, th, v, w, y, z, zh
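The vowel list above is enough to split a G2p result into syllables, which is what polysyllabic support needs before distributing syllables over notes. ArpabetSyllables.Split is a hypothetical, deliberately naive helper: each vowel closes a syllable together with the consonants before it, and trailing consonants join the last syllable. It is not the algorithm SyllableBasedPhonemizer uses.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ArpabetSyllables {
    // Vowel set of ArpabetG2p, copied from the list above.
    static readonly HashSet<string> Vowels = new() {
        "aa", "ae", "ah", "ao", "aw", "ay", "eh", "er", "ey", "ih", "iy", "ow", "oy", "uh", "uw",
    };

    // Naive split: every vowel closes a syllable together with the consonants
    // before it; consonants after the last vowel attach to the final syllable.
    public static List<string[]> Split(IEnumerable<string> phonemes) {
        var syllables = new List<string[]>();
        var pending = new List<string>();
        foreach (var p in phonemes) {
            pending.Add(p);
            if (Vowels.Contains(p)) {
                syllables.Add(pending.ToArray());
                pending.Clear();
            }
        }
        if (pending.Count > 0) {
            if (syllables.Count == 0) {
                syllables.Add(pending.ToArray());
            } else {
                int last = syllables.Count - 1;
                syllables[last] = syllables[last].Concat(pending).ToArray();
            }
        }
        return syllables;
    }
}
```

For example, hh ah l ow splits into hh ah and l ow, while r iy d stays a single syllable.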

GermanG2p

vowels: aa, ae, ah, ao, aw, ax, ay, ee, eh, er, ex, ih, iy, oe, ohh, ooh, oy, ue, uh, uw, yy
consonants: b, cc, ch, d, dh, f, g, hh, jh, k, l, m, n, ng, p, pf, q, r, rr, s, sh, t, th, ts, v, w, x, y, z, zh

ItalianG2p

vowels: a, a1, e, e1, EE, i, i1, o, o1, OO, u, u1
consonants: b, d, dz, dZZ, f, g, JJ, k, l, LL, m, n, nf, ng, p, r, rr, s, SS, t, ts, tSS, v, w, y, z

PortugueseG2p

vowels: a, a~, e, e~, E, i, i~, o, o~, O, u, u~
consonants: b, d, dZ, f, g, j, j~, J, k, l, L, m, n, p, r, R, s, S, t, tS, v, w, w~, X, z, Z

RussianG2p

vowels: a, aa, ay, ee, i, ii, ja, je, jo, ju, oo, u, uj, uu, y, yy
consonants: b, bb, c, ch, d, dd, f, ff, g, gg, h, hh, j, k, kk, l, ll, m, mm, n, nn, p, pp, r, rr, s, sch, sh, ss, t, tt, v, vv, z, zh, zz

SpanishG2p

vowels: a, e, i, o, u
consonants: b, B, ch, d, D, f, g, G, gn, I, k, l, ll, m, n, p, r, rr, s, t, U, w, x, y, Y, z

OpenUtau: a free, open-source editor made for the UTAU community.