Skip to content

birchill/normal-jp

Repository files navigation

Normal Japan

Various conversion utilities for Japanese text.

toNormalized

Performs the following conversions:

  • half-width katakana to full-width katakana (e.g. ガーデン → ガーデン)
  • decomposed characters to their composed equivalents (e.g. ダイエット → ダイエット)
  • various enclosed characters into their plain form (e.g. ㋕ → カ)
  • various combined characters into their expanded form (e.g. ㌀ → アパート, ㋿ → 令和)
  • variation selector characters are dropped
  • characters encoded using radical codepoints are converted to equivalent kanji codepoints (e.g. ⽂/0x2F42 → 文/0x6587)

and return the mapping from positions in the output string to the input string (using regular character indexing, not fancy codepoint indexing since the APIs we want to use these results with don't know about about surrogate pairs).

katakanaToHiragana

Converts full-width katakana characters to hiragana. It doesn't handle half-width katakana so you should run the input through toNormalized first if you want to handle that.

Note that the length of the output is equal to the length of the input so this function does not returning the mapping from input string character offsets to output string positions.

kyuujitaiToShinjitai

Converts various 旧字体 (kyuujitai, old character forms) to 新字体 (shinjitai, new character forms).

Based on the data in https://en.wikipedia.org/wiki/Kyūjitai but does not handle kyuujitai represented using variation selectors since these are stripped by toNormalized.

As with katakanaToHiragana the length of the input and output is equal so this function does not return the mapping between character offsets.

expandChoon

Expands ー to the various vowels it may represent.

As with katakanaToHiragana the length of the input and output is equal so this function does not return the mapping between character offsets.

moraCount

Counts the number of mora in a hiragana/katakana string, e.g.

  • moraCount('とうきょう') ⇒ 4
  • moraCount('いっぱい') ⇒ 4

moraSubstring

Like String.prototype.substring but takes mora indices instead, e.g.

  • moraSubstring('しゃけ', 0, 1) ⇒ 'しゃ'
  • moraSubstring('しゃけ', 1) ⇒ 'け'

halfToFullWidthNum

Converts half-width numbers to full-width.

  • halfToFullWidthNum('第405号') ⇒ '第405号'

Publishing

pnpm release-it