Japanese transliteration / romanization #46

Open
cryptoquick opened this issue Jan 14, 2015 · 7 comments

Comments

@cryptoquick

Japanese can be transliterated or romanized pretty effectively with software. That would be a killer feature for my future needs, as my company has several sites that will include URLs for stores with Japanese titles.

@pid
Owner

pid commented Jan 20, 2015

I took a look at the issue -> "This chart shows in full the three main systems for the romanization of Japanese: Hepburn, Nihon-shiki and Kunrei-shiki:"
Which way should we go here? Support all three types of transliteration, or focus on one?

Is this a good choice? -> http://www.translitteration.com/transliteration/en/japanese/iso-3602-kunrei-shiki/

@cryptoquick
Author

I've been experimenting with this, and Hepburn is definitely the way to go, but unfortunately there are two cases we'd need to detect here.

The hepburn module can detect whether a string needs to be altered with containsKana. Then, using fromKana, we get the desired romaji.

However, another problem is that not all Japanese is written in kana; there's also kanji. The best solution I can come up with for that is ENAMDICT, a dictionary of Japanese proper names.

With both, we should have good coverage. It might also be prudent to get the hiragana reading from ENAMDICT first, then pass that to Hepburn.

We should also be careful to pass the result of these techniques through SpeakingURL, in case it contains any other characters with marks, diacritics, accents, or long-vowel markers that should be converted to ASCII.

@cryptoquick
Author

Also, here's the code I wrote for my own slugifier based on SpeakingURL. It may be of some help.

And here is the test. Note that this only works for kana, not for kanji. You'll want to get it working for both, and that's the tricky part.

slugify-spec.coffee

slugify = require './slugify'

describe "slugify", ->
  it "should romanize japanese characters", ->
    url_with_japanese = "/store/デパート/12345"
    romanized_url = "/store/depato/12345"
    expect(slugify(url_with_japanese)).toEqual(romanized_url)

slugify.coffee

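hepburn = require 'hepburn'
getSlug = require 'speakingurl'

module.exports = (str) ->
  if typeof str isnt 'string'
    throw new Error "Slugify Error: Wrong type passed for value: #{str}"
  else
    # Romanize any kana (hiragana/katakana); kanji is NOT handled here
    if hepburn.containsKana str
      str = hepburn.fromKana str

    # Let SpeakingURL convert any remaining accented characters to ASCII.
    # uric keeps reserved URI characters such as "/"; custom maps '&' to 'and'
    str = getSlug str,
      uric: yes
      custom:
        '&': 'and'

    str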

@martindale

+1 on this issue. I'll need this for soundtrack.io.

@pid
Owner

pid commented Feb 28, 2015

Due to lack of time, this issue is still unfinished.
I still hope that someone will send a pull request :-) Come on ;-)

@h2non

h2non commented Apr 3, 2015

+1

@iliakan

iliakan commented Oct 19, 2018

+5


5 participants