
Cannot search japanese content #20

Open
nhienhuynh opened this issue Sep 28, 2016 · 9 comments

Comments

@nhienhuynh

Hi,
It works fine with English or other non-Unicode characters, but when I try to search Japanese content, it returns no results.

Can anyone help me find a solution for this?

@larsvoigt
Owner

Hi @nhienhuynh,
at the moment this software doesn't support the Japanese language. Sorry, I have no spare time to test, fix, or implement it. If you would like to implement this yourself, I can help you find the right spots in the code.
Lars

@nhienhuynh
Author

Hi Lars,
Thanks for the quick reply!
I want to fix it, but I can't find the code. I think the problem happens when the index database (levelup) is created. Can you help me find the code?

Tom

@larsvoigt
Owner

Hi Tom,
can you verify whether it is the indexing process or the search process that fails? Do you have an example book that you can share with me?
Lars

@nhienhuynh
Author

Hi Lars,
The search fails, but I think the cause is a database file encoding problem during indexing.
I have attached the epub file for you. It is Korean, but it has the same issue as Japanese.
In this file, searching for the keyword "King" works, but searching for "그가 장난기" returns an empty result even though this content exists.

Tom

KingSHADOW1.zip

@nhienhuynh
Author

Hi Lars,

Have you had time to take a look at this yet?

regards,

@larsvoigt
Owner

Hi Tom,
I have done a small refactoring of the code base. Then I tested it with your Korean ebook, and it works fine on my side.
I get this:

--------------------------------------------------------------------------
*** epubTitle: King SHADOW 1권 ***
--------------------------------------------------------------------------
*** baseCfi: /6/24[Section0011.xhtml]! ***
*** href: Text/Section0011.xhtml ***
*** cfis: 1 hits
------> /6/24[Section0011.xhtml]!/4/654,/1:6,/1:7
***

when I search for "그가 장난기".
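
For readers interpreting the hit above: the `cfis` entry appears to follow the EPUB CFI range form `parent,start,end`, where the start and end paths are relative to the shared parent path. The sketch below shows how such a string could be split into absolute start and end paths. The helper name `splitCfiRange` is hypothetical and not part of this project, and the code assumes no escaped commas inside bracketed assertions.

```javascript
// Hypothetical helper (not part of this project's API).
// Splits a range CFI of the form "parent,start,end" into absolute paths.
// Simplification: assumes no escaped commas (^,) inside [...] assertions.
function splitCfiRange(cfi) {
  const parts = cfi.split(",");
  if (parts.length !== 3) {
    throw new Error("not a range CFI: " + cfi);
  }
  const [parent, start, end] = parts;
  // Start and end are relative to the parent, so prepend it to both.
  return { parent: parent, start: parent + start, end: parent + end };
}

// The hit reported in the output above.
const hit = splitCfiRange("/6/24[Section0011.xhtml]!/4/654,/1:6,/1:7");
console.log(hit.start);
console.log(hit.end);
```

Under this reading, the match starts at character offset 6 and ends at offset 7 of the first text node under `/4/654` in `Section0011.xhtml`.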

I have also published a new version via npm. Can you please test whether it fixes your problem?
Best
Lars

@nhienhuynh
Author

Hi Lars,

Many thanks for your help! I am checking and will report back to you later.

Best regards,
Tom

@nhienhuynh
Author

Hi Lars,

Here is my report after checking:

  1. Searching with the param "t" (epub title) returns an empty result.
    When the keyword (q) is English, it works, but when the keyword is Korean and the epub title is Korean and includes a space, it returns no result.
    -----------Error case--------------
    curl -XGET "http://127.0.0.1:8085/search?q=미래&t=King%20SHADOW%201권"
    client request
    Keyword: 미�
    bookTitle: King SHADOW 1�
    Result: []
    ----------Good case--------------
    curl -XGET "http://127.0.0.1:8085/search?q=King&t=King%20SHADOW%201권"
    client request
    Keyword: King
    bookTitle: King SHADOW 1�
    Result: [{"filename":"KingSHADOW11475051244","epubTitle":"King SHADOW 1권","href":"Text/Section0012.xhtml","baseCfi":"/6/26[Section0012.xhtml]!","id":"Section0012.xhtml","cfis":["/6/26[Section0012.xhtml]!/4/2,/1:6,/1:10"]}]

  2. In the result, most people want to get the sentence that includes the searched phrase. How can we do that with the current result?
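
The logged `Keyword: 미�` suggests the Korean bytes were damaged somewhere between the client and the server. One possible cause, not confirmed in this thread, is that the curl commands above send the Korean characters raw instead of percent-encoded as UTF-8. A minimal sketch of building the same request URL with every parameter percent-encoded follows. The helper name `buildSearchUrl` is hypothetical, while the host, port, and the `q` and `t` parameters are taken from the commands above.

```javascript
// Hypothetical helper (not part of this project's API).
// Builds the search URL with both parameters percent-encoded as UTF-8.
function buildSearchUrl(base, query, title) {
  // encodeURIComponent encodes non-ASCII characters as UTF-8 %XX sequences
  // and encodes a space as %20.
  const q = encodeURIComponent(query);
  const t = encodeURIComponent(title);
  return base + "/search?q=" + q + "&t=" + t;
}

const url = buildSearchUrl("http://127.0.0.1:8085", "미래", "King SHADOW 1권");
console.log(url);
```

The resulting URL contains only ASCII characters, so it can be passed to curl unchanged regardless of the terminal's code page. If the server still logs a truncated keyword with such a URL, the decoding problem is more likely on the server side.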

Best regards,
Tom

@rudra0713

Hi @larsvoigt,
I am trying to search for some Bangla (Bengali) content in an epub. I think indexing the epub works without problems, but when I query for a Bangla word, the search returns an empty result.
Since a similar problem was solved for Japanese, I thought the same solution would also work for Bangla.
Can you please help me?
Regards,
Rudra
