could not get any bookmark #35

Open
1925381584 opened this issue Sep 20, 2024 · 15 comments

@1925381584

1925381584 commented Sep 20, 2024

Hi, I cannot get any bookmarks in Ebuku, even though buku has imported some. When I run the command "ebuku", the error message is "Invalid string for collation: Invalid argument". Here is my configuration:

OS: Windows 10.0.19045
emacs version : 30.0.50
buku: 4.9
ebuku: 2024.09.05

@flexibeast
Owner

This reminds me of #31, which the user fixed by ensuring that the value of the LC_ALL environment variable was changed from C to the appropriate locale (in that case, zh_CN.UTF-8, as described here).

What's the value of LC_ALL in Emacs' environment on your system? You can find that information via e.g. M-x list-environment.

@1925381584
Author

There is no command called "list-environment" in my Emacs, and I could not find the variable LC_ALL either.

@flexibeast
Owner

My apologies. Can you instead please evaluate:

(seq-filter #'(lambda (v) (numberp (string-match "^LC" v))) process-environment)

and report the output?
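An equivalent check can be sketched with `getenv`, which reads from the same `process-environment`. This is only an illustration: the function name `my/locale-environment` is made up here, and the list of variables is an assumption about which ones are relevant.

```elisp
(defun my/locale-environment ()
  "Return an alist of locale-related environment variables.
Each element is (NAME . VALUE); VALUE is nil when NAME is unset.
Hypothetical helper, not part of Emacs or Ebuku."
  (mapcar (lambda (name) (cons name (getenv name)))
          '("LC_ALL" "LC_CTYPE" "LC_COLLATE" "LANG")))

;; Evaluate with M-: or in the *scratch* buffer:
(my/locale-environment)
```

Unlike the filter above, this also shows `LANG` and makes unset variables visible as `nil`.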

@1925381584
Author

The output is:

image

@flexibeast
Owner

Okay, so LC_ALL is set appropriately.

Could you please do M-x toggle-debug-on-error, and then try to start Ebuku? It should result in a buffer showing what commands/functions got called; could you please share the contents of that buffer?

@1925381584
Author

1925381584 commented Sep 21, 2024

Here it is:

image

@1925381584
Author

This is my configuration:

image

@flexibeast
Owner

Thank you - i'll investigate this and get back to you.

@flexibeast
Owner

In the second line of the backtrace - the one that starts with #<subr string-collate-lessp ... - there are two bookmark tags being compared in order to sort them correctly. However, it appears that the tags have been saved in the buku database with different encodings; presumably the second is UTF-8, as it's rendering correctly, but the first one is showing the raw bytes (in octal), and i'm not sure what encoding it might be.

Can you please copy-and-paste the two tags into two new and separate files, each tag in their own file, and then open up each of those files in Emacs, calling C-h v buffer-file-coding-system in each buffer, and sharing the results?
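As a rough illustration of why one tag can show up as octal escapes while the other renders normally, the following sketch builds the same text twice: once as decoded characters and once as undecoded UTF-8 bytes. The tag text and the `my/` names are hypothetical and not taken from the reporter's database, and whether a raw-byte string like this is what actually triggers the collation error here is an assumption.

```elisp
;; Hypothetical tag text "中文", built from code points so the example
;; does not depend on how this snippet itself is decoded.
(defvar my/tag (string #x4e2d #x6587))

;; The same text as raw UTF-8 bytes, i.e. what a string looks like
;; when process output has not been decoded.  Emacs prints such bytes
;; as octal escapes, e.g. \344\270\255.
(defvar my/raw-tag (encode-coding-string my/tag 'utf-8))

(defun my/tag-decoded-p (tag)
  "Return non-nil if TAG is a multibyte (decoded) string."
  (multibyte-string-p tag))

;; Decoding the raw bytes as UTF-8 recovers the original characters.
(defvar my/recovered-tag (decode-coding-string my/raw-tag 'utf-8))
```

Here `(my/tag-decoded-p my/tag)` is non-nil and `(my/tag-decoded-p my/raw-tag)` is nil, which matches the difference visible in the backtrace: one tag is text, the other is bytes.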

@1925381584
Author

1925381584 commented Sep 21, 2024

Now I have imported only one bookmark, but I still get this error.

image

image

image

@flexibeast
Owner

But you're importing one bookmark into a pre-existing buku database, correct? If so, then there's still the issue of comparing pre-existing tags with the tag(s) of the bookmark being imported. So, please follow the instructions i provided in my previous comment, and share the results.

@1925381584
Author

I’m sorry, I don't know how to copy and paste the two tags into two new, separate files. Before importing the bookmark, I had cleared out my bookmarks. After that, I made the changes shown in the images below, and the bookmark was read successfully. It looks like there is a problem parsing Chinese.

image

image

image

@flexibeast
Owner

flexibeast commented Sep 22, 2024

It's clearly not a problem with handling Chinese per se, for two reasons:

It's okay if you don't understand how to do something i ask of you, but in that case, please ask for further instructions. As the developer of this software, i can't help you if you don't provide me with the information i need.

To copy and paste text:

  • Move point / the cursor to the start of the text you want to copy.
  • Press C-SPC.
  • Move point / the cursor to the end of the text you want to copy.
  • Press M-w.

That will copy the text to the 'kill-ring' / 'clipboard'.

To paste text:

  • Move point / the cursor to where you want to paste the text.
  • Press C-y.

@1925381584
Author

Thank you for your answer. I did as you asked and found something new. First, I created two new files in Notepad++, put the respective text in each, and saved them. Then I opened them in Emacs. Their encodings differ, as shown below.

image
image
image
image

But I'm not sure whether this difference means the data in the buku database is encoded differently, because I inspected the SQLite database with a tool and found that the Chinese text is all displayed properly and is all UTF-8 encoded.

image
image
image
image

So I guess there are two possible reasons. The first is that the encoding in buku differs, but this isn't visible. The second is that there is a problem with parsing Chinese in Ebuku.

@flexibeast
Owner

The issue seems to be that Emacs is sometimes incorrectly guessing the encoding as undecided-dos, as in your first screenshot, rather than UTF-8. Ebuku uses Emacs' built-in call-process to retrieve data from the buku database - refer to this part of the Ebuku code, where it calls buku and inserts the resulting output in a temporary buffer. It's Emacs, not Ebuku, that guesses the encoding of the buffer.
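To make the guessing step concrete, here is a minimal sketch of calling a program via `call-process` while telling Emacs which encoding to use for the output, instead of letting it guess. This is not Ebuku's actual code: `my/call-process-utf-8` is a hypothetical helper, and it assumes the program really does emit UTF-8. It only affects how output is decoded, not how command-line arguments are encoded.

```elisp
(defun my/call-process-utf-8 (program &rest args)
  "Run PROGRAM with ARGS and return its output decoded as UTF-8.
Hypothetical helper for illustration; not part of Ebuku."
  (with-temp-buffer
    ;; Binding `coding-system-for-read' overrides Emacs' automatic
    ;; detection for the duration of this one call.
    (let ((coding-system-for-read 'utf-8))
      (apply #'call-process program nil t nil args))
    (buffer-string)))
```

Whether forcing UTF-8 in this way is appropriate on MS-Windows is a separate question, since the encoding of arguments passed to the subprocess is still governed by the system codepage.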

Please read through this discussion on #32, in which, as i noted above, the user wasn't having problems with Chinese in Ebuku in general, but only when also using certain emoji. Emacs maintainer Eli Zaretskii is part of that discussion, and he noted that using UTF-8 on Windows machines is problematic:

[T]he user sets a UTF-8 locale, which as I wrote up-thread is not a good idea on MS-Windows. It could well cause failures in invoking external programs from Emacs, if the arguments to those programs include non-ASCII characters. In general, on MS-Windows Emacs can only safely invoke programs with non-ASCII characters in the command-line arguments if those characters can be encoded by the system codepage, in this case codepage-936 AFAIU.
...
Emacs on MS-Windows cannot use UTF-8 when encoding command-line arguments for sub-programs, it can only use the system codepage. Using set-language-environment as above will force Emacs to encode command-line arguments in UTF-8, which could very well be the reason for some of these problems.
...
[Setting the language environment to "UTF-8" is] NOT RECOMMENDED!

Unfortunately, that discussion wasn't resolved because the user has never responded to Eli's most recent comment. However, in this case, you've reported that the value of buffer-file-coding-system is undecided-dos when it comes to some of the Chinese text in your buku database, and this was some of the information Eli was seeking from the other user. So i'm going to cc him on this discussion, as he might be able to assist further.
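For readers unfamiliar with why the codepage matters, this small sketch shows that the same character becomes different bytes under UTF-8 and under codepage 936, so a program expecting one and receiving the other sees garbage. The helper name `my/encoded-bytes` is made up for illustration, and the sample character is arbitrary.

```elisp
(defun my/encoded-bytes (string coding-system)
  "Return the bytes of STRING encoded with CODING-SYSTEM, as a list.
Hypothetical helper for illustration."
  (append (encode-coding-string string coding-system) nil))

;; U+4E2D (the character "中") in the two encodings:
(my/encoded-bytes (string #x4e2d) 'utf-8)   ; => (228 184 173)
(my/encoded-bytes (string #x4e2d) 'cp936)   ; => (214 208)
```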

@Eli-Zaretskii
