Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

follow up librime#806, avoid path-string conversion #310

Merged
merged 3 commits into from
Feb 6, 2024

Conversation

lotem
Copy link
Contributor

@lotem lotem commented Feb 6, 2024

Avoid using RimeGet*Dir() from rime_api. They return native path on Windows and coding conversion is incurred.

When passing path from lua to librime class, use either rime::path object or UTF-8 encoded string.

src/types_ext.cc Outdated
@@ -263,6 +263,10 @@ namespace UserDbReg{
return {};
}

string file_name(const T& t) {
return t.file_path().u8string();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it expected that file_name(db) should return UTF-8 encoded path (instead of CP_ACP on Windows)?

UTF-8 string is appropriate if the returned string path is used to extract schema ID, or passed to librime code as path.

The returned string path might need to be converted to ANSI encoding before given to lua file-system functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, lua uses ANSI encoding path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Let's make it ANSI encoded.

Note this is the original build error because the C++ class no longer has the getter file_name.

A reminder from the summary of the refactor in librime:

file_name's output, ANSI encoded path, is suitable for accessing the file system using Lua binding to system functions.
If any wants to get schema ID, dictionary ID from the file name, it has to be first converted to UTF-8 string, in order to support path with Chinese characters on Windows.
When constructing a path from schema ID or UTF-8 encoded string read from config, conversion is also needed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ANSI encoded path is subtle, but this is lua's limitation. We need a custom source code of lua if we want UTF-8 encoded path.

@lotem
Copy link
Contributor Author

lotem commented Feb 6, 2024

還有一小部份 ,可否一起

These changes are in my commit. Anything else?

src/types.cc Outdated
@@ -308,7 +308,8 @@ namespace ReverseDbReg {
using T = ReverseDb;

an<T> make(const string &file) {
an<T> db = New<ReverseDb>(string(RimeGetUserDataDir()) + "/" + file);
an<T> db = New<ReverseDb>(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the only necessary change to pass the compilation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the only one.
I got more errors after resolving the first. I'm commenting inline for every change.

src/modules.cc Outdated
@@ -17,8 +17,11 @@ static bool file_exists(const char *fname) noexcept {
}

static void lua_init(lua_State *L) {
const auto user_dir = std::string(RimeGetUserDataDir());
const auto shared_dir = std::string(RimeGetSharedDataDir());
// path::string() returns native encoding on Windows
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RimeGetUserDataDir, RimeGetSharedDataDir are deprecated.
They still work, but less efficient. Because the internal data has changed from string to path, they will do extra conversion and allocating string in static variable.

src/opencc.cc Outdated
opencc::Config config;
converter_ = config.NewFromFile(config_path);
// OpenCC accepts UTF-8 encoded path.
converter_ = config.NewFromFile(config_path.u8string());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change fixes the wrong path encoding given to opencc on Windows.
Opencc does it UTF-8 to ANSI string conversion internally.

}

string get_shared_data_dir() {
RimeApi* rime = rime_get_api();
return string(rime->get_shared_data_dir());
Copy link
Contributor Author

@lotem lotem Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, RimeApi::get_shared_data_dir is deprecated because its inefficiency. Internally it does the same as what the line has changed into, plus assigning the string to a static local variable, then return its c_str(). The C-string is constructed back into another string object in the original code. This string allocation is unnecessary but was needed simply because the original librime API was written in C.

Avoid using `RimeGet*Dir()` from `rime_api`. They return native path on
Windows and coding conversion is incurred.

When passing path from lua to librime class, use either rime::path
object or UTF-8 encoded string.
@lotem
Copy link
Contributor Author

lotem commented Feb 6, 2024

Changed file_name(db) to return ANSI string.
PTAL

@hchunhui
Copy link
Owner

hchunhui commented Feb 6, 2024

LGTM. Added some code to ensure it can build with librime <= 1.9.0

@hchunhui hchunhui merged commit cf2c572 into hchunhui:master Feb 6, 2024
10 checks passed
lotem added a commit to rime/librime that referenced this pull request Feb 8, 2024
graphemecluster pushed a commit to TypeDuck-HK/librime that referenced this pull request Mar 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants