Add localization cli command #10187

Merged
merged 2 commits into from
Feb 11, 2022

Conversation

msujew
Member

@msujew msujew commented Sep 29, 2021

What it does

Closes #9708

Uses the machine translation API of DeepL to automatically translate json files.

How to test

Testing this PR requires a DeepL API key. While 500k characters/month are free for developers, even a free account requires a credit card (DeepL states that this is meant to reduce the number of duplicate accounts abusing the free developer tier). If you want to review this PR but don't have an account yet, or are hesitant to use your own credit card, you can send me an email (linked in my GitHub profile) and I'll send you an API key.

  1. Create a sample file that you want to translate:
{
  "test": "This is a value",
  "nested": {
    "hi": "Hello",
    "deeper": {
      "why": "Why"
    }
  }
}
  2. Run the cli:
yarn theia localize -f test.json -k <api-key> (--free-api) <language codes like de, it, fr>
  3. Assert that the output files are correctly translated (they will be in the same directory as the source file)
  4. Try different arguments, or explore the help output of the localize command
  5. Change the sample file and assert that only new entries, which can't be found in the already translated files, are actually translated (the cli will output the number of translated entries)
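
The incremental behavior described in the last step can be pictured with a small sketch. This is not the code from this PR: `Localization` and `collectMissing` are hypothetical names, and the sketch only assumes that source and translated files share the same nested JSON shape as the sample file above.

```typescript
// Nested localization file: keys map to strings or to further nested objects.
type Localization = { [key: string]: string | Localization };

// Returns the entries of `source` that have no string counterpart in `target`.
// Only these entries would need to be sent to a translation service.
function collectMissing(source: Localization, target: Localization): Localization {
    const missing: Localization = {};
    for (const key of Object.keys(source)) {
        const value = source[key];
        const existing = target[key];
        if (typeof value === 'string') {
            if (typeof existing !== 'string') {
                missing[key] = value;
            }
        } else {
            // Recurse into nested objects; treat a missing or non-object target as empty.
            const nested = collectMissing(value, typeof existing === 'object' ? existing : {});
            if (Object.keys(nested).length > 0) {
                missing[key] = nested;
            }
        }
    }
    return missing;
}

// The sample file from above, with one entry added after the first translation run.
const source: Localization = {
    test: 'This is a value',
    nested: { hi: 'Hello', deeper: { why: 'Why' }, added: 'New entry' }
};
const german: Localization = {
    test: 'Dies ist ein Wert',
    nested: { hi: 'Hallo', deeper: { why: 'Warum' } }
};

const missing = collectMissing(source, german);
console.log(JSON.stringify(missing)); // {"nested":{"added":"New entry"}}
```

With this shape, a second run over an unchanged source file yields an empty object, so nothing would be sent for translation.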

Review checklist

Reminder for reviewers

@msujew msujew added the localization issues related to localization/internalization/nls label Sep 29, 2021
@msujew msujew mentioned this pull request Sep 29, 2021
8 tasks
@msujew msujew force-pushed the msujew/localization-cli branch 2 times, most recently from 5b99efd to 9decf1e Compare October 5, 2021 08:21
@msujew msujew added the theia-cli issues related to the theia-cli label Oct 8, 2021
Contributor

@jbicker jbicker left a comment

Very cool! Works like a charm.

@msujew msujew force-pushed the msujew/localization-cli branch from 9decf1e to b81080a Compare October 21, 2021 10:56
Member

@vince-fugnitto vince-fugnitto left a comment

@brianking @marcdumais-work just as a precaution, I'd like your input about using deepl (a free or paid service that requires an API key and a credit card) to perform translations. Is there anything from the Eclipse Foundation's point of view that discourages doing this?

Member

@vince-fugnitto vince-fugnitto left a comment

@msujew can you clarify why we want to translate json files exclusively?

@@ -31,6 +31,7 @@
"dependencies": {
"@theia/application-manager": "1.18.0",
"@theia/application-package": "1.18.0",
"@theia/localization-manager": "1.18.0",
Member

@msujew the use of deepl in the extension makes me think once again about how we should write @theia/cli so that functionality or scripts can be contributed to it without the need to depend on extensions explicitly. Not only would it help with coupling, but applications would have more control over what functionality they include in their products, especially in a case like deepl, which they might not want.

Member Author

Right, that makes sense. I'll leave this PR out of the merging round I am planning for today for the localization PRs, as it's not absolutely necessary for localization support of Theia.

@msujew
Member Author

msujew commented Oct 21, 2021

can you clarify why we want to translate json files exclusively?

@vince-fugnitto There is no real reason to support json files exclusively. However, as the output format from the extract command (#10247) is aligned with the one used by vscode, which also produces json files, I expected them to be used in conjunction with this localize command. Additionally, as the output of the localize command is to be used with the LocalizationContribution, using json just seems natural.

Nonetheless, I am open to the idea to add additional input and output formats.

@vince-fugnitto
Member

can you clarify why we want to translate json files exclusively?

@vince-fugnitto There is no real reason to support json files exclusively. However, as the output format from the extract command (#10247) is aligned with the one used by vscode, which also produces json files, I expected them to be used in conjunction with this localize command. Additionally, as the output of the localize command is to be used with the LocalizationContribution, using json just seems natural.

Nonetheless, I am open to the idea to add additional input and output formats.

@msujew sorry, I had understood that it only translated .json files from the pull-request description. I think then it raises some additional concerns over the deepl service, we would not want to pass proprietary code (such as a custom @theia extension) to the service.

@msujew
Member Author

msujew commented Oct 21, 2021

we would not want to pass proprietary code (such as a custom @theia extension) to the service.

We don't. The extract command performs all necessary preprocessing and writes a simple json file to the file system, which contains only the translation IDs used and their (English) default values. Later, the localize command (using deepl) takes that json file (which does not contain any code) and translates the default values into the specified language.
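
For readers unfamiliar with DeepL, a request carrying only default values might be assembled roughly as follows. This is a hedged sketch, not the code in this PR: `buildRequest` and `TranslateRequest` are hypothetical names, and the endpoint hosts, the `DeepL-Auth-Key` header and the `text`, `source_lang` and `target_lang` parameters are taken from DeepL's public REST API as I understand it, so they should be checked against the current DeepL documentation. The `freeApi` switch is assumed to correspond to the `--free-api` flag from the description.

```typescript
// Hypothetical description of an HTTP request; nothing is sent in this sketch.
interface TranslateRequest {
    url: string;
    headers: { [name: string]: string };
    body: string;
}

// Assembles a form-encoded translation request for a batch of default values.
function buildRequest(texts: string[], targetLanguage: string, apiKey: string, freeApi: boolean): TranslateRequest {
    // Assumption: free-tier keys use a separate host from paid keys.
    const host = freeApi ? 'api-free.deepl.com' : 'api.deepl.com';
    // Each default value becomes its own `text` parameter; no translation IDs are included.
    const params = texts.map(text => 'text=' + encodeURIComponent(text));
    params.push('source_lang=EN');
    params.push('target_lang=' + encodeURIComponent(targetLanguage.toUpperCase()));
    return {
        url: 'https://' + host + '/v2/translate',
        headers: {
            'Authorization': 'DeepL-Auth-Key ' + apiKey,
            'Content-Type': 'application/x-www-form-urlencoded'
        },
        body: params.join('&')
    };
}

const request = buildRequest(['Hello', 'This is a value'], 'de', '<api-key>', true);
console.log(request.url);
console.log(request.body);
```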

@vince-fugnitto
Member

We don't. The extract command performs all necessary preprocessing and writes a simple json file to the file system, which contains only the translation IDs used and their (English) default values. Later, the localize command (using deepl) takes that json file (which does not contain any code) and translates the default values into the specified language.

@msujew unfortunately even something as simple as ids from proprietary extensions can reference proprietary components and so on, and I can think of many use cases where that would not be acceptable to companies :(

@msujew
Member Author

msujew commented Oct 21, 2021

unfortunately even something as simple as ids from proprietary extensions can reference proprietary components

@vince-fugnitto Actually, the only info that's sent to deepl is the default value. The mapping to the id only happens in the LocalizationManager instance.

Anyway, the main idea behind the deepl integration was to easily translate Theia itself (#9708). While it can also be used by downstream users, they are free to use other services. Furthermore - at least for paying customers - deepl promises high confidentiality for the data sent to them.
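
The separation between ids and default values mentioned above can be illustrated with a small sketch. It is an assumption-laden illustration rather than the actual LocalizationManager code: `translateEntries` and `Translator` are hypothetical names, the ids are made up, and the translator is synchronous here only to keep the example short (a real service call would be asynchronous).

```typescript
// Hypothetical translator: receives plain texts and returns them translated, in the same order.
type Translator = (texts: string[]) => string[];

// Translates the default values of `entries` while keeping the ids local.
function translateEntries(entries: { [id: string]: string }, translate: Translator): { [id: string]: string } {
    const ids = Object.keys(entries);
    // Only the default values are handed to the translator.
    const defaults = ids.map(id => entries[id]);
    const translated = translate(defaults);
    // The ids are re-attached locally by position.
    const result: { [id: string]: string } = {};
    ids.forEach((id, index) => {
        result[id] = translated[index];
    });
    return result;
}

// Stand-in translator that records everything it was given.
const sent: string[] = [];
const fakeTranslator: Translator = texts => {
    texts.forEach(text => sent.push(text));
    return texts.map(text => '[de] ' + text);
};

const translatedEntries = translateEntries(
    { 'my-extension/secret-widget/open': 'Open', 'my-extension/secret-widget/close': 'Close' },
    fakeTranslator
);
console.log(sent);              // the texts the translator saw, without ids
console.log(translatedEntries); // ids mapped to the translated values
```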

@msujew msujew force-pushed the msujew/localization-cli branch from b81080a to 50daeb0 Compare December 9, 2021 13:11
@msujew
Member Author

msujew commented Dec 9, 2021

@vince-fugnitto The DeepL related CQ has been approved. I rebased my changes for further reviewing.

@vince-fugnitto vince-fugnitto dismissed their stale review December 14, 2021 17:57

CQ has been approved.

Member

@vince-fugnitto vince-fugnitto left a comment

@msujew can you describe a bit how the deepl integration is supposed to work for theia translations (perhaps in the readme of the extension)? It's not instantly clear to me what it does or how it works. It would help to have some insight into how translations of theia should work, and whether we should maintain some type of theia nls.metadata.json in the framework for our own translations.

dev-packages/localization-manager/README.md (outdated review thread, resolved)
@msujew msujew force-pushed the msujew/localization-cli branch from 50daeb0 to 8c35d41 Compare January 5, 2022 16:44
@msujew
Member Author

msujew commented Jan 5, 2022

@vince-fugnitto I expanded the readme and added an example. I would refrain from adding more documentation there, and would instead dedicate a larger section to it in the internationalization documentation. Regarding an nls.metadata.json for Theia itself, I would rather document this in the coding guidelines. What do you think about that?

@vince-fugnitto
Member

@vince-fugnitto I expanded the readme and added an example. I would refrain from adding more documentation there, and would instead dedicate a larger section to it in the internationalization documentation. Regarding an nls.metadata.json for Theia itself, I would rather document this in the coding guidelines. What do you think about that?

@msujew I'd be fine with whatever you propose :) the important thing is we have some form of documentation to help developers, and have a clear way forward.

@JonasHelming
Contributor

@vince-fugnitto : Are you fine with merging this?

Uses the machine translation API of DeepL to automatically translate any missing values
@msujew msujew force-pushed the msujew/localization-cli branch from 8c35d41 to 0a4da65 Compare January 26, 2022 15:29
Member

@vince-fugnitto vince-fugnitto left a comment

LGTM! @msujew I don't have any further comments regarding the code 👍

@msujew
Member Author

msujew commented Jan 27, 2022

@vince-fugnitto Thanks, I'll merge it after the release 👍

@JonasHelming
Contributor

@msujew : Release is done, "ping" :-)

@JonasHelming
Contributor

@msujew Can this be merged?

@msujew msujew merged commit 8b50540 into master Feb 11, 2022
@github-actions github-actions bot added this to the 1.23.0 milestone Feb 11, 2022
Labels
localization issues related to localization/internalization/nls theia-cli issues related to the theia-cli
Development

Successfully merging this pull request may close these issues.

Use a translation service for Theia's own translations
4 participants