
Consider using isolates instead of embeddings to handle mixed-direction strings #1355

Closed
jbphet opened this issue Nov 15, 2022 · 9 comments

Comments

@jbphet
Contributor

jbphet commented Nov 15, 2022

While working on an issue related to the testing of dynamic strings (#1319), I came across the following information on a page that describes the RLE Unicode character:

The "RIGHT-TO-LEFT EMBEDDING" directional formatting character is the classical Unicode method of explicit bidirectional formatting, and as of Unicode 6.3, is being discouraged in favor of RIGHT-TO-LEFT ISOLATE. An "embedding" signals that a piece of text is to be treated as directionally distinct. The text within the scope of the embedding formatting characters is not independent of the surrounding text. Also, characters within an embedding can affect the ordering of characters outside. Unicode 6.3 recognized that directional embeddings usually have too strong an effect on their surroundings and are thus unnecessarily difficult to use.
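To make the distinction in the quoted passage concrete, here is a minimal sketch of the two ways to mark a run as right-to-left. The code points come from the Unicode Bidirectional Algorithm (UAX #9): an embedding opens with RLE (U+202B) and closes with PDF (U+202C), while an isolate opens with RLI (U+2067) and closes with PDI (U+2069). The helpers `embedRTL`, `isolateRTL`, and `codePoints` are hypothetical names for illustration, not part of any PhET library.

```typescript
// Explicit directional formatting characters defined by UAX #9.
const RLE = "\u202B"; // RIGHT-TO-LEFT EMBEDDING
const PDF = "\u202C"; // POP DIRECTIONAL FORMATTING (closes an embedding)
const RLI = "\u2067"; // RIGHT-TO-LEFT ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE (closes an isolate)

// Hypothetical helper, classic approach: the embedded run is not independent
// of its surroundings, so it can still influence how neighboring text is ordered.
function embedRTL(text: string): string {
  return RLE + text + PDF;
}

// Hypothetical helper, Unicode 6.3+ approach: to the surrounding paragraph,
// the whole isolated run behaves like a single neutral character.
function isolateRTL(text: string): string {
  return RLI + text + PDI;
}

// Render each code point as U+XXXX so the invisible controls can be inspected.
function codePoints(s: string): string {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
}

console.log(codePoints(embedRTL("ab")));   // U+202B U+0061 U+0062 U+202C
console.log(codePoints(isolateRTL("ab"))); // U+2067 U+0061 U+0062 U+2069
```

At the string level the two approaches differ only in which control pair surrounds the text; the difference the quoted passage describes shows up in how the bidi algorithm treats the run relative to its neighbors.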

I know we've run into some problems with embeddings, and I'm wondering whether using isolates instead could make things simpler. I'll mark this for the developer meeting, and perhaps a subgroup can investigate.

@samreid
Member

samreid commented Nov 17, 2022

Nov 17 2022 Dev meeting:

@kathy-phet, can you please add some specific examples that depict the problems? Also, please give a rough estimate of how difficult or complex it would be to switch over.

@jonathanolson and @jbphet agreed to take a look. Thanks!

@jbphet
Contributor Author

jbphet commented Nov 17, 2022

Here are two examples from the currently live version of Build a Molecule (v1.0.6).

Persian (locale = fa):

[screenshot]

Moroccan Arabic (locale = ar_MA):

[screenshot]

In both of these examples, the chemical formulas for the compounds are rendered in reverse, e.g. O2H instead of H2O. To be clear, I'm not saying that using isolates instead of embeddings will solve these problems; I basically think we shouldn't be adding either to patterns. However, this issue needs attention to make things work better for RTL languages, and the Unicode spec recommends isolates, so perhaps now would be a good time to try them out and see whether they are easier to work with, as the spec suggests.
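One way to act on the point that the trouble lives in patterns is to isolate each value substituted into a translated pattern, so an LTR value such as a chemical formula can neither be reordered by, nor reorder, the RTL text around it. The sketch below assumes `{{name}}`-style placeholders and uses FIRST STRONG ISOLATE (U+2068), which lets each value take its direction from its own first strong character. `fillIn` here is a hypothetical helper, not PhET's implementation, and the ASCII pattern stands in for an RTL translation.

```typescript
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Hypothetical helper: replaces {{name}} placeholders with values. When
// `isolate` is true, each substituted value is wrapped in FSI...PDI so it
// takes its direction from its own first strong character and is shielded
// from the translated text around it.
function fillIn(pattern: string, values: Record<string, string>, isolate: boolean): string {
  return pattern.replace(/\{\{(\w+)\}\}/g, (match: string, key: string): string => {
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      return match; // leave unknown placeholders untouched
    }
    const value = values[key];
    return isolate ? FSI + value + PDI : value;
  });
}

// Stand-in for a translated RTL pattern that contains an LTR formula.
const translatedPattern = "Water: {{formula}}";
const isolated = fillIn(translatedPattern, { formula: "H2O" }, true);
console.log(isolated === "Water: \u2068H2O\u2069"); // true
```

The design choice is to isolate only the substituted values rather than wrapping the whole pattern, so the translator's text keeps whatever direction the paragraph gives it.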

@jbphet
Contributor Author

jbphet commented Nov 17, 2022

Here is a related issue about this problem in BAM specifically. I'm putting it here to make sure these two are linked: phetsims/build-a-molecule#220.

@jbphet
Contributor Author

jbphet commented Dec 6, 2022

I asked @kathy-phet about whether I should take the time to fix this fully now, or put some short-term fixes into the upcoming Greenhouse Effect prototype publication. Here is the dialog:

@jbphet
There is an issue with the appearance of chemical compounds in translated patterns for RTL languages. I'll put a screenshot below. @arouinfar and I talked about possibly doing a short-term fix for when the next Greenhouse Effect prototype is published, but the more I think about spending time on that, the less valuable it seems. I think the time would be better spent working on the general solution, since the problem occurs in other sims (e.g. Build a Molecule). Are you cool with me spending ~1.5 hrs investigating a general fix, and then providing an estimate of how long it would take to fully address the problem?

@kathy-phet
Is this not just a matter of how the translator enters the translation?

@jbphet
Yes and no. Currently this happens if they enter any translation at all for a pattern, even one that just repeats the English string. That seems wrong, and is part of the motivation behind the bug that I've logged. So yes, there is something they can do to avoid the issue, but it seems like very bad behavior by our translation system to me.

@kathy-phet
Ok. Thanks. Yes, please spend the time to fix the issue.

So it looks like I'm cleared to spend some initial investigation time on this.

@jbphet jbphet self-assigned this Dec 6, 2022
@jonathanolson
Contributor

https://stackoverflow.com/questions/71817022/what-is-the-difference-between-embed-and-isolate-values-in-unicode-bidi-cs is an interesting reference on the difference. The two look similar enough. Let me know if you need any assistance!

@chrisklus
Contributor

From 12/8/22 dev meeting:

KP: @jbphet and @jonathanolson please collaborate when JB is ready to discuss.

jonathanolson added a commit to phetsims/build-a-molecule that referenced this issue Dec 8, 2022
jonathanolson added a commit to phetsims/phetcommon that referenced this issue Dec 8, 2022
@jonathanolson
Contributor

@jbphet, the chemical formula wasn't including LTR marks (even though it contained LTR text), mainly because formulas weren't composed from translated strings. Wrapping the formula with these marks should resolve the issue for the sim. The commits are above; can you review them?
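The fix described above wraps the formula itself instead of relying on the translator's pattern. The comment doesn't say exactly which characters the commits add, so this minimal sketch assumes an LRE (U+202A) ... PDF (U+202C) pair, i.e. an LTR embedding, and shows the isolate pair LRI (U+2066) ... PDI (U+2069) that this issue proposes as an alternative. `formula`, `wrapLTR`, and `wrapLTRIsolate` are hypothetical names for illustration, not the phetcommon code.

```typescript
const LRE = "\u202A"; // LEFT-TO-RIGHT EMBEDDING
const LRI = "\u2066"; // LEFT-TO-RIGHT ISOLATE
const PDF = "\u202C"; // POP DIRECTIONAL FORMATTING (closes LRE)
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE (closes LRI)

// Hypothetical formula builder: element symbols with counts; a count of 1 is omitted.
function formula(parts: Array<[string, number]>): string {
  return parts.map(([symbol, count]) => (count > 1 ? symbol + count : symbol)).join("");
}

// Embedding-style wrapper (assumed): marks the formula as a left-to-right run.
function wrapLTR(text: string): string {
  return LRE + text + PDF;
}

// Isolate-style wrapper, the alternative this issue proposes: the formula is
// laid out left to right and is also kept independent of the surrounding text.
function wrapLTRIsolate(text: string): string {
  return LRI + text + PDI;
}

const water = formula([["H", 2], ["O", 1]]);
console.log(water);                                       // H2O
console.log(wrapLTR(water) === "\u202AH2O\u202C");        // true
console.log(wrapLTRIsolate(water) === "\u2066H2O\u2069"); // true
```

Wrapping at the point where the formula is built means every pattern that later embeds it gets the marks, whether or not a translator touched that pattern.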

@jonathanolson jonathanolson self-assigned this Feb 22, 2023
jonathanolson added 27 commits to phetsims/phetcommon that referenced this issue May 15, 2024
@jonathanolson jonathanolson removed their assignment Jun 27, 2024
@jonathanolson
Contributor

QA test in phetsims/qa#1113.

@jonathanolson
Contributor

Most of this work is handled. Any remaining work will be in phetsims/build-a-molecule#220.
