Unicode UTF-8 Identifiers in USD Proposal #1
Conversation
Overall looks good! I'll give it a more thorough review tomorrow, but in the meantime it would be good to have a separate "out of scope" section to clarify that while AEC content often has whitespace, parentheses, etc., display names ARE the right approach there. In other words, as in C++ and Python, it does make sense to draw some reasonable limitations around identifier syntax in USD, even if that means not supporting everything that AECO DCCs support. Thanks!
- Aaron Luk
- Matthew Kuruc
- Edward Slavin
if you're comfortable with it, feel free to put yourself as primary author, @erslavin -- I feel you've done the most heavy lifting with this effort.
Agreed!
@erslavin -- sounds like in general NV is using the term AECO (architecture, engineering, construction, and operations) rather than AEC-- let's reflect that here as well. Thanks!
made the adjustment to AECO
I left some comments in line and have two main notes about topics that might be able to be streamlined.
I think the proposal does not need to spend a lot of time addressing whether or not prim names should be restricted to identifiers. It's definitely notable, but there are compelling reasons to keep them restricted too. Rather than expand the section to include that debate, it's probably better to keep it focused on what's needed for UTF-8 identifier support.
Secondly, I think the proposal should acknowledge collation / UCA but ultimately not spend a lot of time on it. The documentation you've provided on collation is really good and should be preserved perhaps in a sibling markdown file, but ultimately if the core is going to use optimistic ASCII for its internal ordering / sorting, it's secondary to the main proposal.
# Staged Rollout and Adoption
Since there are potentially large code bases in the current ecosystem that rely on USD, this proposal currently lands changes behind a `TfEnvSetting` (`TF_UTF8_IDENTIFIERS`) so that Pixar and other VFX/animation stakeholders are not forced to adopt UTF-8 identifier support within their code bases right away. This environment setting defaults to `false`, so the current ASCII validation rules are used in USD core as they are today. When the setting is `true`, access is provided to the new Unicode-based validation rules. This also allows us to measure the performance impact of different approaches to things such as normalization and collation. This proposal seeks to phase out the environment setting over time, similar to the deprecation of Ar 1.0, to give all USD citizens time to adapt their pipelines to support UTF-8 identifiers accordingly.
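The gating described above might be sketched as follows. This is a minimal Python illustration, not USD code: `TfEnvSetting` is a C++ mechanism, and `is_valid_identifier` here is a hypothetical stand-in for the actual validation functions; only the `TF_UTF8_IDENTIFIERS` name and its `false` default come from the proposal.

```python
import os

def utf8_identifiers_enabled():
    # Hypothetical mirror of the proposed TF_UTF8_IDENTIFIERS TfEnvSetting:
    # defaults to false, preserving today's ASCII-only rules.
    return os.environ.get("TF_UTF8_IDENTIFIERS", "false").lower() == "true"

def is_valid_identifier(name):
    if utf8_identifiers_enabled():
        # Python's isidentifier() follows the XID_Start / XID_Continue
        # character classes (PEP 3131), the same classes this proposal adopts.
        return name.isidentifier()
    # Today's rule: letter or underscore first, then only ASCII letters,
    # digits, and underscores.
    return (bool(name)
            and not name[0].isdigit()
            and all(c == "_" or (c.isascii() and c.isalnum()) for c in name))
```

With the setting off, `"café"` is rejected exactly as it is today; flipping the setting on admits it, which is the opt-in behavior the rollout plan relies on.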
It might be worth considering a `CMAKE` flag instead of the environment variable, given that you use the Ar rollout as an analogy. The environment variable might be harder to deprecate?
It's also worth noting that it's probably important to get the sorting algorithm right, since it's hard to change given the concerns about diff-ing files. To ensure that layers are easily interchanged and that UTF-8 is supported widely, I'd argue that the USD library should be opinionated about when the feature is considered a preview and when it's considered a first-class feature. Perhaps `TF_PREVIEW_UTF8_IDENTIFIERS` or (if you accept the `CMAKE` suggestion) `PXR_PREVIEW_UTF8_IDENTIFIERS`.
The `CMAKE` flag is an interesting option, and would alleviate the concerns about a runtime environment switch. I can put this as an option under this section and state that while we implemented it in the PR as a runtime switch, the `CMAKE` flag provides some additional flexibility and eliminates the runtime cost of evaluating the flag.
Added an additional note at the end of the section to discuss a `CMAKE` flag approach.
- UTF-8
- UTF-16 / UTF-32
In evaluating the options to support, raw code points rely largely on code page support and individual system locales. The UTF-x encodings are already popular for standardized interchange of Unicode text, and UTF-8 encoding in particular is widespread within the community. UTF-8 is a variable-length encoding that is backward compatible with existing ASCII encodings, meaning that current USD files, code, etc. that rely on ASCII strings should work without any changes. The reader is referred to [UTF-8 Everywhere](http://utf8everywhere.org/) as a summary of why UTF-8 was chosen as the preferred encoding over UTF-16 / UTF-32.
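The backward-compatibility claim is easy to demonstrate: ASCII text encodes byte-for-byte identically under UTF-8, while characters outside ASCII occupy multi-byte sequences. A small Python illustration:

```python
# ASCII identifiers are byte-for-byte unchanged when interpreted as UTF-8,
# which is why existing ASCII-only USD files keep working.
assert "Sphere".encode("utf-8") == b"Sphere"

# Non-ASCII characters use multi-byte sequences; each of these CJK
# characters occupies three bytes.
name = "日本語"
assert len(name) == 3                  # three code points
assert len(name.encode("utf-8")) == 9  # nine bytes on the wire

# Round-tripping through the encoding is lossless.
assert name.encode("utf-8").decode("utf-8") == name
```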
The UTF-8 Everywhere doc was a very compelling read, and I'd argue for its inclusion earlier in the document to seed context for the rest of the proposal.
Added a reference to this doc early on after the Unicode standard itself.
Though support may vary, both the C++ standard and PEP 3131 have common definitions for what denotes an identifier, and the recommendation is to follow this convention in USD, particularly since identifiers must conform to these rules to enable successful compilation of code-based schemas. One downside of this approach is that strings starting with digits (and, indeed, all-digit strings) could not be considered valid identifiers. However, this is no more restrictive than today's implementation, is a reasonable restriction given that USD supports code-based schemas, and still allows Unicode characters falling in those two classes (`XID_Start` / `XID_Continue`) to be supported.
Restricting prim names in this way, however, is largely unnecessary given that they are never used by compilers generating schema code. In fact, it is entirely possible that prim names could be any valid Unicode string (e.g., containing parentheses, spaces, emojis, etc.). While this proposal does not specify the possible characters that could make up a valid prim name, it does suggest a way to separate the validation of identifiers from the validation of prim names so that the latter could be changed in the future. For now, this proposal uses the same rule set for what makes a valid identifier and a valid prim name. If the rules for validation of prim names change in the future, a balance must be maintained between permissiveness and context-free parsability of a string.
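As a concrete illustration of the `XID_Start` / `XID_Continue` convention: Python's `str.isidentifier()` implements exactly the PEP 3131 rules referenced above, so it can be used to probe which strings the proposed identifier rules would accept.

```python
# Letters from non-Latin scripts are valid identifier characters
# under the XID_Start / XID_Continue classes...
assert "Δt".isidentifier()
assert "猫".isidentifier()

# ...but digit-leading strings, whitespace, and punctuation are not,
# which is why prim-name display strings need a separate discussion.
assert not "3dsMax".isidentifier()
assert not "my prim".isidentifier()
assert not "a(b)".isidentifier()
```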
It's interesting and important to note that prim names don't necessarily need to be restricted to identifiers in the same way that properties do. Mentioning emojis and other characters as possible prim identifiers might distract from the current proposal. I might argue for keeping this higher level and noting that there might be opportunities to extend the character set for prims in future proposals?
I can see your point. On the other hand, this gets people thinking about the possibilities of native character sets in the names. I think there is an argument to be made for both restricting and expanding the set of characters allowed. The emojis might be a little distracting though :)
Removed emojis and added a NOTE to the beginning to illustrate the discussion nature of the paragraph
- `xid_start_range_class`: A vector of pairs (`low`, `high`) of code points that fall in the `XID_Start` character class
- `xid_continue_range_class`: A vector of pairs (`low`, `high`) of code points that fall in the `XID_Continue` character class
This preprocessed file would then be compiled with the rest of the code and be available for fast lookup at runtime. Preprocessing is necessary because:
Spelling error. "because"
fixed
This preprocessed file would then be compiled with the rest of the code and be available for fast lookup at runtime. Preprocessing is necessary because:
- The Unicode database is not formatted in a way that provides fast run-time lookups of character classes in a memory-efficient manner
- The Unicode database is large, and not all of its information is required to perform the necessary processing
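A sketch of the runtime lookup such preprocessed ranges enable (the range data below is an illustrative subset for demonstration, not the generated tables):

```python
import bisect

# Illustrative subset of sorted, non-overlapping (low, high) code point
# ranges for XID_Start; the real tables would be generated from the
# Unicode database at build time.
XID_START_RANGES = [(0x0041, 0x005A), (0x005F, 0x005F),
                    (0x0061, 0x007A), (0x4E00, 0x9FFF)]

def in_ranges(code_point, ranges):
    # Binary search for the last range whose low bound is <= code_point,
    # then check that the code point falls inside it.
    i = bisect.bisect_right(ranges, (code_point, 0x10FFFF)) - 1
    return i >= 0 and ranges[i][0] <= code_point <= ranges[i][1]
```

Because the ranges are sorted and non-overlapping, each lookup is O(log n) over a compact table, which is the memory/speed trade-off the preprocessing step buys compared with shipping the raw database.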
Have we profiled a memory impact from the Unicode database?
We have not in any recordable sense, though we did take a look at the library size between the versions that did and did not have the compile time tables to support the lookups. The database itself is preprocessed, so the memory overhead is that used to store the various maps we generate and compile into the library.
I didn't add additional descriptive text here since we haven't really documented the increase in size
For an optimistic ASCII algorithm to be used in practice, the context-sensitive rules used by the existing dictionary ordering algorithm would have to be laid out explicitly. These rules would then have to be adjusted to define what happens when non-ASCII characters "break" the context: Are they ignored? Do they contribute a new context? Does the algorithm terminate at that point on the basis of a code point ordering? The answers to these questions are not straightforward, but they need to be explicitly stated so that users of USD can be assured that whatever ordering is used, it can be understood.
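To make the context-sensitive behavior concrete, here is a simplified dictionary-ordering key in Python: digit runs compare numerically rather than lexicographically, which is the "context" the existing algorithm maintains. How non-ASCII runs should enter this scheme is exactly the open question above; this sketch covers only the ASCII case.

```python
import re

def dictionary_key(name):
    # Split the name into alternating digit / non-digit runs so that
    # numeric runs compare by value: "prim2" sorts before "prim10".
    runs = re.findall(r"\d+|\D+", name)
    # Tag each run so numeric and textual runs compare consistently.
    return [(0, int(r), "") if r.isdigit() else (1, 0, r) for r in runs]
```

With this key, `sorted(["prim10", "prim2", "prim1"], key=dictionary_key)` yields the natural order `prim1, prim2, prim10` instead of the raw lexicographic `prim1, prim10, prim2`.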
## Unicode Collation Algorithm
This is a really good section, but given that "optimistic ASCII" is going to be the sorting algorithm, I wonder if it should be its own referenced file to capture the work and information while keeping the main proposal focused on the changes to USD Core.
Captured in the overall comment below.
Autodesk is one of the primary stakeholders currently asking for UTF-8 identifiers, as Revit and 3ds Max already support UTF-8 element names. While Maya does not currently support UTF-8, we anticipate that it eventually will also be in demand, and the VFX/animation community will then be pushed to follow suit in their internal pipelines. With the opt-in approach outlined above, this proposal aims to give VFX/animation stakeholders the necessary breathing room ahead of time to safely adopt UTF-8 sooner rather than later.
We share the concern over UTF-8 identifiers leaking into pipelines that do not yet support them, and stand ready to offer guidance and support to update those use cases. As most VFX/animation facilities source their content in-house and/or within non-UTF-8 ecosystems, we do not expect UTF-8 identifiers to appear unexpectedly in such pipelines unless intentionally ingested. In any case, the `TfEnvSetting` provides a mechanism through which VFX/animation pipelines can safeguard against accidental UTF-8 ingestion while they migrate over a grace period, after which UTF-8 identifiers become the norm in USD.
Should "IN any case" be "In any case"?
fixed
# An Overview of Identifiers in USD
Identifier syntax is currently enforced in USD by several different mechanisms. The primary mechanism is the inlined function `TfIsValidIdentifier`, which states that "an identifier is valid if it follows the C/Python identifier convention; that is, it must be at least one character long, must start with a letter or underscore, and must contain only letters, underscores, and numerals". However, several other code paths validate different flavors of identifiers, including namespaced and variant identifiers, using similar logic. In particular, this means that any change to the logic governing core identifier validation in `TfIsValidIdentifier` necessitates changes to several similar validation functions across the `Sdf` module (for example, `SdfPath::IsValidNamespacedIdentifier`).
Should charaacter be character?
fixed
This still seems to be there in the latest unless I'm missing an update.
hmm, you are right, must have been a Ctrl+Z malfunction, fixed
It is also necessary to define what we mean by *identifier*. The current implementation mixes two syntactically similar but semantically different concepts:
- **Prim names:** The name of a USD prim, differentiating one sibling prim from another and forming the hierarchical path from which a prim can be referenced
I think this is a good point and worth capturing, but for focus and given that the proposal isn't as this time recommending that prim names extend beyond identifiers, it might be worth condensing or moving to a sibling document.
Captured in the comment below.
- Aaron Luk
- Matthew Kuruc
- Edward Slavin
Agreed!
This content was from the original proposal, where we did actually modify the prim name rules slightly and spent a fair amount of time implementing UCA. I'd like to keep them in there, but it looks like I need to make a clearer statement somewhere that there are no changes to prim name rules planned (it's important, because we did do the work to separate out the validation; one just falls back to the other when there are no changes to the rules). As far as UCA, we never finalized that optimistic ASCII was the way to go (and I have some doubts in its current state). That said, I agree it likely isn't the final choice, so perhaps I preface the whole collation section with the fact that the subsections are different proposals, that we still need to decide which route to take (with tradeoffs mentioned in the different sections), and that because of the performance hit and/or effort needed to cache that data, we likely wouldn't go with UCA?
- Addressed review comments
- Removed transparency in diagram for dark backgrounds
- String concatenation - Unicode normalization forms are not closed under string concatenation; specifically, at least the area where the strings are joined needs to be considered for composition
- `TfStringJoin`
It may be interesting to consider the impact of C++20's `std::u8string` type for the methods listed above. Overloads of these methods that take a `std::u8string` parameter would provide very clear semantics to a developer about the treatment of string content. While the use of `std::u8string` would really only affect the compiler encoding of string literals, the burden is then placed on the developer to choose explicitly which of the methods to use.
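The concatenation caveat above is easy to reproduce; Python's `unicodedata` module exposes the same normalization forms:

```python
import unicodedata

a = "e"
b = "\u0301"  # COMBINING ACUTE ACCENT

# Each piece is individually in NFC form...
assert unicodedata.is_normalized("NFC", a)
assert unicodedata.is_normalized("NFC", b)

# ...but their concatenation is not: the characters at the join
# point compose, so the joined string must be re-normalized.
joined = a + b
assert not unicodedata.is_normalized("NFC", joined)
assert unicodedata.normalize("NFC", joined) == "\u00e9"  # "é", U+00E9
```

This is why a `TfStringJoin`-style operation over normalized inputs cannot simply concatenate: at minimum the region around each join must be re-examined for composition.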
copmiler => compiler
Proofreading is hard :/ fixed! (and thanks!)
For lexer-based validation, a new lexer and parser are introduced to specifically capture the new identifier rules for both the path lexer and the text file format lexer. Although it is possible to change the existing lexer / parser to support UTF-8 encodings and the new identifier rules, we thought doing so might introduce subtle errors across the ecosystem when using the modified lexer / parser under the current ASCII validation rules (e.g., the new lexer / parser accepting more strings during the lex phase that ultimately fail identifier validation logic).
The path lexer is used as the first step in validating prim path strings. The text file format lexer is used to read and validate the content of text-based USD files (i.e., `.usda`). Both require that a sequence of characters be recognized as a logical token (and parsed in context) before being validated. The rules that dictate which lexed sequences are acceptable for identifiers need to be updated accordingly for Unicode-based identifiers, and it is desirable that these rules conform to those updated in the code-based validation sequences. Unfortunately, it is difficult in flex rules to specify the entire set of byte strings that would encapsulate a character in, e.g., the `XID_Start` character class. To get around this, we propose that the lex rule be very permissive (basically any properly encoded UTF-8 character) and that specific validation (prim name or identifier) of the lexed string be done as part of the lexer logic that accepts the token prior to parsing. This does have the unfortunate overhead of performing some validations twice: once at tokenization using the state machine, and once when adding the token to the tree and checking for identifier validity.
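The two-phase idea (permissive token rule, then identifier validation on the lexed string) might be sketched like this. The delimiter set and function name are illustrative assumptions, not the actual flex rules:

```python
import re

# Phase 1: a deliberately permissive token rule -- any run of characters
# that are not whitespace or path/structural delimiters (illustrative set).
_TOKEN = re.compile(r"[^\s/.<>\[\]{}()]+")

def lex_identifier(text):
    # Hypothetical helper: lex one token from the front of `text`,
    # then validate it as an identifier before accepting it.
    match = _TOKEN.match(text)
    if match is None:
        return None
    token = match.group(0)
    # Phase 2: validate against the identifier rules
    # (XID_Start / XID_Continue, via str.isidentifier()).
    return token if token.isidentifier() else None
```

The regex happily consumes any well-formed Unicode run, and the stricter check rejects invalid identifiers afterward; this mirrors the double-validation overhead the paragraph above describes.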
staht => that
fixed
- [Core Identifier Validation](#core-identifier-validation)
- [The Path and Text File Format Lexer / Parser](#the-path-and-text-file-format-lexer--parser)
- [Case Mapping and other String Operations](#case-mapping-and-other-string-operations)
- [Identifier Collation](#identifier-collation)
Should this be Identifier Sorting?
fixed
In evaluating the options to support, raw code points rely largely on code page support and individual system locales. The UTF-x encodings are already popular for standardized interchange of Unicode text, and UTF-8 encoding in particular is widespread within the community. UTF-8 is a variable-length encoding that is backward compatible with existing ASCII encodings, meaning that current USD files, code, etc. that rely on ASCII strings should work without any changes. The reader is referred to [UTF-8 Everywhere](http://utf8everywhere.org/) as a summary of why UTF-8 was chosen as the preferred encoding over UTF-16 / UTF-32.
Building USD is already a complex endeavor. While there are existing libraries that provide APIs for working with Unicode character sequences, like [ICU](https://icu.unicode.org/), using these libraries would necessitate acquiring and distributing them with the core USD libraries. Efforts are already underway to eliminate dependencies (e.g., Boost), so we felt that a small enough subset of Unicode string processing could be implemented in USD core without the need for external libraries. In practice, the choice of collation algorithm is the primary driver of whether or not to use an external library, so that decision should be deferred until the collation algorithm to be used with Unicode identifiers is understood (see [Identifier Collation](#identifier-collation) for the different proposed options).
Should this be "the choice of sorting algorithm", "understanding the sorting algorithm", and a link to "Identifier Sorting"
fixed
…cted and which was not
- Updated proposal to reflect changed default setting to TF_UTF8_IDENTIFIERS runtime switch
- Updated to make clear the distinction (and validation rules used) between prim names and identifiers
Description of Proposal
Currently, identifiers in USD are limited to ASCII character sets. As such, Unicode based languages cannot use their native character sets to name prims and properties. This proposal discusses what it would take to add Unicode support to core USD identifiers.
Supporting Materials
PixarAnimationStudios/OpenUSD#2120