UTF8 search support in usdviewq #2890

tylerm-nv · 2023-12-27T20:15:37Z

Description of Change(s)

Normalizes prim and attribute search strings in usdviewq to make searching easier. Users can enter the compatibility-normalized form of a string, e.g. VII instead of Ⅶ, and the two will match.
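As a rough illustration of the behavior described above, the sketch below normalizes both the query and each candidate name before a substring test. `find_matches` and the sample names are hypothetical and are not taken from usdviewq; only `unicodedata.normalize` from the Python standard library is assumed.

```python
import unicodedata


def find_matches(names, query, form="NFKC"):
    """Return the names whose normalized form contains the normalized query.

    Hypothetical helper for illustration; not the usdviewq implementation.
    """
    wanted = unicodedata.normalize(form, query)
    return [n for n in names if wanted in unicodedata.normalize(form, n)]


# "\u2166" is U+2166 ROMAN NUMERAL SEVEN (Ⅶ).
prim_names = ["Scene_\u2166", "Scene_VII", "Scene_VI"]

# The ASCII query finds both the ASCII and the Roman-numeral spelling.
matches = find_matches(prim_names, "VII")
print(matches)
```

Searching for "Ⅶ" itself returns the same two names, because the query goes through the same normalization as the candidates.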

NFKC applies a compatibility decomposition that NFC does not. It was chosen over the other normalization forms so that users can find UTF8 identifiers by typing their ASCII representations. NFKC normalization discards some distinctions present in the original strings; this is intentional, since the goal is loose matching.
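To make the difference between the two forms concrete, here is a small comparison using Python's standard `unicodedata` module. The sample strings are chosen for illustration only.

```python
import unicodedata

samples = {
    "roman numeral seven": "\u2166",   # Ⅶ, a compatibility character
    "fi ligature": "\ufb01",           # ﬁ, a compatibility character
    "e + combining acute": "e\u0301",  # canonically equivalent to é (U+00E9)
}

results = {}
for label, text in samples.items():
    nfc = unicodedata.normalize("NFC", text)
    nfkc = unicodedata.normalize("NFKC", text)
    results[label] = (nfc, nfkc)
    print(f"{label}: NFC={nfc!r} NFKC={nfkc!r}")
```

NFC leaves Ⅶ and ﬁ untouched and only composes the accented letter, while NFKC additionally folds Ⅶ to "VII" and ﬁ to "fi". That folding is the information loss mentioned above: after NFKC the original spelling cannot be recovered.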

This change supports #2848 but can be merged independently

Fixes Issue(s)

I have verified that all unit tests pass with the proposed changes

I have submitted a signed Contributor License Agreement

tallytalwar · 2024-01-02T21:57:11Z

Filed as internal issue #USD-9121

tallytalwar

Could I also request an update to the PR description giving an overview of why NFKC was chosen over NFC, and what its shortcomings are? For example, someone may want to distinguish between "VII" and "Ⅶ", but with NFKC normalization the two will match.

tallytalwar · 2024-01-06T01:33:42Z

pxr/usdImaging/usdviewq/appController.py

@@ -2203,6 +2203,9 @@ def setFrameField(self, frame):

    # Prim/Attribute search functionality =====================================

+    def _normalize_unicode(self, str: str, form = 'NFKC'):


It does make sense for usdview, as a client, to do this sort of normalization rather than core USD, but we think it would still be worth adding brief in-code comments explaining why NFKC was chosen for normalization.

Note that in discussion with @mati-nvidia and others, the view was that, at least for identifiers, NFKC normalization should be discouraged and NFC normalization encouraged.
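For illustration, a commented helper along the lines requested here might look like the sketch below. The diff excerpt shows only the method signature, so the body and comment wording are assumptions, not the merged code; the function is written as a free function with a hypothetical name so that it runs on its own.

```python
import unicodedata


def normalize_unicode(text: str, form: str = "NFKC") -> str:
    """Normalize a search string or candidate name before comparison.

    Hypothetical sketch; the real usdviewq method may differ.
    """
    # NFKC is the default because search is meant to be loose: compatibility
    # characters such as U+2166 (Ⅶ) fold to their ASCII equivalents ("VII"),
    # so either spelling finds the same prim or attribute.
    # This is lossy by design. It is applied only to the strings being
    # compared in the viewer, never written back to the stage.
    return unicodedata.normalize(form, text)


print(normalize_unicode("\u2166"))         # default form, NFKC
print(normalize_unicode("\u2166", "NFC"))  # NFC keeps the original character
```

Keeping `form` as a parameter leaves room for a stricter comparison (for example NFC) where the loose behavior is not wanted.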

nvmkuruc · 2024-01-08

@tallytalwar "Loose matching" is one of the motivations for NFKC normalization (https://unicode.org/faq/normalization.html, "Why should my program normalize strings?").

It's possible that regular expressions don't fit the definition of "loose matching", and different forms would be best for each.
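One way normalization and regular expressions can interact badly is sketched below. This is an illustration of the general concern, not a case raised in the thread: NFKC folds fullwidth punctuation to ASCII, and some of those ASCII characters are regex metacharacters.

```python
import re
import unicodedata

# 'a', U+FF0A FULLWIDTH ASTERISK, 'b': the fullwidth asterisk is an ordinary
# literal character in a regular expression.
pattern = "a\uff0ab"

# NFKC folds the fullwidth asterisk to ASCII '*', which is a quantifier.
normalized = unicodedata.normalize("NFKC", pattern)

# The normalized pattern now means "zero or more 'a', then 'b'", so it matches
# text that contains neither an 'a' nor any kind of asterisk.
surprising = re.search(normalized, "xxb") is not None

# Escaping after normalization restores a literal match.
literal = re.search(re.escape(normalized), "xxb") is not None

print(repr(normalized), surprising, literal)
```

A plain substring search is unaffected by this, which is one reason the two search modes might reasonably use different normalization forms.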

I don't think we necessarily have to discourage users from NFKC-normalizing their strings or from having pipelines that enforce NFKC forms. NFKC-normalized strings are also NFC-normalized.
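The last point can be checked directly: applying NFC to an NFKC-normalized string leaves it unchanged, while the converse does not hold. A minimal check, with sample strings chosen for illustration:

```python
import unicodedata


def is_stable(text: str, form: str) -> bool:
    """True if text is already in the given normalization form."""
    return unicodedata.normalize(form, text) == text


samples = ["\u2166", "\ufb01le", "e\u0301", "\u212b", "plain_ascii"]

# Every NFKC result is also in NFC.
nfkc_is_nfc = all(
    is_stable(unicodedata.normalize("NFKC", s), "NFC") for s in samples
)

# The converse fails: Ⅶ is in NFC but is not in NFKC.
roman_seven = "\u2166"
converse_holds = is_stable(roman_seven, "NFC") and is_stable(roman_seven, "NFKC")

print(nfkc_is_nfc, converse_holds)
```

So a pipeline that enforces NFKC also satisfies an NFC requirement, but it admits a smaller set of identifiers than NFC alone.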

UTF8 search support in usdviewq

2a2f879

tallytalwar reviewed Jan 6, 2024

function comments

bc2aca6

sunyab added the usd-utf8-identifiers Issues/PRs for Unicode Identifiers in USD proposal label Jan 11, 2024

pixar-oss merged commit 768ee98 into PixarAnimationStudios:dev Jan 16, 2024
3 of 5 checks passed

nvmkuruc Jan 8, 2024
