Add initial draft of captions extension to semantic labels proposal #67

Open
wants to merge 1 commit into
base: main
from

Conversation

nvmkuruc
Contributor

@nvmkuruc nvmkuruc commented Jun 28, 2024

Description of Proposal

The semantic labels schema will add support for labeling subgraphs with tokens from discrete taxonomies.

This proposes a peer schema in the UsdSemantics domain for semantics:captions to capture instances of natural language descriptions.

Link to Rendered Proposal

There is overlap with what the accessibility schema is looking to achieve. We're looking for feedback on if and how these two proposals should align.

Supporting Materials

Contributing

@dgovil
Contributor

dgovil commented Jul 2, 2024

Hey @nvmkuruc,
We have a very similar proposal centered around Accessibility that I've unfortunately been slow to put up as we discuss internally. I wonder if we should discuss the overlap because I think there's applicability here to other use cases.

For example, this is what I would propose for accessibility information

def Mesh "Cube" (
    prepend apiSchemas = ["AccessibilityAPI"]
)
{
    string accessibility:label = "A Cube"
    string accessibility:alternate:default = "This cube is a wonderful looking cube"
    token accessibility:importance:default = "Standard"
    
    string accessibility:alternate:size = "As big as a house"
    token accessibility:importance:size = "Low"
}

This follows the standard form used by accessibility frameworks across multiple browsers and operating systems, where label is the short description and alternate is a longer description for anyone who wants it.

Much like yours, I use namespace:<label|alternate|importance>:<optional purpose>
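
As a rough illustration of that naming pattern, here is a minimal Python sketch that splits an attribute name into its field and optional purpose. The function name `split_accessibility_name` is hypothetical, and treating everything after the field as the purpose is an assumption, not something either proposal specifies.

```python
# Fields named in the pattern namespace:<label|alternate|importance>:<optional purpose>.
FIELDS = {"label", "alternate", "importance"}

def split_accessibility_name(name, namespace="accessibility"):
    """Split 'accessibility:alternate:size' into ('alternate', 'size').

    Hypothetical helper: the purpose is whatever follows the field,
    or None when the attribute has no purpose suffix.
    """
    parts = name.split(":")
    if len(parts) < 2 or parts[0] != namespace or parts[1] not in FIELDS:
        raise ValueError(f"not a {namespace} attribute name: {name!r}")
    # Assumption: any remaining components together form the purpose.
    purpose = ":".join(parts[2:]) or None
    return parts[1], purpose

with_purpose = split_accessibility_name("accessibility:alternate:size")
without_purpose = split_accessibility_name("accessibility:label")
```

Under these assumptions, `with_purpose` holds the field and the purpose, while `without_purpose` carries `None` as its purpose.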

I would also (in ours) encourage combining use with the proposed language schema so that you could do things like

#usda 1.0
(
    language = "en_ca"
)

def Mesh "Cube" (
    prepend apiSchemas = ["AccessibilityAPI"]
)
{
    string accessibility:label = "A Cube"
    string accessibility:alternate = "This cube is a wonderful looking cube"
    token accessibility:importance = "Standard"
    
    string accessibility:label:lang:fr = "Un cube"
    string accessibility:alternate:lang:fr = "Ce cube est un cube magnifique"
    string accessibility:alternate:lang:fr_ca = "Ce cube est un cube magnifique canadien"
}
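
One way a consumer might pick a localized string from attributes like these is sketched below in plain Python. The fallback order (exact language, then the primary language, then the unsuffixed attribute) is purely an assumption for illustration; neither proposal defines it, and `localized` is a hypothetical helper rather than part of any USD API.

```python
def localized(attrs, base, lang=None):
    """Return the best match for `base` in `lang` from a name -> value mapping.

    Assumed fallback: 'fr_ca' -> 'fr' -> the unsuffixed attribute.
    """
    candidates = []
    if lang:
        candidates.append(f"{base}:lang:{lang}")
        primary = lang.split("_")[0]
        if primary != lang:
            candidates.append(f"{base}:lang:{primary}")
    candidates.append(base)
    for name in candidates:
        if name in attrs:
            return attrs[name]
    return None

# Attribute values copied from the example above.
cube = {
    "accessibility:label": "A Cube",
    "accessibility:alternate": "This cube is a wonderful looking cube",
    "accessibility:label:lang:fr": "Un cube",
    "accessibility:alternate:lang:fr": "Ce cube est un cube magnifique",
    "accessibility:alternate:lang:fr_ca": "Ce cube est un cube magnifique canadien",
}

label_fr_ca = localized(cube, "accessibility:label", "fr_ca")
alternate_fr_ca = localized(cube, "accessibility:alternate", "fr_ca")
label_de = localized(cube, "accessibility:label", "de")
```

With this fallback, a request for the label in `fr_ca` resolves to the `fr` value because no `fr_ca` label is authored, and a language with no authored strings falls back to the unsuffixed attribute.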

I realize that the semantic description and the accessibility description might differ, but the abstract schema seems so similar that I wonder if we shouldn't combine forces.

Without trying to be presumptuous, I think the accessibility schema could cover your needs too, with accessibility:semantics:skills for example. Anyway, happy to discuss more. I think it could be a meaningful change to USD.

@nvmkuruc
Contributor Author

nvmkuruc commented Jul 2, 2024

@dgovil Do you happen to have an example of allowed / preferred values for the importance tokens?

@dgovil
Contributor

dgovil commented Jul 2, 2024

Probably just low, standard, and high. It's less common metadata, but it does help a system prioritize tokens when someone asks in natural language for a description.
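
A minimal Python sketch of how a system might use such values to order descriptions follows. The ranking table and the choice to treat unknown values as standard are assumptions; the thread only tentatively suggests the three values, and `order_by_importance` is a hypothetical name.

```python
# Assumed ranking; the thread only tentatively suggests low, standard, high.
IMPORTANCE_RANK = {"high": 0, "standard": 1, "low": 2}

def order_by_importance(descriptions):
    """Sort (text, importance) pairs so the most important text comes first."""
    def rank(item):
        _, importance = item
        # Assumption: unknown or oddly cased values fall back to "standard".
        return IMPORTANCE_RANK.get(str(importance).lower(), IMPORTANCE_RANK["standard"])
    # sorted() is stable, so equally ranked descriptions keep their input order.
    return [text for text, _ in sorted(descriptions, key=rank)]

descriptions = [
    ("As big as a house", "Low"),
    ("This cube is a wonderful looking cube", "Standard"),
]
ordered = order_by_importance(descriptions)
```

Here the standard-importance description is placed ahead of the low-importance one.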

Comment on lines +50 to +58
def Xform "learning_robot" (
    apiSchemas = ["SemanticsCaptionsAPI:skills"]
) {
    string semantics:captions:skills.timeSamples = {
        0 : "The robot does not know how to dance",
        100 : "The robot is learning the box step",
        150 : "The robot knows the box step"
    }
}
Contributor


As mentioned in @dgovil's proposal, we also discussed time-based descriptions as a future consideration. Sometimes a relevant time sequence needs an "announcement" for assistive technology too, tied either to a transition (for example, between slide builds, where the change is more important than either end state) or to a time code on the overall timeline, similar to closed captions or audio descriptions.

Potentially relevant: I'm working on a PR for VTT to add an ATTRIBUTES block, generally to disambiguate various types of metadata, but specifically because it's a prerequisite for using VTT to define time-based general flash data (seizure avoidance, etc.) in this follow-up VTT issue.

Contributor


I'm curious if you see these timeSamples keys aligning with other timed-text formats like VTT.

Contributor Author


If I understand VTT correctly, it specifies time code ranges, while OpenUSD holds an authored value until the next authored time sample and pulls from the first or last time sample when querying out of range. To describe that format in OpenUSD, you'd likely have to do something like this:

string semantics:captions:skills.timeSamples = {
    99.9999: "", # Suppress out of range queries
    100: "This is some state.", # This is valid between time codes 100 and 150.
    150.00001: "", # Suppress out of range queries
}
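
To make the held-value behavior concrete, here is a small plain-Python model of the resolution rule described above: hold the most recent authored sample, and clamp to the first or last sample when the query is out of range. It is only a sketch of that rule, not the OpenUSD API, and `resolve_held` is a hypothetical name.

```python
from bisect import bisect_right

def resolve_held(samples, time):
    """Return the held value at `time` from a {time_code: value} mapping."""
    if not samples:
        return None
    times = sorted(samples)
    # Index of the last authored sample at or before `time`.
    i = bisect_right(times, time) - 1
    if i < 0:
        # Query precedes every sample: clamp to the first authored value.
        return samples[times[0]]
    # Queries past the last sample land here too and return the last value.
    return samples[times[i]]

# The same samples as the snippet above.
caption = {
    99.9999: "",
    100: "This is some state.",
    150.00001: "",
}

before = resolve_held(caption, 0)
during = resolve_held(caption, 125)
after = resolve_held(caption, 200)
```

In this model the empty samples on either side confine the caption to time codes 100 through 150, so `before` and `after` are empty strings while `during` is the caption text.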

This is highly speculative, but I'm curious if there's a path to building something like VTT using the time series proposal as a starting point. It's currently designed for animation splines, but it might provide a path to eventually describing more complicated time-based value resolution.

Member


Time series have actually been removed from the design for animation splines, in favor of more simply leveraging timeSamples for all non-scalar, non-floating-point varying data.

@dgovil dgovil mentioned this pull request Jul 3, 2024
1 task
Projects
Status: Draft

5 participants