Updated Specification and Documentation to support Audio Modality. #93

evalstate · 2024-12-01T12:09:48Z

This change supports discussion #88 and includes Audio Modality in the specification.

Motivation and Context

This would enable integration with models that accept or produce audio in context, such as gpt-4o-audio-preview.
https://platform.openai.com/docs/guides/audio

How Has This Been Tested?

This has been tested using the Inspector tool, with local type extensions:

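// Assumed imports: zod, plus the existing schemas from the TypeScript SDK
import { z } from "zod";
import {
  ResultSchema,
  TextContentSchema,
  ImageContentSchema,
  EmbeddedResourceSchema,
} from "@modelcontextprotocol/sdk/types.js";

// Define the AudioContent schema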
export const AudioContentSchema = z.object({
  type: z.literal("audio"),
  data: z.string().base64(),
  mimeType: z.string(),
}).passthrough();

// Extend the CallToolResult schema to include audio content
export const ExtendedCallToolResultSchema = ResultSchema.extend({
  content: z.array(
    z.discriminatedUnion("type", [
      TextContentSchema,
      ImageContentSchema,
      AudioContentSchema,
      EmbeddedResourceSchema,
    ])
  ),
  isError: z.boolean().default(false).optional(),
});

// Export the types
export type AudioContent = z.infer<typeof AudioContentSchema>;
export type ExtendedCallToolResult = z.infer<typeof ExtendedCallToolResultSchema>;
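
For readers without zod at hand, the shape accepted by AudioContentSchema can be approximated with a plain type guard. This is only a sketch: the names `AudioContentLike` and `isAudioContent` are illustrative, and the base64 check is a simplified regular expression, not zod's own validation.

```typescript
// Hypothetical, dependency-free mirror of the AudioContent shape above.
interface AudioContentLike {
  type: "audio";
  data: string; // base64-encoded audio bytes
  mimeType: string;
  [key: string]: unknown; // extra keys are tolerated, similar to .passthrough()
}

// Simplified base64 check (assumption: standard alphabet with padding).
const BASE64_RE =
  /^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$/;

function isAudioContent(value: unknown): value is AudioContentLike {
  if (typeof value !== "object" || value === null) return false;
  const item = value as Record<string, unknown>;
  return (
    item.type === "audio" &&
    typeof item.data === "string" &&
    BASE64_RE.test(item.data) &&
    typeof item.mimeType === "string"
  );
}
```

A guard like this could sit in front of a renderer so that malformed items are rejected before a data URI is built from them.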

The Inspector application was updated to render the Audio player for this type:

              {item.type === "image" && (
                <img
                  src={`data:${item.mimeType};base64,${item.data}`}
                  ...
              {item.type === "audio" && (
                <audio
                  controls
                  src={`data:${item.mimeType};base64,${item.data}`}
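
Both branches above build the same kind of data URI from `mimeType` and `data`. A minimal sketch of that step as a standalone helper follows; `MediaItem`, `toDataUri`, and `tagFor` are illustrative names and are not taken from the Inspector source.

```typescript
// Illustrative item shape, following the image/audio content used above.
interface MediaItem {
  type: "image" | "audio";
  data: string; // base64 payload
  mimeType: string;
}

// Build the data URI that the <img> and <audio> elements take as src.
function toDataUri(item: MediaItem): string {
  return `data:${item.mimeType};base64,${item.data}`;
}

// Pick the HTML tag a renderer might use for a given content item.
function tagFor(item: MediaItem): "img" | "audio" {
  return item.type === "image" ? "img" : "audio";
}
```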

The server produced this result:

      const audioBase64 = await generateSpeech(text);
      return {
        content: [{
          type: "audio",
          mimeType: "audio/wav",
          data: audioBase64
        }]
      };
    }
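
The `generateSpeech` function is not shown in this PR. As a rough sketch of the remaining step, the helper below wraps raw audio bytes in a result of the same shape. `bytesToBase64` and `audioToolResult` are hypothetical names, and the hand-written encoder is there only to keep the example free of dependencies.

```typescript
const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode bytes as standard, padded base64.
function bytesToBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    out += B64_ALPHABET[b0 >> 2];
    out += B64_ALPHABET[((b0 & 0x03) << 4) | (b1 >> 4)];
    out += i + 1 < bytes.length ? B64_ALPHABET[((b1 & 0x0f) << 2) | (b2 >> 6)] : "=";
    out += i + 2 < bytes.length ? B64_ALPHABET[b2 & 0x3f] : "=";
  }
  return out;
}

// Hypothetical helper: wrap WAV bytes in a tool result with one audio item.
function audioToolResult(wavBytes: Uint8Array) {
  return {
    content: [
      {
        type: "audio" as const,
        mimeType: "audio/wav",
        data: bytesToBase64(wavBytes),
      },
    ],
  };
}
```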

I was unable to find the process to build the TypeScript SDK from the Schema, hence the approach of extending types.

Ultimately I would like to integrate this into my Chat application supporting gpt-4o (and potentially new models) with Audio support.

Breaking Changes

No. However:

The Client Reference Implementation (Claude Desktop) does not support audio.
The SDKs will require updating to include the extended type.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update

Checklist

I have read the MCP Documentation
My code follows the repository's style guidelines
New and existing tests pass locally
I have added appropriate error handling
I have added or updated documentation as needed

Additional context

I believe that adding the "Audio" type is appropriate, as it seems congruent with the way that text/image modalities are typically handled.

modalities: ["text", "audio"],

jspahrsummers

Thank you! This makes sense to me, and seems like a clean extension to the protocol.

@dsp-ant Any thoughts?

docs/specification/client/sampling.md

jspahrsummers · 2024-12-02T13:24:08Z

We probably want to rev the protocol version, since this would be a new type that receivers may not be expecting.
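
To illustrate the concern, here is a sketch of a receiver that only acts on the content types it knows about. The type lists and the `partitionContent` name are assumptions for illustration and are not part of the spec or the SDKs.

```typescript
// Assumed content type names before and after this change.
const TYPES_BEFORE: readonly string[] = ["text", "image", "resource"];
const TYPES_AFTER: readonly string[] = TYPES_BEFORE.concat("audio");

// Split incoming items into those the receiver supports and those it does not.
function partitionContent(
  items: { type: string }[],
  supported: readonly string[],
) {
  const handled: { type: string }[] = [];
  const skipped: string[] = [];
  for (const item of items) {
    if (supported.indexOf(item.type) >= 0) {
      handled.push(item);
    } else {
      // A receiver built against the earlier revision ends up here for "audio".
      skipped.push(item.type);
    }
  }
  return { handled, skipped };
}
```

With `TYPES_BEFORE`, an audio item ends up in `skipped`. With `TYPES_AFTER`, it is handled. A stricter receiver might reject the whole message instead of skipping the item, which is why a new protocol version gives both sides a way to agree up front.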

dsp-ant · 2024-12-02T16:12:33Z

This seems reasonable to me. I am not accepting mostly so we don't accidentally merge this. We need to first rev the protocol and add ways to handle revisions in the current protocol.

evalstate · 2024-12-02T17:55:36Z

> This seems reasonable to me. I am not accepting mostly so we don't accidentally merge this. We need to first rev the protocol and add ways to handle revisions in the current protocol.

@dsp-ant - I couldn't find the tool/script to generate new versions of the SDKs from the spec for testing - are they available?

jspahrsummers · 2025-01-10T10:44:40Z

We've now created a separate place for the draft version of the spec. Can you please move this there?

> I couldn't find the tool/script to generate new versions of the SDKs from the spec for testing - are they available?

We use Claude to update the SDKs in response to spec changes—e.g., by giving it the current SDK interfaces and a diff of what changed in the schema.

evalstate · 2025-01-15T08:57:59Z

As requested, updated to point changes to draft version of spec.

@dsp-ant - I decided to create /schema/draft/schema.ts for the changes. The validate and build scripts have been updated to handle both main and draft schemas.

jspahrsummers

Thanks for doing this! I left a couple of comments on the doc page which should be fixed, but then also a couple of questions for @dsp-ant about how we want to do versioning—he's been leading the charge on the release vs. draft spec 🙂

docs/specification/draft/client/sampling.md

schema/draft/schema.ts

evalstate · 2025-01-15T11:37:09Z

I've updated the link in the client/sampling.md to be a relative link - should make maintaining draft/versions easier?

evalstate

Updated link to be relative - should make moving between versions easier.

dsp-ant

I think we can work out the nits later. Happy with that. Thank you.

docs/specification/draft/client/sampling.md

Updated Specification and Docs to support Audio Modality.

424f280

jspahrsummers previously approved these changes Dec 2, 2024

docs/specification/client/sampling.md (outdated)

jspahrsummers added this to the After: 2024-11-05 milestone Dec 2, 2024

evalstate dismissed jspahrsummers’s stale review via 02f6f84 December 2, 2024 13:34

evalstate and others added 2 commits December 2, 2024 13:34

Update docs/specification/client/sampling.md

02f6f84

Co-authored-by: Justin Spahr-Summers <[email protected]>

Merge branch 'main' into feature/audio-modality

a826742

Merge branch 'main' into feature/audio-modality

41c60ce

evalstate added 3 commits January 15, 2025 09:48

Revert schema/schema.ts to upstream/main, add drafts folder.

2f2c60d

Update validate and build scripts to build drafts too.

Merge remote-tracking branch 'upstream/main' into feature/audio-modality

e16629e

point documentation changes to "draft"

94bc438

jspahrsummers requested changes Jan 15, 2025

docs/specification/draft/client/sampling.md (outdated)

docs/specification/draft/client/sampling.md (outdated)

schema/draft/schema.ts

schema/draft/schema.ts (outdated)

change to relative link in spec

b89647d

evalstate commented Jan 15, 2025

dsp-ant requested review from jspahrsummers and dsp-ant January 15, 2025 14:29

updated schema protocol version to reflect draft.

f315068

dsp-ant previously approved these changes Jan 16, 2025

docs/specification/draft/client/sampling.md (outdated)

update protocol revision to draft on sampling page

9f16ffd

evalstate dismissed dsp-ant’s stale review via 9f16ffd January 16, 2025 20:28

jspahrsummers approved these changes Jan 17, 2025

dsp-ant self-requested a review January 17, 2025 18:44

dsp-ant approved these changes Jan 17, 2025

dsp-ant merged commit 51feed1 into modelcontextprotocol:main Jan 17, 2025
1 check passed
