Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated Specification and Documentation to support Audio Modality. #93

Merged
merged 10 commits into from
Jan 17, 2025

Conversation

evalstate
Copy link
Contributor

This change supports discussion #88 and includes Audio Modality in the specification.

Motivation and Context

This would enable integration with models that support Audio as input/output in context such as gpt-4o-audio-preview.
https://platform.openai.com/docs/guides/audio

How Has This Been Tested?

This has been tested using the Inspector tool, with local type extensions:

// Define the AudioContent schema
export const AudioContentSchema = z.object({
  type: z.literal("audio"),
  data: z.string().base64(),
  mimeType: z.string(),
}).passthrough();

// Extend the CallToolResult schema to include audio content
export const ExtendedCallToolResultSchema = ResultSchema.extend({
  content: z.array(
    z.discriminatedUnion("type", [
      TextContentSchema,
      ImageContentSchema,
      AudioContentSchema,
      EmbeddedResourceSchema,
    ])
  ),
  isError: z.boolean().default(false).optional(),
});

// Export the types
export type AudioContent = z.infer<typeof AudioContentSchema>;
export type ExtendedCallToolResult = z.infer<typeof ExtendedCallToolResultSchema>;

The Inspector application was updated to render the Audio player for this type:

              {item.type === "image" && (
                <img
                  src={`data:${item.mimeType};base64,${item.data}`}
                  ...
              {item.type === "audio" && (
                  <audio
                  controls
                  src={`data:${item.mimeType};base64,${item.data}`}

image

The Server produced this JSON:

      const audioBase64 = await generateSpeech(text);
      return {
        content: [{
          type: "audio",
            mimeType: "audio/wav",
            data: audioBase64
        }]
      };
    }

I was unable to find the process to build the TypeScript SDK from the Schema, hence the approach of extending types.

Ultimately I would like to integrate this in to my Chat application supporting gpt-4o (and potential new models) with Audio support.

Breaking Changes

No. However:

  1. The Client Reference Implementation (Claude Desktop) does not support audio.
  2. The SDKs will require updating to include the extended type.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

I believe that adding the "Audio" type is appropriate as it seems congruent the way that text/image modalities are typically handled.

modalities: ["text", "audio"],

jspahrsummers
jspahrsummers previously approved these changes Dec 2, 2024
Copy link
Member

@jspahrsummers jspahrsummers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This makes sense to me, and seems like a clean extension to the protocol.

@dsp-ant Any thoughts?

docs/specification/client/sampling.md Outdated Show resolved Hide resolved
@jspahrsummers
Copy link
Member

jspahrsummers commented Dec 2, 2024

We probably want to rev the protocol version, since this would be a new type that receivers may not be expecting.

@dsp-ant
Copy link
Member

dsp-ant commented Dec 2, 2024

This seems reasonable to me. I am not accepting mostly so we don't accidentally merge this. We need to first rev the protocol and add ways to handle revisions in the current protocol.

@evalstate
Copy link
Contributor Author

This seems reasonable to me. I am not accepting mostly so we don't accidentally merge this. We need to first rev the protocol and add ways to handle revisions in the current protocol.

@dsp-ant - I couldn't find the tool/script to generate new versions of the SDKs from the spec for testing - are they available?

@jspahrsummers
Copy link
Member

We've now created a separate place for the draft version of the spec. Can you please move this there?

I couldn't find the tool/script to generate new versions of the SDKs from the spec for testing - are they available?

We use Claude to update the SDKs in response to spec changes—e.g., by giving it the current SDK interfaces and a diff of what changed in the schema.

@evalstate
Copy link
Contributor Author

As requested, updated to point changes to draft version of spec.

@dsp-ant - i decided to create /schema/draft/schema.ts for the changes. the validate and build scripts have been updated to handle both main and draft schemas.

Copy link
Member

@jspahrsummers jspahrsummers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! I left a couple of comments on the doc page which should be fixed, but then also a couple of questions for @dsp-ant about how we want to do versioning—he's been leading the charge on the release vs. draft spec 🙂

docs/specification/draft/client/sampling.md Outdated Show resolved Hide resolved
docs/specification/draft/client/sampling.md Outdated Show resolved Hide resolved
schema/draft/schema.ts Show resolved Hide resolved
schema/draft/schema.ts Outdated Show resolved Hide resolved
@evalstate
Copy link
Contributor Author

I've updated the link in the client/sampling.md to be a relative link - should make maintaining draft/versions easier?

Copy link
Contributor Author

@evalstate evalstate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated link to be relative - should make moving between versions easier.

dsp-ant
dsp-ant previously approved these changes Jan 16, 2025
Copy link
Member

@dsp-ant dsp-ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the nits we can work out later. Happy with that. Thank you.

docs/specification/draft/client/sampling.md Outdated Show resolved Hide resolved
@dsp-ant dsp-ant self-requested a review January 17, 2025 18:44
@dsp-ant dsp-ant merged commit 51feed1 into modelcontextprotocol:main Jan 17, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants