Updated Specification and Documentation to support Audio Modality. #93

Merged
merged 10 commits into from
Jan 17, 2025
16 changes: 13 additions & 3 deletions docs/specification/draft/client/sampling.md
@@ -5,10 +5,10 @@ weight: 40
---

{{< callout type="info" >}}
**Protocol Revision**: draft
**Protocol Revision**: {{< param protocolRevision >}}
{{< /callout >}}

The Model Context Protocol (MCP) provides a standardized way for servers to request LLM sampling ("completions" or "generations") from language models via clients. This flow allows clients to maintain control over model access, selection, and permissions while enabling servers to leverage AI capabilities&mdash;with no server API keys necessary. Servers can request text or image-based interactions and optionally include context from MCP servers in their prompts.
The Model Context Protocol (MCP) provides a standardized way for servers to request LLM sampling ("completions" or "generations") from language models via clients. This flow allows clients to maintain control over model access, selection, and permissions while enabling servers to leverage AI capabilities&mdash;with no server API keys necessary. Servers can request text, audio, or image-based interactions and optionally include context from MCP servers in their prompts.

## User Interaction Model

@@ -27,7 +27,7 @@ Implementations are free to expose sampling through any interface pattern that s

## Capabilities

Clients that support sampling **MUST** declare the `sampling` capability during [initialization]({{< ref "/specification/draft/basic/lifecycle#initialization" >}}):
Clients that support sampling **MUST** declare the `sampling` capability during [initialization]({{< ref "/specification/2024-11-05/basic/lifecycle#initialization" >}}):

```json
{
@@ -142,6 +142,16 @@ Sampling messages can contain:
}
```

#### Audio Content
```json
{
"type": "audio",
"data": "base64-encoded-audio-data",
"mimeType": "audio/wav"
}
```


### Model Preferences

Model selection in MCP requires careful abstraction since servers and clients may use different AI providers with distinct model offerings. A server cannot simply request a specific model by name since the client may not have access to that exact model or may prefer to use a different provider's equivalent model.
11 changes: 11 additions & 0 deletions docs/specification/draft/server/prompts.md
@@ -189,6 +189,17 @@ Image content allows including visual information in messages:
```
The image data **MUST** be base64-encoded and include a valid MIME type. This enables multi-modal interactions where visual context is important.

#### Audio Content
Audio content allows including audio information in messages:
```json
{
"type": "audio",
"data": "base64-encoded-audio-data",
"mimeType": "audio/wav"
}
```
The audio data **MUST** be base64-encoded and include a valid MIME type. This enables multi-modal interactions where audio context is important.
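
A receiver might want to check the two requirements above before using an audio item. The following is a minimal sketch, not part of the specification: the `AudioContent` interface and the `isAudioContent` helper are hypothetical names that mirror the JSON example, and the MIME check assumes that "valid" means any `audio/*` subtype.

```typescript
// Hypothetical shape mirroring the JSON example above (not copied from the MCP schema).
interface AudioContent {
  type: "audio";
  data: string; // base64-encoded audio bytes
  mimeType: string; // e.g. "audio/wav"
}

// Standard base64 alphabet, length a multiple of 4, optional "=" padding.
const BASE64_RE =
  /^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$/;

// Assumption: any "audio/<subtype>" value counts as a valid MIME type here.
const AUDIO_MIME_RE = /^audio\/[\w.+-]+$/;

// Type guard: true only when the value looks like a well-formed audio item.
function isAudioContent(value: unknown): value is AudioContent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.type === "audio" &&
    typeof v.data === "string" &&
    v.data.length > 0 &&
    BASE64_RE.test(v.data) &&
    typeof v.mimeType === "string" &&
    AUDIO_MIME_RE.test(v.mimeType)
  );
}
```

With this sketch, `isAudioContent({ type: "audio", data: "UklGRg==", mimeType: "audio/wav" })` returns `true`, while the literal placeholder string `"base64-encoded-audio-data"` is rejected because it is not valid base64.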

#### Embedded Resources
Embedded resources allow referencing server-side resources directly in messages:
```json
9 changes: 9 additions & 0 deletions docs/specification/draft/server/tools.md
@@ -188,6 +188,15 @@ Tool results can contain multiple content items of different types:
}
```

#### Audio Content
```json
{
"type": "audio",
"data": "base64-encoded-audio-data",
"mimeType": "audio/wav"
}
```
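
Because a tool result can mix content types, a client typically branches on the `type` field. The sketch below illustrates that idea with a TypeScript discriminated union. The `ContentItem` type and `describeContent` function are hypothetical, the `text` variant's shape is an assumption, and embedded resources are left out for brevity.

```typescript
// Hypothetical union of content items; field names follow the JSON examples,
// and the "text" variant is assumed rather than taken from this diff.
type ContentItem =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string }
  | { type: "audio"; data: string; mimeType: string };

// Produce a short human-readable summary of one content item.
function describeContent(item: ContentItem): string {
  switch (item.type) {
    case "text":
      return `text (${item.text.length} chars)`;
    case "image":
      return `image (${item.mimeType})`;
    case "audio":
      return `audio (${item.mimeType})`;
  }
}

// Summarize every item in a tool result's content array.
function describeAll(items: ContentItem[]): string[] {
  return items.map(describeContent);
}
```

The `switch` is exhaustive over the union, so adding a new variant such as `audio` makes the compiler flag any handler that has not been updated.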

#### Embedded Resources

[Resources]({{< ref "/specification/draft/server/resources" >}}) **MAY** be embedded, to provide additional context or data, behind a URI that can be subscribed to or fetched again by the client later:
4 changes: 2 additions & 2 deletions package.json
@@ -11,8 +11,8 @@
"node": ">=20"
},
"scripts": {
"validate:schema": "tsc --noEmit schema/schema.ts",
"generate:json": "typescript-json-schema --defaultNumberType integer --required schema/schema.ts \"*\" -o schema/schema.json",
"validate:schema": "tsc --noEmit schema/schema.ts schema/draft/schema.ts",
"generate:json": "typescript-json-schema --defaultNumberType integer --required schema/schema.ts \"*\" -o schema/schema.json && typescript-json-schema --defaultNumberType integer --required schema/draft/schema.ts \"*\" -o schema/draft/schema.json",
"serve:docs": "hugo --source site/ server --logLevel debug --disableFastRender"
},
"devDependencies": {