Auto-enable Azure AI Inference instrumentation in Azure Monitor, update docs #38071

Merged · 14 commits · Oct 30, 2024
2 changes: 2 additions & 0 deletions .vscode/cspell.json
@@ -337,6 +337,8 @@
"onmicrosoft",
"openai",
"OPENAI",
"otlp",
"OTLP",
"owasp",
"ownerid",
"PBYTE",
59 changes: 40 additions & 19 deletions sdk/ai/azure-ai-inference/README.md
@@ -224,7 +224,7 @@ The `EmbeddingsClient` has a method named `embedding`. The method makes a REST A

See simple text embedding example below. More can be found in the [samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference/samples) folder.

<!--
### Image Embeddings

TODO: Add overview and link to explain image embeddings.
@@ -242,7 +242,7 @@ In the following sections you will find simple examples of:
* [Text Embeddings](#text-embeddings-example)
<!-- * [Image Embeddings](#image-embeddings-example) -->

The examples create a synchronous client assuming a Serverless API or Managed Compute endpoint. Modify client
construction code as described in [Key concepts](#key-concepts) to have it work with a GitHub Models endpoint or an Azure OpenAI
endpoint. Only mandatory input settings are shown for simplicity.

@@ -275,7 +275,7 @@ print(response.choices[0].message.content)

The following types of messages are supported: `SystemMessage`, `UserMessage`, `AssistantMessage`, `ToolMessage`. See also samples:

* [sample_chat_completions_with_tools.py](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/samples/sample_chat_completions_with_tools.py) for usage of `ToolMessage`.
* [sample_chat_completions_with_image_url.py](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/samples/sample_chat_completions_with_image_url.py) for usage of `UserMessage` that
includes sending an image URL.
* [sample_chat_completions_with_image_data.py](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/samples/sample_chat_completions_with_image_data.py) for usage of `UserMessage` that
Expand Down Expand Up @@ -535,15 +535,44 @@ For more information, see [Configure logging in the Azure libraries for Python](

To report issues with the client library, or request additional features, please open a GitHub issue [here](https://github.com/Azure/azure-sdk-for-python/issues)

## Tracing
## Observability With OpenTelemetry

The Azure AI Inference client library provides experimental support for tracing with OpenTelemetry.

You can capture prompt and completion contents by setting the `AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED` environment variable to `true` (case insensitive).
By default, prompts, completions, function names, parameters, and outputs are not recorded.
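
For example, a minimal sketch of turning content recording on for a local run by setting the variable in-process, before the library is instrumented (in production you would typically set it outside the process):

```python
import os

# Assumption for local debugging only: the variable must be set before
# instrumentation reads it. Any value other than "true" leaves content
# recording disabled.
os.environ["AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED"] = "true"
```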

The Azure AI Inferencing API Tracing library provides tracing for the Azure AI Inference client library for Python. Refer to the Installation chapter above for installation instructions.
### Setup with Azure Monitor

### Setup
When using the Azure AI Inference library with the [Azure Monitor OpenTelemetry Distro](https://learn.microsoft.com/azure/azure-monitor/app/opentelemetry-enable?tabs=python),
distributed tracing for Azure AI Inference calls is enabled by default when using the latest version of the distro.
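
As an illustration, a minimal sketch of this setup. The endpoint and key environment variable names below are placeholders, and the distro is assumed to read the Application Insights connection string from `APPLICATIONINSIGHTS_CONNECTION_STRING`:

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential
from azure.monitor.opentelemetry import configure_azure_monitor

# Configures OpenTelemetry and exporting to Azure Monitor. With a recent
# version of the distro, Azure AI Inference instrumentation is enabled
# automatically; no explicit instrument() call is needed.
configure_azure_monitor()

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_AI_CHAT_ENDPOINT"],  # placeholder variable name
    credential=AzureKeyCredential(os.environ["AZURE_AI_CHAT_KEY"]),  # placeholder
)

response = client.complete(messages=[UserMessage(content="How many feet are in a mile?")])
print(response.choices[0].message.content)
```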

The environment variable `AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED` controls whether the actual message contents will be recorded in the traces or not. By default, message contents are not recorded as part of the trace. When message content recording is disabled, function names, function parameter names, and function parameter values related to function call tools are also not recorded in the trace. Set the environment variable to `true` (case insensitive) for the message contents to be recorded as part of the trace. Any other value will cause the message contents not to be recorded.
### Setup with OpenTelemetry

You also need to configure the tracing implementation in your code by setting `AZURE_SDK_TRACING_IMPLEMENTATION` to `opentelemetry`, or by configuring it in code with the following snippet:
Check out your observability vendor's documentation on how to configure OpenTelemetry, or refer to the [official OpenTelemetry documentation](https://opentelemetry.io/docs/languages/python/).

#### Installation

Make sure to install OpenTelemetry and the Azure SDK tracing plugin via

```bash
pip install opentelemetry-sdk
pip install azure-core-tracing-opentelemetry
```

You will also need an exporter to send telemetry to your observability backend. You can print traces to the console or use a local viewer such as [Aspire Dashboard](https://learn.microsoft.com/dotnet/aspire/fundamentals/dashboard/standalone?tabs=bash).
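
For example, a minimal sketch that prints spans to the console using the OpenTelemetry SDK:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Print every finished span to stdout; useful for quick local debugging.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
```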

To connect to the Aspire Dashboard or another OpenTelemetry-compatible backend, install the OTLP exporter:

```bash
pip install opentelemetry-exporter-otlp
```
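
Once installed, a minimal sketch that exports spans over OTLP, assuming the backend (for example, Aspire Dashboard) listens on the default gRPC endpoint `http://localhost:4317`:

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Batch and export spans to an OTLP-compatible collector or viewer.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317")))
trace.set_tracer_provider(provider)
```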

#### Configuration

To enable Azure SDK tracing, set the `AZURE_SDK_TRACING_IMPLEMENTATION` environment variable to `opentelemetry`.

Alternatively, configure it in code with the following snippet:

<!-- SNIPPET:sample_chat_completions_with_tracing.trace_setting -->

Expand All @@ -556,16 +585,7 @@ settings.tracing_implementation = "opentelemetry"

Please refer to [azure-core-tracing-documentation](https://learn.microsoft.com/python/api/overview/azure/core-tracing-opentelemetry-readme) for more information.
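
For reference, the complete form of this configuration; the `settings` object comes from `azure.core.settings`:

```python
from azure.core.settings import settings

# Route Azure SDK tracing through the OpenTelemetry plugin.
settings.tracing_implementation = "opentelemetry"
```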

### Exporting Traces with OpenTelemetry

Azure AI Inference is instrumented with OpenTelemetry. To enable tracing, you need to configure OpenTelemetry to export traces to your observability backend.
Refer to [Azure SDK tracing in Python](https://learn.microsoft.com/python/api/overview/azure/core-tracing-opentelemetry-readme?view=azure-python-preview) for more details.

Refer to [Azure Monitor OpenTelemetry documentation](https://learn.microsoft.com/azure/azure-monitor/app/opentelemetry-enable?tabs=python) for details on how to send Azure AI Inference traces to Azure Monitor and create an Azure Monitor resource.

### Instrumentation

Use the `AIInferenceInstrumentor` to instrument the Azure AI Inferencing API for LLM tracing; this will cause LLM traces to be emitted from the Azure AI Inferencing API.
The final step is to enable Azure AI Inference instrumentation with the following code snippet:

<!-- SNIPPET:sample_chat_completions_with_tracing.instrument_inferencing -->

@@ -589,7 +609,8 @@ AIInferenceInstrumentor().uninstrument()
<!-- END SNIPPET -->
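
For reference, a minimal sketch of instrumenting and later removing instrumentation, assuming `azure-ai-inference` and `azure-core-tracing-opentelemetry` are installed:

```python
from azure.ai.inference.tracing import AIInferenceInstrumentor

# Start emitting OpenTelemetry spans for all Azure AI Inference calls
# made in this process.
AIInferenceInstrumentor().instrument()

# ... create clients and make chat completions or embeddings calls ...

# Stop emitting spans when instrumentation is no longer needed.
AIInferenceInstrumentor().uninstrument()
```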

### Tracing Your Own Functions
The `@tracer.start_as_current_span` decorator can be used to trace your own functions. This will trace the function parameters and their values. You can also add further attributes to the span in the function implementation, as demonstrated below. Note that you will have to set up the tracer in your code before using the decorator. More information is available [here](https://opentelemetry.io/docs/languages/python/).
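
As an illustration, a minimal sketch of the pattern; the function name and span attribute below are hypothetical:

```python
from opentelemetry import trace

# Assumes a TracerProvider has already been configured (see above).
tracer = trace.get_tracer(__name__)

@tracer.start_as_current_span("get_temperature")  # span named after the function
def get_temperature(city: str) -> str:
    # Add a custom attribute to the current span from inside the function.
    span = trace.get_current_span()
    span.set_attribute("requested_city", city)
    return "75"
```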

<!-- SNIPPET:sample_chat_completions_with_tracing.trace_function -->

1 change: 0 additions & 1 deletion sdk/ai/azure-ai-inference/azure/ai/inference/_patch.py
@@ -261,7 +261,6 @@ def __init__(

super().__init__(endpoint, credential, **kwargs)


@overload
def complete(
self,
1 change: 1 addition & 0 deletions sdk/ai/azure-ai-inference/dev_requirements.txt
@@ -1,5 +1,6 @@
-e ../../../tools/azure-sdk-tools
../../core/azure-core
../../core/azure-core-tracing-opentelemetry
../../monitor/azure-monitor-opentelemetry
aiohttp
opentelemetry-sdk
@@ -58,10 +58,7 @@ async def sample_chat_completions_from_input_json_async():
"role": "assistant",
"content": "The main construction of the International Space Station (ISS) was completed between 1998 and 2011. During this period, more than 30 flights by US space shuttles and 40 by Russian rockets were conducted to transport components and modules to the station.",
},
{
"role": "user",
"content": "And what was the estimated cost to build it?"
},
{"role": "user", "content": "And what was the estimated cost to build it?"},
]
}

@@ -65,7 +65,7 @@ def sample_chat_completions_azure_openai():
endpoint=endpoint,
credential=DefaultAzureCredential(exclude_interactive_browser_credential=False),
credential_scopes=["https://cognitiveservices.azure.com/.default"],
api_version="2024-06-01",  # Azure OpenAI api-version. See https://aka.ms/azsdk/azure-ai-inference/azure-openai-api-versions
)

response = client.complete(
@@ -58,10 +58,7 @@ def sample_chat_completions_from_input_json():
"role": "assistant",
"content": "The main construction of the International Space Station (ISS) was completed between 1998 and 2011. During this period, more than 30 flights by US space shuttles and 40 by Russian rockets were conducted to transport components and modules to the station.",
},
{
"role": "user",
"content": "And what was the estimated cost to build it?"
},
{"role": "user", "content": "And what was the estimated cost to build it?"},
]
}
)
@@ -54,9 +54,7 @@ def sample_chat_completions_from_input_json_with_image_url():
model_deployment = None

client = ChatCompletionsClient(
endpoint=endpoint,
credential=AzureKeyCredential(key),
headers={"azureml-model-deployment": model_deployment}
endpoint=endpoint, credential=AzureKeyCredential(key), headers={"azureml-model-deployment": model_deployment}
)

response = client.complete(
@@ -69,10 +67,7 @@ def sample_chat_completions_from_input_json_with_image_url():
{
"role": "user",
"content": [
{
"type": "text",
"text": "What's in this image?"
},
{"type": "text", "text": "What's in this image?"},
{
"type": "image_url",
"image_url": {
@@ -35,6 +35,7 @@

use_azure_openai_endpoint = True


def sample_chat_completions_streaming_with_tools():
import os
import json
@@ -79,11 +80,9 @@ def get_flight_info(origin_city: str, destination_city: str):
str: The airline name, flight number, date and time of the next flight between the cities, in JSON format.
"""
if origin_city == "Seattle" and destination_city == "Miami":
return json.dumps({
"airline": "Delta",
"flight_number": "DL123",
"flight_date": "May 7th, 2024",
"flight_time": "10:00AM"})
return json.dumps(
{"airline": "Delta", "flight_number": "DL123", "flight_date": "May 7th, 2024", "flight_time": "10:00AM"}
)
return json.dumps({"error": "No flights found between the cities"})

# Define a function 'tool' that the model can use to retrieve flight information
@@ -117,21 +116,15 @@ def get_flight_info(origin_city: str, destination_city: str):
)
else:
# Create a chat completions client for Serverless API endpoint or Managed Compute endpoint
client = ChatCompletionsClient(
endpoint=endpoint,
credential=AzureKeyCredential(key)
)
client = ChatCompletionsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

# Make a streaming chat completions call asking for flight information, while providing a tool to handle the request
messages = [
SystemMessage(content="You are an assistant that helps users find flight information."),
UserMessage(content="What is the next flight from Seattle to Miami?"),
]

response = client.complete(
messages=messages,
tools=[flight_info],
stream=True)
response = client.complete(messages=messages, tools=[flight_info], stream=True)

# Note that in the above call we did not specify `tool_choice`. The service defaults to a setting equivalent
# to specifying `tool_choice=ChatCompletionsToolChoicePreset.AUTO`. Other than ChatCompletionsToolChoicePreset
@@ -158,11 +151,7 @@ def get_flight_info(origin_city: str, destination_city: str):
AssistantMessage(
tool_calls=[
ChatCompletionsToolCall(
id=tool_call_id,
function=FunctionCall(
name=function_name,
arguments=function_args
)
id=tool_call_id, function=FunctionCall(name=function_name, arguments=function_args)
)
]
)
@@ -176,19 +165,10 @@ def get_flight_info(origin_city: str, destination_city: str):
print(f"Function response = {function_response}")

# Append the function response as a tool message to the chat history
messages.append(
ToolMessage(
tool_call_id=tool_call_id,
content=function_response
)
)
messages.append(ToolMessage(tool_call_id=tool_call_id, content=function_response))

# With the additional tools information on hand, get another streaming response from the model
response = client.complete(
messages=messages,
tools=[flight_info],
stream=True
)
response = client.complete(messages=messages, tools=[flight_info], stream=True)

print("Model response = ", end="")
for update in response: