
If the structured data contains nested lists, the response time increases exponentially as the size of these lists grows #277

Open
ishswar opened this issue Dec 16, 2024 · 3 comments

@ishswar

ishswar commented Dec 16, 2024

Summary:
We have noticed a significant delay in the response time when generating structured weather data with a List[DailyForecast] and List[HourlyForecast]. The time taken for the request grows almost exponentially as the number of forecast days increases.

This issue arises when generating structured data, where WeatherData contains a forecasts field with a list of DailyForecast objects, and each DailyForecast contains a list of HourlyForecast objects.


Issue Description:

When running the code to fetch weather data, the number_of_days (which controls how many days of forecasts are requested) directly impacts the time taken to generate the structured data. As the number of days increases, the time taken grows exponentially. Here is the summary of iterations:

| number_of_days | time_taken |
| -------------- | ---------- |
| 1              | 7.96 sec   |
| 2              | 10.74 sec  |
| 3              | 18.33 sec  |
| 4              | 28.58 sec  |

Full Sample Output:


2024-12-16 09:48:25,697 - PydanticAI version: 0.0.12
2024-12-16 09:48:25,697 - OpenAI version: 1.57.2
2024-12-16 09:48:25,697 - 
Running the agent for 1 day(s) of forecast :

LogfireNotConfiguredWarning:

No logs or spans will be created until `logfire.configure()` has been called. Set the environment variable LOGFIRE_IGNORE_NO_CONFIG=1 or add ignore_no_config=true in pyproject.toml to suppress this warning.

2024-12-16 09:48:26,823 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-12-16 09:48:26,841 - Getting lat and lng for Buffalo and Country US
2024-12-16 09:48:28,332 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-12-16 09:48:28,337 - Getting weather for Buffalo in United States with temperature unit C
2024-12-16 09:48:28,337 - Thoughts: User is asking for current weather
2024-12-16 09:48:33,649 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-12-16 09:48:33,654 - Time taken for 1 day(s): 7.96 seconds
2024-12-16 09:48:33,654 - Response: {
  "City": "Buffalo",
  "Current Temperature": "8.1\u00b0 C",
  "Forecast Days": 1
}
2024-12-16 09:48:33,654 - 
Running the agent for 2 day(s) of forecast :
2024-12-16 09:48:34,656 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-12-16 09:48:34,659 - Getting lat and lng for Buffalo and Country US
2024-12-16 09:48:38,342 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-12-16 09:48:38,346 - Getting weather for Buffalo in United States with temperature unit C
2024-12-16 09:48:38,346 - Thoughts: Please provide the current weather information.
2024-12-16 09:48:44,385 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-12-16 09:48:44,390 - Time taken for 2 day(s): 10.74 seconds
2024-12-16 09:48:44,391 - Response: {
  "City": "Buffalo",
  "Current Temperature": "30.2\u00b0 C",
  "Forecast Days": 2
}
2024-12-16 09:48:44,391 - 
Running the agent for 3 day(s) of forecast :
2024-12-16 09:48:45,407 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-12-16 09:48:45,411 - Getting lat and lng for Buffalo and Country US
2024-12-16 09:48:47,093 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-12-16 09:48:47,103 - Getting weather for Buffalo in United States with temperature unit C
2024-12-16 09:48:47,103 - Thoughts: Please provide the latest weather data for Buffalo.
2024-12-16 09:49:02,716 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-12-16 09:49:02,722 - Time taken for 3 day(s): 18.33 seconds
2024-12-16 09:49:02,722 - Response: {
  "City": "Buffalo",
  "Current Temperature": "28.7\u00b0 C",
  "Forecast Days": 3
}
2024-12-16 09:49:02,722 - 
Running the agent for 4 day(s) of forecast :
2024-12-16 09:49:03,815 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-12-16 09:49:03,819 - Getting lat and lng for Buffalo and Country US
2024-12-16 09:49:05,687 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-12-16 09:49:05,692 - Getting weather for Buffalo in United States with temperature unit C
2024-12-16 09:49:05,692 - Thoughts: What's the current weather?
2024-12-16 09:49:31,289 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-12-16 09:49:31,300 - Time taken for 4 day(s): 28.58 seconds
2024-12-16 09:49:31,300 - Response: {
  "City": "Buffalo",
  "Current Temperature": "20.6\u00b0 C",
  "Forecast Days": 4
}

Summary of iterations:
   number_of_days  time_taken
0               1    7.956691
1               2   10.736448
2               3   18.330979
3               4   28.577719

Process finished with exit code 0

Possible Causes:

We are unsure whether the delay comes from OpenAI's structured output generation or from the tool-calling process. However, we suspect that OpenAI's handling of larger structured outputs is the bottleneck: as the output grows, particularly with nested lists like HourlyForecast within DailyForecast, the response time increases dramatically.
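
One rough way to narrow this down (a diagnostic sketch added here for illustration, not part of the original report) is to time OpenAI's structured output directly, bypassing PydanticAI and the tool calls, with a nested schema shaped like the one in the reproduction script below. The model names are illustrative; the sketch assumes the openai SDK's beta `chat.completions.parse` helper and an `OPENAI_API_KEY` in the environment. If the same superlinear growth appears here, the bottleneck is on the OpenAI side rather than in PydanticAI.

```python
# Diagnostic sketch: time structured output with a nested schema, no PydanticAI.
import time
from typing import List

from openai import OpenAI
from pydantic import BaseModel


class Hourly(BaseModel):
    time: str
    temperature: float
    description: str


class Daily(BaseModel):
    date: str
    hourly: List[Hourly]


class Forecast(BaseModel):
    city: str
    days: List[Daily]


client = OpenAI()  # reads OPENAI_API_KEY from the environment

for n in range(1, 5):
    start = time.time()
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Invent a {n}-day forecast for Buffalo with 8 hourly entries per day.",
        }],
        response_format=Forecast,
    )
    parsed = completion.choices[0].message.parsed
    print(f"{n} day(s): {time.time() - start:.2f} s, {len(parsed.days)} days returned")
```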

Expected Behavior:

The agent should not experience exponential delays when the number_of_days increases. The response time should remain consistent, even if the data complexity increases due to nested structures.

Environment Details:

  • Pydantic AI version: 0.0.12
  • OpenAI version: 1.57.2
  • Python version: 3.10.x
  • OS: macOS

Code:

from __future__ import annotations as _annotations

import asyncio
import json
import logging
import random
import time
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, List, Optional, Literal

import dotenv
import openai
import pandas as pd
import pydantic_ai
from httpx import AsyncClient
from pydantic import BaseModel
from pydantic import Field
from pydantic_ai import Agent, ModelRetry

# Set up logging configuration
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(message)s',  # This adds the timestamp to each log message
    handlers=[logging.StreamHandler()]  # This ensures logs are shown in the console
)

# Example usage
logger = logging.getLogger(__name__)

# load env
dotenv.load_dotenv()

# 'if-token-present' means nothing will be sent (and the example will work) if you don't have logfire configured
# logfire.configure()

class HourlyForecast(BaseModel):
    time: str  # Formatted time (e.g., "12:00 PM")
    temperature: float  # Temperature in the chosen unit
    description: str  # Weather description
    emoji: str  # Emoji representing the weather kind

class DailyForecast(BaseModel):
    date: str  # Formatted date (e.g., "Monday, December 11, 2024")
    average_temperature: float  # Average temperature for the day
    hourly_forecasts: List[HourlyForecast]  # List of hourly forecasts



class WeatherData(BaseModel):
    city: str = Field(..., description="City for which the weather data is retrieved")
    current_temperature: float = Field(..., description="Current temperature in the chosen unit")
    temp_unit: str = Field(..., description="Temperature unit (C or F)")
    latitude: Optional[float] = Field(None, description="Latitude of the city")
    longitude: Optional[float] = Field(None, description="Longitude of the city")
    forecasts: List[DailyForecast] = Field(..., description="List of daily forecasts")
    country: Optional[str] = Field(None, description="Country of the city")
    region: Optional[str] = Field(None, description="Region of the city")
    unknown_city: Optional[bool] = Field(None, description="True if the city is unknown")
    reason: Optional[str] = Field(None, description="Reason for the unknown city")
    country_emoji: Optional[str] = Field(None, description="Emoji representing the country's flag")

    def model_dump_json(self, indent=2):
        # Prepare a minimal summary of the data
        summary = {
            "City": self.city,
            "Current Temperature": f"{self.current_temperature}° {self.temp_unit}",
            "Forecast Days": len(self.forecasts) if self.forecasts else 0,
        }

        # Convert the summary to a JSON string with indentation
        return json.dumps(summary, indent=indent)

@dataclass
class Deps:
    client: AsyncClient
    conversation_id: Optional[uuid.UUID] = None
    number_of_days: int = 3

weather_agent = Agent(
    'openai:gpt-4o-mini',
    system_prompt=(
        "For example: if the user asks 'Is it cloudy in Sydney?', and it is or will be cloudy, respond with 'Yes, it's cloudy'; otherwise, say 'No'. "
        "IMPORTANT: If the user is asking for the weather in a specific city or country,  "
        "If the user provides just a city name without a specific question,  "
        "NEVER use your own knowledge of the world weather data as it might be outdated, and it was collected during your training. "
        "Always use the provided tools to fetch the latest weather data. DO NOT rely on your internal knowledge for weather-related responses; always use the tools provided to fetch the most up-to-date information."
    ),
    deps_type=Deps,
    retries=2,
    result_type=WeatherData,
    result_retries=2,
)


async def get_weather_call(city_name: str, country_name: str | None, temp_unit_c_or_f: str, number_of_days: int) -> WeatherData:
    """Get the weather information for a given city."""
    forecasts = []
    current_date = datetime.now()
    for _ in range(number_of_days):
        daily_forecast = DailyForecast(
            date=current_date.strftime("%A, %B %d, %Y"),
            average_temperature=round(random.uniform(-10, 35), 1),
            hourly_forecasts=[
                HourlyForecast(
                    time=(current_date + timedelta(hours=i)).strftime("%I:%M %p"),
                    temperature=round(random.uniform(-10, 35), 1),
                    description=random.choice(["Sunny", "Cloudy", "Rainy", "Snowy"]),
                    emoji=random.choice(["☀️", "☁️", "🌧️", "❄️"])
                )
                for i in range(8)
            ]
        )
        forecasts.append(daily_forecast)
        current_date += timedelta(days=1)

    weather_ret_data = WeatherData(
        city=city_name,
        current_temperature=round(random.uniform(-10, 35), 1),
        temp_unit=temp_unit_c_or_f,
        forecasts=forecasts,
        country=country_name,
        region="New York",
        country_emoji="🇺🇸"
    )
    return weather_ret_data

@weather_agent.tool_plain
async def get_lat_lng(city_name: str, iso_3166_alpha_2_country_name: str | None) -> dict[str, Any]:
    """Get the latitude and longitude of a given city."""
    logger.info(f"Getting lat and lng for {city_name} and Country {iso_3166_alpha_2_country_name}")
    try:
        # Return fake coordinates for demonstration purposes
        lat = random.uniform(-90, 90)
        lng = random.uniform(-180, 180)
        return {'lat': lat, 'lng': lng}
    except Exception as e:
        logger.error(f'Error getting lat and lng: {e}', exc_info=True)
        raise ModelRetry('Could not find the location')

# temperature_unit_celsius_or_fahrenheit = 'C' or 'F'

@weather_agent.tool
async def getweather(
    ctx: pydantic_ai.RunContext[Deps],
    city_name: str,
    country_name: str | None,
    temperature_unit_celsius_or_fahrenheit: Literal['C', 'F'],
    latitude: str,
    longitude: str,
    thoughts: str
) -> WeatherData:
    """Get the weather data for a given city."""
    user_asked_for_days = ctx.deps.number_of_days
    logger.info(f"Getting weather for {city_name} in {country_name} with temperature unit {temperature_unit_celsius_or_fahrenheit}")
    # print thoughts
    logger.info(f"Thoughts: {thoughts}")
    # Await the cached result of get_weather_call
    weather = await get_weather_call(city_name, country_name, temperature_unit_celsius_or_fahrenheit, user_asked_for_days)


    ret_data = WeatherData(
        city=city_name,
        forecasts=weather.forecasts,
        current_temperature=weather.current_temperature,
        temp_unit=temperature_unit_celsius_or_fahrenheit,
        country=weather.country,
        region=weather.region,
        latitude=float(latitude),
        longitude=float(longitude),
    )
    #    debug(ret_data)
    return ret_data


async def main():
    async with AsyncClient() as client:
        # Print the PydanticAI and OpenAI versions
        logger.info(f"PydanticAI version: {pydantic_ai.__version__}")
        logger.info(f"OpenAI version: {openai.__version__}")
        dotenv.load_dotenv()
        # List to store the data for each iteration
        results = []

        # Loop to run the agent for 4 different number_of_days values
        for number_of_days in range(1, 5):
            logger.info(f"\nRunning the agent for {number_of_days} day(s) of forcast :")
            conversation_id = uuid.uuid4()
            deps = Deps(
                client=client,
                conversation_id=conversation_id,
                number_of_days=number_of_days
            )

            start_time = time.time()

            # Run the agent to get the weather data
            result = await weather_agent.run(
                'What is the weather like in Buffalo?', deps=deps
            )

            end_time = time.time()

            # Calculate time taken and store the iteration data
            time_taken = end_time - start_time
            results.append({'number_of_days': number_of_days, 'time_taken': time_taken})

            # Print the result (Weather Data) from the agent
            logger.info(f"Time taken for {number_of_days} day(s): {time_taken:.2f} seconds")
            logger.info('Response: %s', result.data.model_dump_json(indent=2))

        # Convert the results list to a pandas DataFrame for a nice table
        df = pd.DataFrame(results)

        # Display the table at the end
        print("\nSummary of iterations:")
        print(df)



if __name__ == '__main__':
    asyncio.run(main())
@samuelcolvin
Member

Can you show some timings on what is slow? Either manually or a screenshot of a Logfire trace.

If it's just OpenAI that's slow, there's not much PydanticAI can do about that.
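
(For reference, a minimal sketch of what is needed to produce that trace, based on the LogfireNotConfiguredWarning in the log above; it assumes a Logfire project/token is already set up. The spans shown in the later comment appear once this call is made.)

```python
import logfire

# Without this call no spans are recorded, per the warning quoted in the issue.
logfire.configure()
```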

@ishswar
Author

ishswar commented Dec 18, 2024

@samuelcolvin,

As highlighted in the [OpenAI article on Structured Outputs](https://openai.com/index/introducing-structured-outputs-in-the-api/), it’s mentioned that the first API response with a new schema can take up to 10 seconds. In my use case, where the model is called twice to build a nested structure, this results in a latency of over 20 seconds per request, which is not ideal for real-time applications.

This latency seems to be a known issue, but for anyone building serious applications on top of structured LLM responses, this delay can be a significant bottleneck.

How do you envision handling this latency with valid structured data for real-time use cases? Do you expect this delay to improve over time, or should we reconsider using structured data in scenarios that require fast responses?


Additional Info:

We rely on structured data in performance-critical applications, and any guidance on improving this would be greatly appreciated.

https://openai.com/index/introducing-structured-outputs-in-the-api/

Limitations and restrictions

There are a few limitations to keep in mind when using Structured Outputs:

  • Structured Outputs allows only a subset of JSON Schema, detailed in our docs. This helps us ensure the best possible performance.

  • The first API response with a new schema will incur additional latency, but subsequent responses will be fast with no latency penalty. This is because during the first request, we process the schema as indicated above and then cache these artifacts for fast reuse later on. Typical schemas take under 10 seconds to process on the first request, but more complex schemas may take up to a minute.
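
If the one-off schema-processing cost quoted above were the dominant factor, repeat runs against the same schema should be markedly faster than the first. Below is a hedged sketch of that check (illustrative only, not from the original report), reusing the `weather_agent` and `Deps` definitions from the reproduction script earlier in this issue.

```python
# Sketch: compare the first (cold) run against later (warm) runs of the same
# agent and result schema; assumes weather_agent and Deps are already defined.
import asyncio
import time

from httpx import AsyncClient


async def warm_vs_cold() -> None:
    async with AsyncClient() as client:
        deps = Deps(client=client, number_of_days=3)
        for i in range(4):
            start = time.time()
            await weather_agent.run('What is the weather like in Buffalo?', deps=deps)
            label = "cold (first request)" if i == 0 else "warm"
            print(f"run {i + 1} ({label}): {time.time() - start:.2f} s")


asyncio.run(warm_vs_cold())
```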

@ishswar
Author

ishswar commented Dec 18, 2024

@samuelcolvin - Here is the Logfire output you wanted. I ran just the 3-day forecast (a list of 3 elements with hourly data).

2024-12-17 20:15:40,900 - PydanticAI version: 0.0.13
2024-12-17 20:15:40,900 - OpenAI version: 1.57.2
2024-12-17 20:15:40,900 - 
Running the agent for 3 day(s) of forecast :
04:15:40.901 weather_agent run prompt=What is the weather like in Buffalo?
04:15:40.901   preparing model and tools run_step=1
04:15:40.901   model request
2024-12-17 20:15:48,387 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-12-17 20:15:48,403 - Getting lat and lng for Buffalo and Country US
04:15:48.402   handle model response
04:15:48.402     running tools=['get_lat_lng']
04:15:48.403   preparing model and tools run_step=2
04:15:48.404   model request
2024-12-17 20:15:49,727 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-12-17 20:15:49,733 - Getting weather for Buffalo in United States with temperature unit C
2024-12-17 20:15:49,733 - Thoughts: Fetching current weather data for Buffalo.
04:15:49.731   handle model response
04:15:49.732     running tools=['getweather']
04:15:49.736   preparing model and tools run_step=3
04:15:49.737   model request
2024-12-17 20:16:04,649 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
04:16:04.667   handle model response
04:16:04.671   preparing model and tools run_step=4
04:16:04.672   model request
2024-12-17 20:16:22,247 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
04:16:22.252   handle model response
2024-12-17 20:16:22,261 - Time taken for 3 day(s): 41.36 seconds
2024-12-17 20:16:22,261 - Response: {
  "City": "Buffalo",
  "Current Temperature": "26.4\u00b0 C",
  "Forecast Days": 3
}

Summary of iterations:
   number_of_days  time_taken
0               3   41.360618

Also, the Logfire console output is below:

[screenshot: Logfire console trace]
