diff --git a/docs/guides/onboarding_checklist/add_metrics.md b/docs/guides/onboarding_checklist/add_metrics.md index 5c99975e2..d2fae589a 100644 --- a/docs/guides/onboarding_checklist/add_metrics.md +++ b/docs/guides/onboarding_checklist/add_metrics.md @@ -1,6 +1,16 @@ **Pydantic Logfire** can be used to collect metrics from your application and send them to a metrics backend. -Let's see how to create, and use metrics in your application. +Metrics are a great way to record numerical values where you want to see an aggregation of the data (e.g. over time), +rather than the individual values. + +## System Metrics + +The easiest way to start using metrics is to enable system metrics. +See the [System Metrics][system-metrics] documentation to learn more. + +## Manual Metrics + +Let's see how to create and use custom metrics in your application. ```py import logfire @@ -13,11 +23,6 @@ def send_message(): messages_sent.add(1) ``` -## Metric Types - -Metrics are a great way to record number values where you want to see an aggregation of the data (e.g. over time), -rather than the individual values. - ### Counter The Counter metric is particularly useful when you want to measure the frequency or occurrence of a certain @@ -250,18 +255,6 @@ logfire.metric_up_down_counter_callback( You can read more about the Up-Down Counter metric in the [OpenTelemetry documentation][up-down-counter-callback-metric]. -## System Metrics - -By default, **Logfire** does not collect system metrics. - -To enable metrics, you need just need install the `logfire[system-metrics]` extra: - -{{ install_logfire(extras=['system-metrics']) }} - -**Logfire** will automatically collect system metrics if the `logfire[system-metrics]` extra is installed. - -To know more about which system metrics are collected, check the [System Metrics][system-metrics] documentation. - [counter-metric]: https://opentelemetry.io/docs/specs/otel/metrics/api/#counter [histogram-metric]: https://opentelemetry.io/docs/specs/otel/metrics/api/#histogram [up-down-counter-metric]: https://opentelemetry.io/docs/specs/otel/metrics/api/#updowncounter diff --git a/docs/guides/web_ui/dashboards.md b/docs/guides/web_ui/dashboards.md index cb7c7f730..6b11f54f5 100644 --- a/docs/guides/web_ui/dashboards.md +++ b/docs/guides/web_ui/dashboards.md @@ -19,14 +19,20 @@ This dashboard offers a high-level view of your web services' well-being. It lik * **Percent of 5XX Requests:** Percentage of requests that resulted in server errors (status codes in the 500 range). * **Log Type Ratio**: Breakdown of the different log types generated by your web service (e.g., info, warning, error). -## System Metrics +## Basic System Metrics -This dashboard focuses on system resource utilization, potentially including: +This dashboard shows essential system resource utilization metrics. It comes in two variants: + +- **Basic System Metrics (Logfire):** Uses the data exported by [`logfire.instrument_system_metrics()`](../../integrations/system_metrics.md). +- **Basic System Metrics (OpenTelemetry):** Uses data exported by any OpenTelemetry-based instrumentation following the standard semantic conventions. + +Both variants include the following metrics: -* **CPU Usage:** Percentage of processing power utilized by the system. -* **Memory Usage:** Amount of memory currently in use by the system. * **Number of Processes:** Total number of running processes on the system. -* **Swap Usage:** Amount of swap space currently in use by the system. 
+* **System CPU usage %:** Percentage of total available processing power utilized by the whole system, i.e. the average across all CPU cores.
+* **Process CPU usage %:** CPU used by a single process, where e.g. using 2 CPU cores to full capacity would result in a value of 200%.
+* **Memory Usage %:** Percentage of memory currently in use by the system.
+* **Swap Usage %:** Percentage of swap space currently in use by the system.
 
 ## Custom Dashboards
 
diff --git a/docs/integrations/system_metrics.md b/docs/integrations/system_metrics.md
index 0cb811915..54856ce03 100644
--- a/docs/integrations/system_metrics.md
+++ b/docs/integrations/system_metrics.md
@@ -1,28 +1,84 @@
-By default, **Logfire** does not collect system metrics.
+The [`logfire.instrument_system_metrics()`][logfire.Logfire.instrument_system_metrics] method can be used to collect system metrics with **Logfire**, such as CPU and memory usage.
 
-To enable metrics, you need to install the `logfire[system-metrics]` extra:
+## Installation
+
+Install `logfire` with the `system-metrics` extra:
 
 {{ install_logfire(extras=['system-metrics']) }}
 
-### Available Metrics
-
-Logfire collects the following system metrics:
-
-* `system.cpu.time`: CPU time spent in different modes.
-* `system.cpu.utilization`: CPU utilization in different modes.
-* `system.memory.usage`: Memory usage.
-* `system.memory.utilization`: Memory utilization in different modes.
-* `system.swap.usage`: Swap usage.
-* `system.swap.utilization`: Swap utilization
-* `system.disk.io`: Disk I/O operations (read/write).
-* `system.disk.operations`: Disk operations (read/write).
-* `system.disk.time`: Disk time (read/write).
-* `system.network.dropped.packets`: Dropped packets (transmit/receive).
-* `system.network.packets`: Packets (transmit/receive).
-* `system.network.errors`: Network errors (transmit/receive).
-* `system.network.io`: Network I/O (transmit/receive).
-* `system.network.connections`: Network connections (family/type).
-* `system.thread_count`: Thread count.
-* `process.runtime.memory`: Process memory usage.
-* `process.runtime.cpu.time`: Process CPU time.
-* `process.runtime.gc_count`: Process garbage collection count.
+## Usage
+
+```py
+import logfire
+
+logfire.configure()
+
+logfire.instrument_system_metrics()
+```
+
+Then in your project, click on 'Dashboards' in the top bar, click 'New Dashboard', and select 'Basic System Metrics (Logfire)' from the dropdown.
+
+## Configuration
+
+By default, `instrument_system_metrics` collects only the metrics it needs to display the 'Basic System Metrics (Logfire)' dashboard. You can choose exactly which metrics to collect and how much data to collect about each metric. The default is equivalent to this:
+
+```py
+logfire.instrument_system_metrics({
+    'process.runtime.cpu.utilization': None,  # (1)!
+    'system.cpu.simple_utilization': None,  # (2)!
+    'system.memory.utilization': ['available'],  # (3)!
+    'system.swap.utilization': ['used'],  # (4)!
+})
+```
+
+1. `process.runtime.cpu.utilization` will lead to exporting a metric that is actually named `process.runtime.cpython.cpu.utilization` or a similar name depending on the Python implementation used. The `None` value means that there are no fields to configure for this metric. The value of this metric is [`psutil.Process().cpu_percent()`](https://psutil.readthedocs.io/en/latest/#psutil.Process.cpu_percent) divided by 100, i.e. the fraction of CPU time used by this process, where 1 means using 100% of a single CPU core.
+    The value can be greater than 1 if the process uses multiple cores.
+2. The `None` value means that there are no fields to configure for this metric. The value of this metric is [`psutil.cpu_percent()`](https://psutil.readthedocs.io/en/latest/#psutil.cpu_percent) divided by 100, i.e. the fraction of CPU time used by the whole system, where 1 means using 100% of all CPU cores.
+3. The value here is a list of 'modes' of memory. The full list can be seen in the [`psutil` documentation](https://psutil.readthedocs.io/en/latest/#psutil.virtual_memory). `available` is "the memory that can be given instantly to processes without the system going into swap. This is calculated by summing different memory metrics that vary depending on the platform. It is supposed to be used to monitor actual memory usage in a cross platform fashion." The value of the metric is a number between 0 and 1, and subtracting the value from 1 gives the fraction of memory used.
+4. This is the fraction of available swap used. The value is a number between 0 and 1.
+
+To collect lots of detailed data about all available metrics, use `logfire.instrument_system_metrics(base='full')`.
+
+!!! warning
+    Collecting the full set of metrics can be expensive, especially if you have many servers,
+    and it's easy to forget about. If you enable this, be sure to monitor your usage and costs.
+
+    The most expensive metrics are `system.cpu.utilization` and `system.cpu.time`, which collect data for each core and each mode,
+    and the `system.disk.*` metrics, which collect data for each disk device. The exact number depends on the machine hardware,
+    but this can result in hundreds of data points per minute from each instrumented host.
+
+`logfire.instrument_system_metrics(base='full')` is equivalent to:
+
+```py
+logfire.instrument_system_metrics({
+    'system.cpu.simple_utilization': None,
+    'system.cpu.time': ['idle', 'user', 'system', 'irq', 'softirq', 'nice', 'iowait', 'steal', 'interrupt', 'dpc'],
+    'system.cpu.utilization': ['idle', 'user', 'system', 'irq', 'softirq', 'nice', 'iowait', 'steal', 'interrupt', 'dpc'],
+    'system.memory.usage': ['available', 'used', 'free', 'active', 'inactive', 'buffers', 'cached', 'shared', 'wired', 'slab', 'total'],
+    'system.memory.utilization': ['available', 'used', 'free', 'active', 'inactive', 'buffers', 'cached', 'shared', 'wired', 'slab'],
+    'system.swap.usage': ['used', 'free'],
+    'system.swap.utilization': ['used'],
+    'system.disk.io': ['read', 'write'],
+    'system.disk.operations': ['read', 'write'],
+    'system.disk.time': ['read', 'write'],
+    'system.network.dropped.packets': ['transmit', 'receive'],
+    'system.network.packets': ['transmit', 'receive'],
+    'system.network.errors': ['transmit', 'receive'],
+    'system.network.io': ['transmit', 'receive'],
+    'system.thread_count': None,
+    'process.runtime.memory': ['rss', 'vms'],
+    'process.runtime.cpu.time': ['user', 'system'],
+    'process.runtime.gc_count': None,
+    'process.runtime.thread_count': None,
+    'process.runtime.cpu.utilization': None,
+    'process.runtime.context_switches': ['involuntary', 'voluntary'],
+    'process.open_file_descriptor.count': None,
+})
+```
+
+Each key here is a metric name. The values have different meanings for different metrics. For example, for `system.cpu.utilization`, the value is a list of CPU modes. So there will be a separate row for each CPU core saying what percentage of time it spent idle, another row for the time spent waiting for IO, etc. There are no fields to configure for `system.thread_count`, so the value is `None`.
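+
+For example, a minimal sketch (field names taken from the full config above) that keeps the basic defaults and also records absolute memory usage:
+
+```py
+import logfire
+
+logfire.configure()
+
+# The dict is merged with the default 'basic' base config,
+# so the basic metrics are still collected alongside `system.memory.usage`.
+logfire.instrument_system_metrics({'system.memory.usage': ['used', 'free', 'total']})
+```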
+ +For convenient customizability, the first dict argument is merged with the base. For example, if you want to collect disk read operations (but not writes) you can write: + +- `logfire.instrument_system_metrics({'system.disk.operations': ['read']})` to collect that data in addition to the basic defaults. +- `logfire.instrument_system_metrics({'system.disk.operations': ['read']}, base='full')` to collect detailed data about all metrics, excluding disk write operations. +- `logfire.instrument_system_metrics({'system.disk.operations': ['read']}, base=None)` to collect only disk read operations and nothing else. diff --git a/logfire-api/logfire_api/__init__.py b/logfire-api/logfire_api/__init__.py index 396c1ea8f..02a4c60d4 100644 --- a/logfire-api/logfire_api/__init__.py +++ b/logfire-api/logfire_api/__init__.py @@ -123,6 +123,8 @@ def instrument_openai(self, *args, **kwargs) -> ContextManager[None]: def instrument_aiohttp_client(self, *args, **kwargs) -> None: ... + def instrument_system_metrics(self, *args, **kwargs) -> None: ... + def shutdown(self, *args, **kwargs) -> None: ... DEFAULT_LOGFIRE_INSTANCE = Logfire() @@ -158,6 +160,7 @@ def shutdown(self, *args, **kwargs) -> None: ... instrument_redis = DEFAULT_LOGFIRE_INSTANCE.instrument_redis instrument_pymongo = DEFAULT_LOGFIRE_INSTANCE.instrument_pymongo instrument_mysql = DEFAULT_LOGFIRE_INSTANCE.instrument_mysql + instrument_system_metrics = DEFAULT_LOGFIRE_INSTANCE.instrument_system_metrics shutdown = DEFAULT_LOGFIRE_INSTANCE.shutdown def no_auto_trace(x): diff --git a/logfire-api/logfire_api/__init__.pyi b/logfire-api/logfire_api/__init__.pyi index fd1c038c6..c7de505b8 100644 --- a/logfire-api/logfire_api/__init__.pyi +++ b/logfire-api/logfire_api/__init__.pyi @@ -11,7 +11,7 @@ from .integrations.logging import LogfireLoggingHandler as LogfireLoggingHandler from .integrations.structlog import LogfireProcessor as StructlogProcessor from .version import VERSION as VERSION -__all__ = ['Logfire', 'LogfireSpan', 'LevelName', 'ConsoleOptions', 'PydanticPlugin', 'configure', 'span', 'instrument', 'log', 'trace', 'debug', 'notice', 'info', 'warn', 'error', 'exception', 'fatal', 'force_flush', 'log_slow_async_callbacks', 'install_auto_tracing', 'instrument_fastapi', 'instrument_openai', 'instrument_anthropic', 'instrument_asyncpg', 'instrument_httpx', 'instrument_celery', 'instrument_requests', 'instrument_psycopg', 'instrument_django', 'instrument_flask', 'instrument_starlette', 'instrument_aiohttp_client', 'instrument_sqlalchemy', 'instrument_redis', 'instrument_pymongo', 'instrument_mysql', 'AutoTraceModule', 'with_tags', 'with_settings', 'shutdown', 'load_spans_from_file', 'no_auto_trace', 'METRICS_PREFERRED_TEMPORALITY', 'ScrubMatch', 'ScrubbingOptions', 'VERSION', 'suppress_instrumentation', 'StructlogProcessor', 'LogfireLoggingHandler', 'TailSamplingOptions'] +__all__ = ['Logfire', 'LogfireSpan', 'LevelName', 'ConsoleOptions', 'PydanticPlugin', 'configure', 'span', 'instrument', 'log', 'trace', 'debug', 'notice', 'info', 'warn', 'error', 'exception', 'fatal', 'force_flush', 'log_slow_async_callbacks', 'install_auto_tracing', 'instrument_fastapi', 'instrument_openai', 'instrument_anthropic', 'instrument_asyncpg', 'instrument_httpx', 'instrument_celery', 'instrument_requests', 'instrument_psycopg', 'instrument_django', 'instrument_flask', 'instrument_starlette', 'instrument_aiohttp_client', 'instrument_sqlalchemy', 'instrument_redis', 'instrument_pymongo', 'instrument_mysql', 'instrument_system_metrics', 'AutoTraceModule', 'with_tags', 
'with_settings', 'shutdown', 'load_spans_from_file', 'no_auto_trace', 'METRICS_PREFERRED_TEMPORALITY', 'ScrubMatch', 'ScrubbingOptions', 'VERSION', 'suppress_instrumentation', 'StructlogProcessor', 'LogfireLoggingHandler', 'TailSamplingOptions'] DEFAULT_LOGFIRE_INSTANCE = Logfire() span = DEFAULT_LOGFIRE_INSTANCE.span @@ -35,6 +35,7 @@ instrument_sqlalchemy = DEFAULT_LOGFIRE_INSTANCE.instrument_sqlalchemy instrument_redis = DEFAULT_LOGFIRE_INSTANCE.instrument_redis instrument_pymongo = DEFAULT_LOGFIRE_INSTANCE.instrument_pymongo instrument_mysql = DEFAULT_LOGFIRE_INSTANCE.instrument_mysql +instrument_system_metrics = DEFAULT_LOGFIRE_INSTANCE.instrument_system_metrics shutdown = DEFAULT_LOGFIRE_INSTANCE.shutdown with_tags = DEFAULT_LOGFIRE_INSTANCE.with_tags with_settings = DEFAULT_LOGFIRE_INSTANCE.with_settings diff --git a/logfire-api/logfire_api/_internal/config.pyi b/logfire-api/logfire_api/_internal/config.pyi index cd63c8979..06ef35a8f 100644 --- a/logfire-api/logfire_api/_internal/config.pyi +++ b/logfire-api/logfire_api/_internal/config.pyi @@ -13,7 +13,7 @@ from .exporters.remove_pending import RemovePendingSpansExporter as RemovePendin from .exporters.tail_sampling import TailSamplingOptions as TailSamplingOptions, TailSamplingProcessor as TailSamplingProcessor from .exporters.test import TestExporter as TestExporter from .integrations.executors import instrument_executors as instrument_executors -from .metrics import ProxyMeterProvider as ProxyMeterProvider, configure_metrics as configure_metrics +from .metrics import ProxyMeterProvider as ProxyMeterProvider from .scrubbing import BaseScrubber as BaseScrubber, NOOP_SCRUBBER as NOOP_SCRUBBER, ScrubCallback as ScrubCallback, Scrubber as Scrubber, ScrubbingOptions as ScrubbingOptions from .stack_info import warn_at_user_stacklevel as warn_at_user_stacklevel from .tracer import PendingSpanProcessor as PendingSpanProcessor, ProxyTracerProvider as ProxyTracerProvider @@ -53,7 +53,7 @@ class PydanticPlugin: include: set[str] = ... exclude: set[str] = ... 
-def configure(*, send_to_logfire: bool | Literal['if-token-present'] | None = None, token: str | None = None, project_name: str | None = None, service_name: str | None = None, service_version: str | None = None, trace_sample_rate: float | None = None, console: ConsoleOptions | Literal[False] | None = None, show_summary: bool | None = None, config_dir: Path | str | None = None, data_dir: Path | str | None = None, base_url: str | None = None, collect_system_metrics: bool | None = None, id_generator: IdGenerator | None = None, ns_timestamp_generator: Callable[[], int] | None = None, processors: None = None, additional_span_processors: Sequence[SpanProcessor] | None = None, default_span_processor: Callable[[SpanExporter], SpanProcessor] | None = None, metric_readers: None = None, additional_metric_readers: Sequence[MetricReader] | None = None, pydantic_plugin: PydanticPlugin | None = None, fast_shutdown: bool = False, scrubbing_patterns: Sequence[str] | None = None, scrubbing_callback: ScrubCallback | None = None, scrubbing: ScrubbingOptions | Literal[False] | None = None, inspect_arguments: bool | None = None, tail_sampling: TailSamplingOptions | None = None) -> None: +def configure(*, send_to_logfire: bool | Literal['if-token-present'] | None = None, token: str | None = None, project_name: str | None = None, service_name: str | None = None, service_version: str | None = None, trace_sample_rate: float | None = None, console: ConsoleOptions | Literal[False] | None = None, show_summary: bool | None = None, config_dir: Path | str | None = None, data_dir: Path | str | None = None, base_url: str | None = None, collect_system_metrics: None = None, id_generator: IdGenerator | None = None, ns_timestamp_generator: Callable[[], int] | None = None, processors: None = None, additional_span_processors: Sequence[SpanProcessor] | None = None, default_span_processor: Callable[[SpanExporter], SpanProcessor] | None = None, metric_readers: None = None, additional_metric_readers: Sequence[MetricReader] | None = None, pydantic_plugin: PydanticPlugin | None = None, fast_shutdown: bool = False, scrubbing_patterns: Sequence[str] | None = None, scrubbing_callback: ScrubCallback | None = None, scrubbing: ScrubbingOptions | Literal[False] | None = None, inspect_arguments: bool | None = None, tail_sampling: TailSamplingOptions | None = None) -> None: """Configure the logfire SDK. Args: @@ -80,8 +80,7 @@ def configure(*, send_to_logfire: bool | Literal['if-token-present'] | None = No `LOGFIRE_CONFIG_DIR` environment variable, otherwise defaults to the current working directory. data_dir: Directory to store credentials, and logs. If `None` uses the `LOGFIRE_CREDENTIALS_DIR` environment variable, otherwise defaults to `'.logfire'`. base_url: Root URL for the Logfire API. If `None` uses the `LOGFIRE_BASE_URL` environment variable, otherwise defaults to https://logfire-api.pydantic.dev. - collect_system_metrics: Whether to collect system metrics like CPU and memory usage. If `None` uses the `LOGFIRE_COLLECT_SYSTEM_METRICS` environment variable, - otherwise defaults to `True`. + collect_system_metrics: Legacy argument, use `logfire.instrument_system_metrics()` instead. id_generator: Generator for span IDs. Defaults to `RandomIdGenerator()` from the OpenTelemetry SDK. ns_timestamp_generator: Generator for nanosecond timestamps. Defaults to [`time.time_ns`][time.time_ns] from the Python standard library. 
@@ -126,7 +125,6 @@ class _LogfireConfigData: console: ConsoleOptions | Literal[False] | None show_summary: bool data_dir: Path - collect_system_metrics: bool id_generator: IdGenerator ns_timestamp_generator: Callable[[], int] additional_span_processors: Sequence[SpanProcessor] | None @@ -138,14 +136,14 @@ class _LogfireConfigData: tail_sampling: TailSamplingOptions | None class LogfireConfig(_LogfireConfigData): - def __init__(self, base_url: str | None = None, send_to_logfire: bool | None = None, token: str | None = None, project_name: str | None = None, service_name: str | None = None, service_version: str | None = None, trace_sample_rate: float | None = None, console: ConsoleOptions | Literal[False] | None = None, show_summary: bool | None = None, config_dir: Path | None = None, data_dir: Path | None = None, collect_system_metrics: bool | None = None, id_generator: IdGenerator | None = None, ns_timestamp_generator: Callable[[], int] | None = None, additional_span_processors: Sequence[SpanProcessor] | None = None, default_span_processor: Callable[[SpanExporter], SpanProcessor] | None = None, additional_metric_readers: Sequence[MetricReader] | None = None, pydantic_plugin: PydanticPlugin | None = None, fast_shutdown: bool = False, scrubbing: ScrubbingOptions | Literal[False] | None = None, inspect_arguments: bool | None = None, tail_sampling: TailSamplingOptions | None = None) -> None: + def __init__(self, base_url: str | None = None, send_to_logfire: bool | None = None, token: str | None = None, project_name: str | None = None, service_name: str | None = None, service_version: str | None = None, trace_sample_rate: float | None = None, console: ConsoleOptions | Literal[False] | None = None, show_summary: bool | None = None, config_dir: Path | None = None, data_dir: Path | None = None, id_generator: IdGenerator | None = None, ns_timestamp_generator: Callable[[], int] | None = None, additional_span_processors: Sequence[SpanProcessor] | None = None, default_span_processor: Callable[[SpanExporter], SpanProcessor] | None = None, additional_metric_readers: Sequence[MetricReader] | None = None, pydantic_plugin: PydanticPlugin | None = None, fast_shutdown: bool = False, scrubbing: ScrubbingOptions | Literal[False] | None = None, inspect_arguments: bool | None = None, tail_sampling: TailSamplingOptions | None = None) -> None: """Create a new LogfireConfig. Users should never need to call this directly, instead use `logfire.configure`. See `_LogfireConfigData` for parameter documentation. """ - def configure(self, base_url: str | None, send_to_logfire: bool | Literal['if-token-present'] | None, token: str | None, project_name: str | None, service_name: str | None, service_version: str | None, trace_sample_rate: float | None, console: ConsoleOptions | Literal[False] | None, show_summary: bool | None, config_dir: Path | None, data_dir: Path | None, collect_system_metrics: bool | None, id_generator: IdGenerator | None, ns_timestamp_generator: Callable[[], int] | None, additional_span_processors: Sequence[SpanProcessor] | None, default_span_processor: Callable[[SpanExporter], SpanProcessor] | None, additional_metric_readers: Sequence[MetricReader] | None, pydantic_plugin: PydanticPlugin | None, fast_shutdown: bool, scrubbing: ScrubbingOptions | Literal[False] | None, inspect_arguments: bool | None, tail_sampling: TailSamplingOptions | None) -> None: ... 
+ def configure(self, base_url: str | None, send_to_logfire: bool | Literal['if-token-present'] | None, token: str | None, project_name: str | None, service_name: str | None, service_version: str | None, trace_sample_rate: float | None, console: ConsoleOptions | Literal[False] | None, show_summary: bool | None, config_dir: Path | None, data_dir: Path | None, id_generator: IdGenerator | None, ns_timestamp_generator: Callable[[], int] | None, additional_span_processors: Sequence[SpanProcessor] | None, default_span_processor: Callable[[SpanExporter], SpanProcessor] | None, additional_metric_readers: Sequence[MetricReader] | None, pydantic_plugin: PydanticPlugin | None, fast_shutdown: bool, scrubbing: ScrubbingOptions | Literal[False] | None, inspect_arguments: bool | None, tail_sampling: TailSamplingOptions | None) -> None: ... def initialize(self) -> ProxyTracerProvider: """Configure internals to start exporting traces and metrics.""" def force_flush(self, timeout_millis: int = 30000) -> bool: diff --git a/logfire-api/logfire_api/_internal/config_params.pyi b/logfire-api/logfire_api/_internal/config_params.pyi index adb32310a..8780ce420 100644 --- a/logfire-api/logfire_api/_internal/config_params.pyi +++ b/logfire-api/logfire_api/_internal/config_params.pyi @@ -9,7 +9,6 @@ from logfire.exceptions import LogfireConfigError as LogfireConfigError from pathlib import Path from typing import Any, Callable, TypeVar -COLLECT_SYSTEM_METRICS_DEFAULT: bool T = TypeVar('T') slots_true: Incomplete PydanticPluginRecordValues: Incomplete @@ -38,7 +37,6 @@ SERVICE_NAME: Incomplete SERVICE_VERSION: Incomplete SHOW_SUMMARY: Incomplete CREDENTIALS_DIR: Incomplete -COLLECT_SYSTEM_METRICS: Incomplete CONSOLE: Incomplete CONSOLE_COLORS: Incomplete CONSOLE_SPAN_STYLE: Incomplete diff --git a/logfire-api/logfire_api/_internal/integrations/system_metrics.pyi b/logfire-api/logfire_api/_internal/integrations/system_metrics.pyi new file mode 100644 index 000000000..36c651114 --- /dev/null +++ b/logfire-api/logfire_api/_internal/integrations/system_metrics.pyi @@ -0,0 +1,16 @@ +from _typeshed import Incomplete +from logfire import Logfire as Logfire +from typing import Iterable, Literal +from typing_extensions import LiteralString + +MetricName: type[Literal['system.cpu.simple_utilization', 'system.cpu.time', 'system.cpu.utilization', 'system.memory.usage', 'system.memory.utilization', 'system.swap.usage', 'system.swap.utilization', 'system.disk.io', 'system.disk.operations', 'system.disk.time', 'system.network.dropped.packets', 'system.network.packets', 'system.network.errors', 'system.network.io', 'system.network.connections', 'system.thread_count', 'process.runtime.memory', 'process.runtime.cpu.time', 'process.runtime.gc_count', 'process.runtime.thread_count', 'process.runtime.cpu.utilization', 'process.runtime.context_switches', 'process.open_file_descriptor.count']] +Config = dict[MetricName, Iterable[str] | None] +CPU_FIELDS: list[LiteralString] +MEMORY_FIELDS: list[LiteralString] +FULL_CONFIG: Config +BASIC_CONFIG: Config +Base: Incomplete + +def get_base_config(base: Base) -> Config: ... +def instrument_system_metrics(logfire_instance: Logfire, config: Config | None = None, base: Base = 'basic'): ... +def measure_simple_cpu_utilization(logfire_instance: Logfire): ... 
diff --git a/logfire-api/logfire_api/_internal/main.pyi b/logfire-api/logfire_api/_internal/main.pyi index a902a257b..c9b6efd81 100644 --- a/logfire-api/logfire_api/_internal/main.pyi +++ b/logfire-api/logfire_api/_internal/main.pyi @@ -18,6 +18,7 @@ from .integrations.pymongo import PymongoInstrumentKwargs as PymongoInstrumentKw from .integrations.redis import RedisInstrumentKwargs as RedisInstrumentKwargs from .integrations.sqlalchemy import SQLAlchemyInstrumentKwargs as SQLAlchemyInstrumentKwargs from .integrations.starlette import StarletteInstrumentKwargs as StarletteInstrumentKwargs +from .integrations.system_metrics import Base as SystemMetricsBase, Config as SystemMetricsConfig from .json_encoder import logfire_json_dumps as logfire_json_dumps from .json_schema import JsonSchemaProperties as JsonSchemaProperties, attributes_json_schema as attributes_json_schema, attributes_json_schema_properties as attributes_json_schema_properties, create_json_schema as create_json_schema from .metrics import ProxyMeterProvider as ProxyMeterProvider @@ -643,6 +644,17 @@ class Logfire: If a connection is provided, returns the instrumented connection. If no connection is provided, returns None. """ + def instrument_system_metrics(self, config: SystemMetricsConfig | None = None, base: SystemMetricsBase = 'basic') -> None: + """Collect system metrics. + + See [the guide](https://docs.pydantic.dev/logfire/integrations/system_metrics/) for more information. + + Args: + config: A dictionary where the keys are metric names + and the values are optional further configuration for that metric. + base: A string indicating the base config dictionary which `config` will be merged with, + or `None` for an empty base config. + """ def metric_counter(self, name: str, *, unit: str = '', description: str = '') -> Counter: """Create a counter metric. diff --git a/logfire-api/logfire_api/_internal/metrics.pyi b/logfire-api/logfire_api/_internal/metrics.pyi index eb09308f2..a5f56d253 100644 --- a/logfire-api/logfire_api/_internal/metrics.pyi +++ b/logfire-api/logfire_api/_internal/metrics.pyi @@ -8,12 +8,6 @@ from typing import Any, Generic, Sequence, TypeVar from weakref import WeakSet Gauge: Incomplete -CPU_FIELDS: Incomplete -MEMORY_FIELDS: Incomplete -DEFAULT_CONFIG: Incomplete -INSTRUMENTOR: Incomplete - -def configure_metrics(meter_provider: MeterProvider) -> None: ... 
@dataclasses.dataclass class ProxyMeterProvider(MeterProvider): diff --git a/logfire/__init__.py b/logfire/__init__.py index 47c18b0f3..7e469b417 100644 --- a/logfire/__init__.py +++ b/logfire/__init__.py @@ -39,6 +39,7 @@ instrument_redis = DEFAULT_LOGFIRE_INSTANCE.instrument_redis instrument_pymongo = DEFAULT_LOGFIRE_INSTANCE.instrument_pymongo instrument_mysql = DEFAULT_LOGFIRE_INSTANCE.instrument_mysql +instrument_system_metrics = DEFAULT_LOGFIRE_INSTANCE.instrument_system_metrics shutdown = DEFAULT_LOGFIRE_INSTANCE.shutdown with_tags = DEFAULT_LOGFIRE_INSTANCE.with_tags # with_trace_sample_rate = DEFAULT_LOGFIRE_INSTANCE.with_trace_sample_rate @@ -115,6 +116,7 @@ def loguru_handler() -> dict[str, Any]: 'instrument_redis', 'instrument_pymongo', 'instrument_mysql', + 'instrument_system_metrics', 'AutoTraceModule', 'with_tags', 'with_settings', diff --git a/logfire/_internal/cli.py b/logfire/_internal/cli.py index 88064d372..dbff87715 100644 --- a/logfire/_internal/cli.py +++ b/logfire/_internal/cli.py @@ -124,7 +124,7 @@ def parse_backfill(args: argparse.Namespace) -> None: # pragma: no cover sys.exit(1) logfire_url = cast(str, args.logfire_url) - logfire.configure(data_dir=data_dir, base_url=logfire_url, collect_system_metrics=False) + logfire.configure(data_dir=data_dir, base_url=logfire_url) config = logfire_config.GLOBAL_CONFIG config.initialize() token = config.token diff --git a/logfire/_internal/config.py b/logfire/_internal/config.py index e85e407da..8fe2db310 100644 --- a/logfire/_internal/config.py +++ b/logfire/_internal/config.py @@ -74,7 +74,7 @@ from .exporters.tail_sampling import TailSamplingOptions, TailSamplingProcessor from .exporters.test import TestExporter from .integrations.executors import instrument_executors -from .metrics import ProxyMeterProvider, configure_metrics +from .metrics import ProxyMeterProvider from .scrubbing import NOOP_SCRUBBER, BaseScrubber, Scrubber, ScrubbingOptions, ScrubCallback from .stack_info import warn_at_user_stacklevel from .tracer import PendingSpanProcessor, ProxyTracerProvider @@ -148,7 +148,7 @@ def configure( config_dir: Path | str | None = None, data_dir: Path | str | None = None, base_url: str | None = None, - collect_system_metrics: bool | None = None, + collect_system_metrics: None = None, id_generator: IdGenerator | None = None, ns_timestamp_generator: Callable[[], int] | None = None, processors: None = None, @@ -190,8 +190,7 @@ def configure( `LOGFIRE_CONFIG_DIR` environment variable, otherwise defaults to the current working directory. data_dir: Directory to store credentials, and logs. If `None` uses the `LOGFIRE_CREDENTIALS_DIR` environment variable, otherwise defaults to `'.logfire'`. base_url: Root URL for the Logfire API. If `None` uses the `LOGFIRE_BASE_URL` environment variable, otherwise defaults to https://logfire-api.pydantic.dev. - collect_system_metrics: Whether to collect system metrics like CPU and memory usage. If `None` uses the `LOGFIRE_COLLECT_SYSTEM_METRICS` environment variable, - otherwise defaults to `True`. + collect_system_metrics: Legacy argument, use [`logfire.instrument_system_metrics()`](https://docs.pydantic.dev/logfire/integrations/system_metrics/) instead. id_generator: Generator for span IDs. Defaults to `RandomIdGenerator()` from the OpenTelemetry SDK. ns_timestamp_generator: Generator for nanosecond timestamps. Defaults to [`time.time_ns`][time.time_ns] from the Python standard library. 
@@ -226,6 +225,19 @@ def configure( 'The `metric_readers` argument has been replaced by `additional_metric_readers`. ' 'Set `send_to_logfire=False` to disable the default metric reader.' ) + + if collect_system_metrics is False: + raise ValueError( + 'The `collect_system_metrics` argument has been removed. ' + 'System metrics are no longer collected by default.' + ) + + if collect_system_metrics is not None: + raise ValueError( + 'The `collect_system_metrics` argument has been removed. ' + 'Use `logfire.instrument_system_metrics()` instead.' + ) + if scrubbing_callback or scrubbing_patterns: if scrubbing is not None: raise ValueError( @@ -251,7 +263,6 @@ def configure( show_summary=show_summary, config_dir=Path(config_dir) if config_dir else None, data_dir=Path(data_dir) if data_dir else None, - collect_system_metrics=collect_system_metrics, id_generator=id_generator, ns_timestamp_generator=ns_timestamp_generator, additional_span_processors=additional_span_processors, @@ -311,9 +322,6 @@ class _LogfireConfigData: data_dir: Path """The directory to store Logfire config in""" - collect_system_metrics: bool - """Whether to collect system metrics like CPU and memory usage""" - id_generator: IdGenerator """The ID generator to use""" @@ -357,7 +365,6 @@ def _load_configuration( show_summary: bool | None, config_dir: Path | None, data_dir: Path | None, - collect_system_metrics: bool | None, id_generator: IdGenerator | None, ns_timestamp_generator: Callable[[], int] | None, additional_span_processors: Sequence[SpanProcessor] | None, @@ -381,7 +388,6 @@ def _load_configuration( self.trace_sample_rate = param_manager.load_param('trace_sample_rate', trace_sample_rate) self.show_summary = param_manager.load_param('show_summary', show_summary) self.data_dir = param_manager.load_param('data_dir', data_dir) - self.collect_system_metrics = param_manager.load_param('collect_system_metrics', collect_system_metrics) self.inspect_arguments = param_manager.load_param('inspect_arguments', inspect_arguments) self.ignore_no_config = param_manager.load_param('ignore_no_config') if self.inspect_arguments and sys.version_info[:2] <= (3, 8): @@ -461,7 +467,6 @@ def __init__( show_summary: bool | None = None, config_dir: Path | None = None, data_dir: Path | None = None, - collect_system_metrics: bool | None = None, id_generator: IdGenerator | None = None, ns_timestamp_generator: Callable[[], int] | None = None, additional_span_processors: Sequence[SpanProcessor] | None = None, @@ -493,7 +498,6 @@ def __init__( show_summary=show_summary, config_dir=config_dir, data_dir=data_dir, - collect_system_metrics=collect_system_metrics, id_generator=id_generator, ns_timestamp_generator=ns_timestamp_generator, additional_span_processors=additional_span_processors, @@ -529,7 +533,6 @@ def configure( show_summary: bool | None, config_dir: Path | None, data_dir: Path | None, - collect_system_metrics: bool | None, id_generator: IdGenerator | None, ns_timestamp_generator: Callable[[], int] | None, additional_span_processors: Sequence[SpanProcessor] | None, @@ -555,7 +558,6 @@ def configure( show_summary, config_dir, data_dir, - collect_system_metrics, id_generator, ns_timestamp_generator, additional_span_processors, @@ -751,8 +753,6 @@ def check_token(): ) ], ) - if self.collect_system_metrics: - configure_metrics(meter_provider) # we need to shut down any existing providers to avoid leaking resources (like threads) # but if this takes longer than 100ms you should call `logfire.shutdown` before reconfiguring diff --git 
a/logfire/_internal/config_params.py b/logfire/_internal/config_params.py index 15daa7f95..3b0ad1436 100644 --- a/logfire/_internal/config_params.py +++ b/logfire/_internal/config_params.py @@ -17,14 +17,6 @@ from .exporters.console import ConsoleColorsValues from .utils import read_toml_file -try: - import opentelemetry.instrumentation.system_metrics # noqa: F401 # type: ignore - - COLLECT_SYSTEM_METRICS_DEFAULT = True -except ImportError: # pragma: no cover - COLLECT_SYSTEM_METRICS_DEFAULT = False # type: ignore - - T = TypeVar('T') slots_true = {'slots': True} if sys.version_info >= (3, 10) else {} @@ -77,8 +69,6 @@ class _DefaultCallback: """Whether to show the summary when a new project is created.""" CREDENTIALS_DIR = ConfigParam(env_vars=['LOGFIRE_CREDENTIALS_DIR'], allow_file_config=True, default='.logfire', tp=Path) """The directory where to store the configuration file.""" -COLLECT_SYSTEM_METRICS = ConfigParam(env_vars=['LOGFIRE_COLLECT_SYSTEM_METRICS'], allow_file_config=True, default=COLLECT_SYSTEM_METRICS_DEFAULT, tp=bool) -"""Whether to collect system metrics.""" CONSOLE = ConfigParam(env_vars=['LOGFIRE_CONSOLE'], allow_file_config=True, default=True, tp=bool) """Whether to enable/disable the console exporter.""" CONSOLE_COLORS = ConfigParam(env_vars=['LOGFIRE_CONSOLE_COLORS'], allow_file_config=True, default='auto', tp=ConsoleColorsValues) @@ -120,7 +110,6 @@ class _DefaultCallback: 'trace_sample_rate': TRACE_SAMPLE_RATE, 'show_summary': SHOW_SUMMARY, 'data_dir': CREDENTIALS_DIR, - 'collect_system_metrics': COLLECT_SYSTEM_METRICS, 'console': CONSOLE, 'console_colors': CONSOLE_COLORS, 'console_span_style': CONSOLE_SPAN_STYLE, diff --git a/logfire/_internal/integrations/system_metrics.py b/logfire/_internal/integrations/system_metrics.py new file mode 100644 index 000000000..b2ad889e0 --- /dev/null +++ b/logfire/_internal/integrations/system_metrics.py @@ -0,0 +1,179 @@ +from __future__ import annotations + +import sys +from platform import python_implementation +from typing import TYPE_CHECKING, Dict, Iterable, Literal, Optional, cast + +from opentelemetry.metrics import CallbackOptions, Observation + +if TYPE_CHECKING: + from typing_extensions import LiteralString + + from logfire import Logfire + +try: + import psutil + from opentelemetry.instrumentation.system_metrics import ( + _DEFAULT_CONFIG, # type: ignore + SystemMetricsInstrumentor, + ) +except ModuleNotFoundError as e: # pragma: no cover + raise RuntimeError( + '`logfire.instrument_system_metrics()` requires the `opentelemetry-instrumentation-system-metrics` package.\n' + 'You can install this with:\n' + " pip install 'logfire[system-metrics]'" + ) from e + +# stubgen seems to need this redundant type declaration. 
+MetricName: type[ + Literal[ + 'system.cpu.simple_utilization', + 'system.cpu.time', + 'system.cpu.utilization', + 'system.memory.usage', + 'system.memory.utilization', + 'system.swap.usage', + 'system.swap.utilization', + 'system.disk.io', + 'system.disk.operations', + 'system.disk.time', + 'system.network.dropped.packets', + 'system.network.packets', + 'system.network.errors', + 'system.network.io', + 'system.network.connections', + 'system.thread_count', + 'process.runtime.memory', + 'process.runtime.cpu.time', + 'process.runtime.gc_count', + 'process.runtime.thread_count', + 'process.runtime.cpu.utilization', + 'process.runtime.context_switches', + 'process.open_file_descriptor.count', + ] +] = Literal[ # type: ignore # but pyright doesn't like it + 'system.cpu.simple_utilization', + 'system.cpu.time', + 'system.cpu.utilization', + 'system.memory.usage', + 'system.memory.utilization', + 'system.swap.usage', + 'system.swap.utilization', + 'system.disk.io', + 'system.disk.operations', + 'system.disk.time', + 'system.network.dropped.packets', + 'system.network.packets', + 'system.network.errors', + 'system.network.io', + 'system.network.connections', + 'system.thread_count', + 'process.runtime.memory', + 'process.runtime.cpu.time', + 'process.runtime.gc_count', + 'process.runtime.thread_count', + 'process.runtime.cpu.utilization', + 'process.runtime.context_switches', + 'process.open_file_descriptor.count', +] + +Config = Dict[MetricName, Optional[Iterable[str]]] + +# All the cpu_times fields provided by psutil (used by system_metrics) across all platforms, +# except for 'guest' and 'guest_nice' which are included in 'user' and 'nice' in Linux (see psutil._cpu_tot_time). +# Docs: https://psutil.readthedocs.io/en/latest/#psutil.cpu_times +CPU_FIELDS: list[LiteralString] = 'idle user system irq softirq nice iowait steal interrupt dpc'.split() + +# All the virtual_memory fields provided by psutil across all platforms, +# except for 'percent' which can be calculated as `(total - available) / total * 100`. +# Docs: https://psutil.readthedocs.io/en/latest/#psutil.virtual_memory +MEMORY_FIELDS: list[LiteralString] = 'available used free active inactive buffers cached shared wired slab'.split() + +FULL_CONFIG: Config = { + **cast(Config, _DEFAULT_CONFIG), + 'system.cpu.simple_utilization': None, + 'system.cpu.time': CPU_FIELDS, + 'system.cpu.utilization': CPU_FIELDS, + # For usage, knowing the total amount of bytes available might be handy. + 'system.memory.usage': MEMORY_FIELDS + ['total'], + # For utilization, the total is always just 1 (100%), so it's not included. + 'system.memory.utilization': MEMORY_FIELDS, + # The 'free' utilization is not included because it's just 1 - 'used'. + 'system.swap.utilization': ['used'], +} + +if sys.platform == 'darwin': # pragma: no cover + # see https://github.com/giampaolo/psutil/issues/1219 + # upstream pr: https://github.com/open-telemetry/opentelemetry-python-contrib/pull/2008 + FULL_CONFIG.pop('system.network.connections', None) + +BASIC_CONFIG: Config = { + 'process.runtime.cpu.utilization': None, + 'system.cpu.simple_utilization': None, + # The actually used memory ratio can be calculated as `1 - available`. 
+ 'system.memory.utilization': ['available'], + 'system.swap.utilization': ['used'], +} + +Base = Literal['basic', 'full', None] + + +def get_base_config(base: Base) -> Config: + if base == 'basic': + return BASIC_CONFIG + elif base == 'full': + return FULL_CONFIG + elif base is None: + return {} + else: + raise ValueError(f'Invalid base: {base}') + + +def instrument_system_metrics(logfire_instance: Logfire, config: Config | None = None, base: Base = 'basic'): + config = {**get_base_config(base), **(config or {})} + + if 'system.cpu.simple_utilization' in config: + measure_simple_cpu_utilization(logfire_instance) + + if 'process.runtime.cpu.utilization' in config: + # Override OTEL here, see comment in measure_process_runtime_cpu_utilization..callback. + measure_process_runtime_cpu_utilization(logfire_instance) + del config['process.runtime.cpu.utilization'] + + instrumentor = SystemMetricsInstrumentor(config=config) # type: ignore + instrumentor.instrument() # type: ignore + + +def measure_simple_cpu_utilization(logfire_instance: Logfire): + def callback(_options: CallbackOptions) -> Iterable[Observation]: + # psutil returns a value from 0-100, OTEL values here are generally 0-1, so we divide by 100. + yield Observation(psutil.cpu_percent() / 100) + + logfire_instance.metric_gauge_callback( + 'system.cpu.simple_utilization', + [callback], + description='Average CPU usage across all cores, as a fraction between 0 and 1.', + unit='1', + ) + + +def measure_process_runtime_cpu_utilization(logfire_instance: Logfire): + process = psutil.Process() + # This first call always returns 0, do it here so that the first real measurement from an exporter + # will return a nonzero value. + process.cpu_percent() + + def callback(_options: CallbackOptions) -> Iterable[Observation]: + # psutil returns a value from 0-100, OTEL values here are generally 0-1, so we divide by 100. + # OTEL got this wrong: https://github.com/open-telemetry/opentelemetry-python-contrib/issues/2810 + # A fix has been merged there, but we need to know in the dashboard how to interpret the values. + # So the dashboard will assume a 0-100 range if the scope is 'opentelemetry.instrumentation.system_metrics', + # and a 0-1 range otherwise. In particular the scope will be 'logfire' if it comes from here. + yield Observation(process.cpu_percent() / 100) + + logfire_instance.metric_gauge_callback( + f'process.runtime.{python_implementation().lower()}.cpu.utilization', + [callback], + description='Runtime CPU utilization', + unit='1', + ) diff --git a/logfire/_internal/main.py b/logfire/_internal/main.py index 9aa5615de..830a6d170 100644 --- a/logfire/_internal/main.py +++ b/logfire/_internal/main.py @@ -74,6 +74,7 @@ from .integrations.redis import RedisInstrumentKwargs from .integrations.sqlalchemy import SQLAlchemyInstrumentKwargs from .integrations.starlette import StarletteInstrumentKwargs + from .integrations.system_metrics import Base as SystemMetricsBase, Config as SystemMetricsConfig from .utils import SysExcInfo # This is the type of the exc_info/_exc_info parameter of the log methods. @@ -1254,6 +1255,24 @@ def instrument_mysql( self._warn_if_not_initialized_for_instrumentation() return instrument_mysql(conn, **kwargs) + def instrument_system_metrics( + self, config: SystemMetricsConfig | None = None, base: SystemMetricsBase = 'basic' + ) -> None: + """Collect system metrics. + + See [the guide](https://docs.pydantic.dev/logfire/integrations/system_metrics/) for more information. 
+ + Args: + config: A dictionary where the keys are metric names + and the values are optional further configuration for that metric. + base: A string indicating the base config dictionary which `config` will be merged with, + or `None` for an empty base config. + """ + from .integrations.system_metrics import instrument_system_metrics + + self._warn_if_not_initialized_for_instrumentation() + return instrument_system_metrics(self, config, base) + def metric_counter(self, name: str, *, unit: str = '', description: str = '') -> Counter: """Create a counter metric. diff --git a/logfire/_internal/metrics.py b/logfire/_internal/metrics.py index 09ff6b23d..27a873cb7 100644 --- a/logfire/_internal/metrics.py +++ b/logfire/_internal/metrics.py @@ -1,7 +1,6 @@ from __future__ import annotations import dataclasses -import sys from abc import ABC, abstractmethod from threading import Lock from typing import Any, Generic, Sequence, TypeVar @@ -30,62 +29,6 @@ except ImportError: # pragma: no cover Gauge = None -# All the cpu_times fields provided by psutil (used by system_metrics) across all platforms, -# except for 'guest' and 'guest_nice' which are included in 'user' and 'nice' in Linux (see psutil._cpu_tot_time). -# Docs: https://psutil.readthedocs.io/en/latest/#psutil.cpu_times -CPU_FIELDS = 'idle user system irq softirq nice iowait steal interrupt dpc'.split() - -# All the virtual_memory fields provided by psutil across all platforms, -# except for 'percent' which can be calculated as `(total - available) / total * 100`. -# Docs: https://psutil.readthedocs.io/en/latest/#psutil.virtual_memory -MEMORY_FIELDS = 'total available used free active inactive buffers cached shared wired slab'.split() - -# Based on opentelemetry/instrumentation/system_metrics/__init__.py -DEFAULT_CONFIG = { - 'system.cpu.time': CPU_FIELDS, - 'system.cpu.utilization': CPU_FIELDS, - 'system.memory.usage': MEMORY_FIELDS, - 'system.memory.utilization': MEMORY_FIELDS, - 'system.swap.usage': ['used', 'free'], - 'system.swap.utilization': ['used', 'free'], - 'system.disk.io': ['read', 'write'], - 'system.disk.operations': ['read', 'write'], - 'system.disk.time': ['read', 'write'], - 'system.network.dropped.packets': ['transmit', 'receive'], - 'system.network.packets': ['transmit', 'receive'], - 'system.network.errors': ['transmit', 'receive'], - 'system.network.io': ['transmit', 'receive'], - 'system.network.connections': ['family', 'type'], - 'system.thread_count': None, - 'process.runtime.memory': ['rss', 'vms'], - 'process.runtime.cpu.time': ['user', 'system'], - 'process.runtime.gc_count': None, -} - - -try: - from opentelemetry.instrumentation.system_metrics import SystemMetricsInstrumentor - - INSTRUMENTOR = SystemMetricsInstrumentor(config=DEFAULT_CONFIG) # type: ignore -except ImportError: # pragma: no cover - INSTRUMENTOR = None # type: ignore - -if sys.platform == 'darwin': # pragma: no cover - # see https://github.com/giampaolo/psutil/issues/1219 - # upstream pr: https://github.com/open-telemetry/opentelemetry-python-contrib/pull/2008 - DEFAULT_CONFIG.pop('system.network.connections') - - -def configure_metrics(meter_provider: MeterProvider) -> None: - if INSTRUMENTOR is None: # pragma: no cover - raise RuntimeError('Install logfire[system-metrics] to use `collect_system_metrics=True`.') - - # we need to call uninstrument() otherwise instrument() will do nothing - # even if the meter provider is different - if INSTRUMENTOR.is_instrumented_by_opentelemetry: - INSTRUMENTOR.uninstrument() # type: ignore - 
INSTRUMENTOR.instrument(meter_provider=meter_provider) # type: ignore - # The following proxy classes are adapted from OTEL's SDK @dataclasses.dataclass diff --git a/tests/conftest.py b/tests/conftest.py index c65bd242e..1b5420cbb 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -62,7 +62,6 @@ def config_kwargs( id_generator=id_generator, ns_timestamp_generator=time_generator, additional_span_processors=[SimpleSpanProcessor(exporter)], - collect_system_metrics=False, # Ensure that inspect_arguments doesn't break things in most versions # (it's off by default for <3.11) but it's completely forbidden for 3.8. inspect_arguments=sys.version_info[:2] >= (3, 9), diff --git a/tests/otel_integrations/test_system_metrics.py b/tests/otel_integrations/test_system_metrics.py new file mode 100644 index 000000000..a3626015d --- /dev/null +++ b/tests/otel_integrations/test_system_metrics.py @@ -0,0 +1,152 @@ +from __future__ import annotations + +import pytest +from inline_snapshot import snapshot +from opentelemetry.instrumentation.system_metrics import SystemMetricsInstrumentor +from opentelemetry.sdk.metrics.export import InMemoryMetricReader + +import logfire +import logfire._internal.metrics +from logfire._internal.integrations.system_metrics import get_base_config +from tests.test_metrics import get_collected_metrics + + +def get_collected_metric_names(metrics_reader: InMemoryMetricReader) -> list[str]: + try: + return sorted( + { + metric['name'] + for metric in get_collected_metrics(metrics_reader) + if metric['name'] != 'system.network.connections' + } + ) + finally: + SystemMetricsInstrumentor().uninstrument() # type: ignore + + +def test_default_system_metrics_collection(metrics_reader: InMemoryMetricReader) -> None: + logfire.instrument_system_metrics() + assert get_collected_metric_names(metrics_reader) == snapshot( + [ + 'process.runtime.cpython.cpu.utilization', + 'system.cpu.simple_utilization', + 'system.memory.utilization', + 'system.swap.utilization', + ] + ) + + +def test_all_system_metrics_collection(metrics_reader: InMemoryMetricReader) -> None: + logfire.instrument_system_metrics(base='full') + assert get_collected_metric_names(metrics_reader) == snapshot( + [ + 'process.open_file_descriptor.count', + 'process.runtime.cpython.context_switches', + 'process.runtime.cpython.cpu.utilization', + 'process.runtime.cpython.cpu_time', + 'process.runtime.cpython.gc_count', + 'process.runtime.cpython.memory', + 'process.runtime.cpython.thread_count', + 'system.cpu.simple_utilization', + 'system.cpu.time', + 'system.cpu.utilization', + 'system.disk.io', + 'system.disk.operations', + 'system.disk.time', + 'system.memory.usage', + 'system.memory.utilization', + 'system.network.dropped_packets', + 'system.network.errors', + 'system.network.io', + 'system.network.packets', + 'system.swap.usage', + 'system.swap.utilization', + 'system.thread_count', + ] + ) + + +def test_custom_system_metrics_collection(metrics_reader: InMemoryMetricReader) -> None: + logfire.instrument_system_metrics({'system.memory.utilization': ['available']}, base=None) + assert get_collected_metric_names(metrics_reader) == ['system.memory.utilization'] + + +def test_basic_base(): + assert get_base_config('basic') == { + 'process.runtime.cpu.utilization': None, + 'system.cpu.simple_utilization': None, + 'system.memory.utilization': ['available'], + 'system.swap.utilization': ['used'], + }, 'Docs need to be updated if this test fails' + + +def test_full_base(): + config = get_base_config('full') + 
config.pop('system.network.connections', None) + assert config == { + 'system.cpu.simple_utilization': None, + 'system.cpu.time': ['idle', 'user', 'system', 'irq', 'softirq', 'nice', 'iowait', 'steal', 'interrupt', 'dpc'], + 'system.cpu.utilization': [ + 'idle', + 'user', + 'system', + 'irq', + 'softirq', + 'nice', + 'iowait', + 'steal', + 'interrupt', + 'dpc', + ], + 'system.memory.usage': [ + 'available', + 'used', + 'free', + 'active', + 'inactive', + 'buffers', + 'cached', + 'shared', + 'wired', + 'slab', + 'total', + ], + 'system.memory.utilization': [ + 'available', + 'used', + 'free', + 'active', + 'inactive', + 'buffers', + 'cached', + 'shared', + 'wired', + 'slab', + ], + 'system.swap.usage': ['used', 'free'], + 'system.swap.utilization': ['used'], + 'system.disk.io': ['read', 'write'], + 'system.disk.operations': ['read', 'write'], + 'system.disk.time': ['read', 'write'], + 'system.network.dropped.packets': ['transmit', 'receive'], + 'system.network.packets': ['transmit', 'receive'], + 'system.network.errors': ['transmit', 'receive'], + 'system.network.io': ['transmit', 'receive'], + 'system.thread_count': None, + 'process.runtime.memory': ['rss', 'vms'], + 'process.runtime.cpu.time': ['user', 'system'], + 'process.runtime.gc_count': None, + 'process.runtime.thread_count': None, + 'process.runtime.cpu.utilization': None, + 'process.runtime.context_switches': ['involuntary', 'voluntary'], + 'process.open_file_descriptor.count': None, + }, 'Docs and the MetricName type need to be updated if this test fails' + + +def test_empty_base(): + assert get_base_config(None) == {} + + +def test_invalid_base(): + with pytest.raises(ValueError): + get_base_config('invalid') # type: ignore diff --git a/tests/test_configure.py b/tests/test_configure.py index 23fb6985b..d05a0be88 100644 --- a/tests/test_configure.py +++ b/tests/test_configure.py @@ -11,6 +11,7 @@ from unittest import mock from unittest.mock import call, patch +import inline_snapshot.extra import pytest import requests_mock from inline_snapshot import snapshot @@ -461,7 +462,6 @@ def test_read_config_from_pyproject_toml(tmp_path: Path) -> None: console_colors = "never" console_include_timestamp = false data_dir = "{tmp_path}" - collect_system_metrics = false pydantic_plugin_record = "metrics" pydantic_plugin_include = " test1, test2" pydantic_plugin_exclude = "test3 ,test4" @@ -480,7 +480,6 @@ def test_read_config_from_pyproject_toml(tmp_path: Path) -> None: assert GLOBAL_CONFIG.console.colors == 'never' assert GLOBAL_CONFIG.console.include_timestamps is False assert GLOBAL_CONFIG.data_dir == tmp_path - assert GLOBAL_CONFIG.collect_system_metrics is False assert GLOBAL_CONFIG.pydantic_plugin.record == 'metrics' assert GLOBAL_CONFIG.pydantic_plugin.include == {'test1', 'test2'} assert GLOBAL_CONFIG.pydantic_plugin.exclude == {'test3', 'test4'} @@ -553,7 +552,6 @@ def default_span_processor(exporter: SpanExporter) -> SimpleSpanProcessor: token='abc1', default_span_processor=default_span_processor, additional_metric_readers=[InMemoryMetricReader()], - collect_system_metrics=False, ) wait_for_check_token_thread() @@ -583,7 +581,6 @@ def test_configure_service_version(tmp_path: str) -> None: token='abc2', service_version='1.2.3', additional_metric_readers=[InMemoryMetricReader()], - collect_system_metrics=False, ) assert GLOBAL_CONFIG.service_version == '1.2.3' @@ -591,7 +588,6 @@ def test_configure_service_version(tmp_path: str) -> None: configure( token='abc3', additional_metric_readers=[InMemoryMetricReader()], - 
collect_system_metrics=False, ) assert GLOBAL_CONFIG.service_version == git_sha @@ -603,7 +599,6 @@ def test_configure_service_version(tmp_path: str) -> None: configure( token='abc4', additional_metric_readers=[InMemoryMetricReader()], - collect_system_metrics=False, ) assert GLOBAL_CONFIG.service_version is None finally: @@ -866,7 +861,7 @@ def test_initialize_project_use_existing_project_no_projects(tmp_dir_cwd: Path, } request_mocker.post('https://logfire-api.pydantic.dev/v1/projects/fake_org', [create_project_response]) - logfire.configure(send_to_logfire=True, collect_system_metrics=False) + logfire.configure(send_to_logfire=True) assert confirm_mock.mock_calls == [ call('The project will be created in the organization "fake_org". Continue?', default=True), @@ -901,7 +896,7 @@ def test_initialize_project_use_existing_project(tmp_dir_cwd: Path, tmp_path: Pa [create_project_response], ) - logfire.configure(send_to_logfire=True, collect_system_metrics=False) + logfire.configure(send_to_logfire=True) assert confirm_mock.mock_calls == [ call('Do you want to use one of your existing projects? ', default=True), @@ -960,7 +955,6 @@ def test_initialize_project_not_using_existing_project( logfire.configure( send_to_logfire=True, - collect_system_metrics=False, ) assert confirm_mock.mock_calls == [ @@ -1001,7 +995,7 @@ def test_initialize_project_not_confirming_organization(tmp_path: Path) -> None: ) with pytest.raises(SystemExit): - logfire.configure(data_dir=tmp_path, send_to_logfire=True, collect_system_metrics=False) + logfire.configure(data_dir=tmp_path, send_to_logfire=True) assert confirm_mock.mock_calls == [ call('Do you want to use one of your existing projects? ', default=True), @@ -1078,7 +1072,7 @@ def test_initialize_project_create_project(tmp_dir_cwd: Path, tmp_path: Path, ca ], ) - logfire.configure(send_to_logfire=True, collect_system_metrics=False) + logfire.configure(send_to_logfire=True) for request in request_mocker.request_history: assert request.headers['Authorization'] == 'fake_user_token' @@ -1161,7 +1155,7 @@ def test_initialize_project_create_project_default_organization(tmp_dir_cwd: Pat [create_project_response], ) - logfire.configure(send_to_logfire=True, collect_system_metrics=False) + logfire.configure(send_to_logfire=True) assert prompt_mock.mock_calls == [ call( @@ -1193,7 +1187,7 @@ def test_send_to_logfire_true(tmp_path: Path) -> None: ) ) with pytest.raises(RuntimeError, match='^expected$'): - configure(send_to_logfire=True, console=False, data_dir=data_dir, collect_system_metrics=False) + configure(send_to_logfire=True, console=False, data_dir=data_dir) def test_send_to_logfire_false() -> None: @@ -1340,7 +1334,7 @@ def test_configure_fstring_python_38(): def test_default_exporters(monkeypatch: pytest.MonkeyPatch): monkeypatch.setattr(LogfireConfig, '_initialize_credentials_from_token', lambda *args: None) # type: ignore - logfire.configure(send_to_logfire=True, token='foo', collect_system_metrics=False) + logfire.configure(send_to_logfire=True, token='foo') [console_processor, send_to_logfire_processor, pending_span_processor] = get_span_processors() @@ -1382,7 +1376,7 @@ def test_custom_exporters(): def test_otel_exporter_otlp_endpoint_env_var(): # Setting this env var creates an OTLPSpanExporter and an OTLPMetricExporter with patch.dict(os.environ, {'OTEL_EXPORTER_OTLP_ENDPOINT': 'otel_endpoint'}): - logfire.configure(send_to_logfire=False, console=False, collect_system_metrics=False) + logfire.configure(send_to_logfire=False, console=False) 
[otel_processor] = get_span_processors() assert isinstance(otel_processor, MainSpanProcessorWrapper) @@ -1399,7 +1393,7 @@ def test_otel_exporter_otlp_endpoint_env_var(): def test_otel_traces_exporter_env_var(): # Setting OTEL_TRACES_EXPORTER to something other than otlp prevents creating an OTLPSpanExporter with patch.dict(os.environ, {'OTEL_EXPORTER_OTLP_ENDPOINT': 'otel_endpoint2', 'OTEL_TRACES_EXPORTER': 'grpc'}): - logfire.configure(send_to_logfire=False, console=False, collect_system_metrics=False) + logfire.configure(send_to_logfire=False, console=False) assert len(list(get_span_processors())) == 0 @@ -1440,7 +1434,7 @@ def test_otel_exporter_otlp_traces_endpoint_env_var(): def test_otel_exporter_otlp_metrics_endpoint_env_var(): # Setting just OTEL_EXPORTER_OTLP_METRICS_ENDPOINT only creates an OTLPMetricExporter with patch.dict(os.environ, {'OTEL_EXPORTER_OTLP_METRICS_ENDPOINT': 'otel_metrics_endpoint'}): - logfire.configure(send_to_logfire=False, console=False, collect_system_metrics=False) + logfire.configure(send_to_logfire=False, console=False) assert len(list(get_span_processors())) == 0 @@ -1456,3 +1450,23 @@ def get_span_processors() -> Iterable[SpanProcessor]: def get_metric_readers() -> Iterable[SpanProcessor]: return get_meter_provider().provider._sdk_config.metric_readers # type: ignore + + +def test_collect_system_metrics_false(): + with inline_snapshot.extra.raises( + snapshot( + 'ValueError: The `collect_system_metrics` argument has been removed. ' + 'System metrics are no longer collected by default.' + ) + ): + logfire.configure(collect_system_metrics=False) # type: ignore + + +def test_collect_system_metrics_true(): + with inline_snapshot.extra.raises( + snapshot( + 'ValueError: The `collect_system_metrics` argument has been removed. ' + 'Use `logfire.instrument_system_metrics()` instead.' + ) + ): + logfire.configure(collect_system_metrics=True) # type: ignore diff --git a/tests/test_metrics.py b/tests/test_metrics.py index e62050c6c..4fd8565e0 100644 --- a/tests/test_metrics.py +++ b/tests/test_metrics.py @@ -16,28 +16,6 @@ from logfire._internal.exporters.quiet_metrics import QuietMetricExporter -def test_system_metrics_collection() -> None: - metrics_reader = InMemoryMetricReader() - logfire.configure( - send_to_logfire=False, - additional_metric_readers=[metrics_reader], - # i.e. use the default value, in contrast to `False` which the automatic test fixture uses. - collect_system_metrics=None, - ) - metrics_collected = {metric['name'] for metric in get_collected_metrics(metrics_reader)} - - # collected metrics vary by platform, etc. 
- # assert that we at least collected _some_ of the metrics we expect - assert metrics_collected.issuperset( - { - 'system.swap.usage', - 'system.disk.operations', - 'system.memory.usage', - 'system.cpu.utilization', - } - ), metrics_collected - - def test_create_metric_counter(metrics_reader: InMemoryMetricReader) -> None: counter = logfire.metric_counter('counter') counter.add(1) @@ -327,8 +305,7 @@ def observable_counter(options: CallbackOptions): def get_collected_metrics(metrics_reader: InMemoryMetricReader) -> list[dict[str, Any]]: exported_metrics = json.loads(cast(MetricsData, metrics_reader.get_metrics_data()).to_json()) # type: ignore [resource_metric] = exported_metrics['resource_metrics'] - [scope_metric] = resource_metric['scope_metrics'] - return scope_metric['metrics'] + return [metric for scope_metric in resource_metric['scope_metrics'] for metric in scope_metric['metrics']] def test_quiet_metric_exporter(caplog: pytest.LogCaptureFixture) -> None: diff --git a/tests/test_secret_scrubbing.py b/tests/test_secret_scrubbing.py index 16b73b7d7..923fc8db5 100644 --- a/tests/test_secret_scrubbing.py +++ b/tests/test_secret_scrubbing.py @@ -231,7 +231,6 @@ def callback(match: logfire.ScrubMatch): id_generator=id_generator, ns_timestamp_generator=time_generator, additional_span_processors=[SimpleSpanProcessor(exporter)], - collect_system_metrics=False, ) # Note the values (or lack thereof) of each of these attributes in the exported span. @@ -279,7 +278,6 @@ def test_dont_scrub_resource( id_generator=id_generator, ns_timestamp_generator=time_generator, additional_span_processors=[SimpleSpanProcessor(exporter)], - collect_system_metrics=False, ) logfire.info('hi') assert dict(exporter.exported_spans[0].resource.attributes) == IsPartialDict(