renfinlayson/agent-framework

Ren Finlayson 539852f81c

test

2026-01-24 03:05:12 +11:00

status: proposed
contact: rogerbarreto
date: 2025-07-14
deciders: stephentoub, markwallace-microsoft, rogerbarreto, westey-m
informed:

Agent OpenTelemetry Instrumentation

Context and Problem Statement

The Agent Framework currently lacks comprehensive observability and telemetry capabilities. This makes it difficult for developers to monitor agent performance, track usage patterns, debug issues, and understand agent behavior in production environments. The underlying ChatClient implementations may emit their own telemetry, but there is no standardized way to capture agent-specific metrics and traces that expose agent operations, token usage, response times, and error patterns at the agent abstraction level.

Decision Drivers

Compliance: The implementation should adhere to established OpenTelemetry semantic conventions for agents, ensuring consistency and interoperability with existing telemetry systems.
Observability Requirements: Developers need comprehensive telemetry to monitor agent performance, track usage patterns, and debug issues in production environments.
Standardization: The solution must integrate seamlessly with existing .NET telemetry infrastructure.
Microsoft.Extensions.AI Alignment: The implementation should follow the exact patterns and conventions established by Microsoft.Extensions.AI's OpenTelemetry instrumentation.
Non-Intrusive Design: Telemetry should be optional and not impact the core agent functionality or performance when disabled.
Agent-Level Insights: The telemetry should capture agent-specific operations without duplicating underlying ChatClient telemetry.
Extensibility: The solution should support future enhancements and additional telemetry scenarios.

Considered Options

Option 1: Direct Integration into Core Agent Classes

Embed OpenTelemetry instrumentation directly into the base Agent class and ChatClientAgent implementations.

Pros

Automatic telemetry for all agent implementations
No additional wrapper classes needed
Consistent telemetry across all agents

Cons

Violates single responsibility principle
Increases complexity of core agent classes
Makes telemetry mandatory rather than optional
Harder to test and maintain
Couples telemetry concerns with business logic

Option 2: Aspect-Oriented Programming (AOP) Approach

Use interceptors or AOP frameworks to inject telemetry behavior into agent methods.

Pros

Clean separation of concerns
Non-intrusive to existing code
Can be applied selectively

Cons

Adds complexity with AOP framework dependencies
Runtime overhead for interception
Harder to debug and understand
Not consistent with Microsoft.Extensions.AI patterns
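
To make the interception trade-off concrete, here is a minimal sketch of Option 2 using .NET's built-in System.Reflection.DispatchProxy. The IAgentSketch interface, UpperCaseAgent, and TimingProxy names are hypothetical and are not part of the Agent Framework; the ADR does not name a specific AOP framework, so DispatchProxy is only one possible stand-in.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Reflection;

// Hypothetical agent contract; DispatchProxy can only proxy interfaces.
public interface IAgentSketch
{
    string Run(string input);
}

// Trivial implementation used only to exercise the proxy.
public sealed class UpperCaseAgent : IAgentSketch
{
    public string Run(string input) => input.ToUpperInvariant();
}

// Interceptor: every interface call is routed through Invoke, which times it.
public class TimingProxy<T> : DispatchProxy where T : class
{
    private T? _target;
    private Action<string, TimeSpan>? _onCall;

    // Creates a runtime-generated proxy that implements T and forwards to target.
    public static T Wrap(T target, Action<string, TimeSpan> onCall)
    {
        T proxy = Create<T, TimingProxy<T>>();
        var interceptor = (TimingProxy<T>)(object)proxy;
        interceptor._target = target;
        interceptor._onCall = onCall;
        return proxy;
    }

    protected override object? Invoke(MethodInfo? targetMethod, object?[]? args)
    {
        if (targetMethod is null) throw new ArgumentNullException(nameof(targetMethod));
        var stopwatch = Stopwatch.StartNew();
        try
        {
            // Reflection-based dispatch: this is the per-call overhead noted in the cons.
            // Exceptions surface wrapped in TargetInvocationException unless unwrapped.
            return targetMethod.Invoke(_target, args);
        }
        finally
        {
            // Report the method name and elapsed time, whether the call succeeded or threw.
            _onCall?.Invoke(targetMethod.Name, stopwatch.Elapsed);
        }
    }
}
```

Calling TimingProxy<IAgentSketch>.Wrap(new UpperCaseAgent(), callback) returns an IAgentSketch whose calls are timed transparently. The indirection through reflection and a generated proxy type is what makes this approach harder to step through in a debugger than an explicit wrapper.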

Option 3: OpenTelemetryAgent Wrapper Pattern

Create a delegating OpenTelemetryAgent wrapper class that exposes the same Agent abstraction and wraps any existing agent with telemetry instrumentation, following the exact pattern of Microsoft.Extensions.AI's OpenTelemetryChatClient.

Pros

Follows established Microsoft.Extensions.AI patterns exactly
Clean separation of concerns
Optional and non-intrusive
Easy to test and maintain
Consistent with .NET telemetry conventions
Supports any agent implementation
Provides agent-level telemetry without duplicating ChatClient telemetry

Cons

Requires explicit wrapping of agents
Additional object allocation for wrapper

Decision Outcome

Chosen option: "OpenTelemetryAgent Wrapper Pattern", because it follows the established Microsoft.Extensions.AI patterns exactly, provides clean separation of concerns, maintains optional telemetry, and offers the best balance of functionality, maintainability, and consistency with existing .NET telemetry infrastructure.

Implementation Details

The implementation includes:

OpenTelemetryAgent Wrapper Class: A delegating agent that wraps any Agent implementation with telemetry instrumentation
AgentOpenTelemetryConsts: Comprehensive constants for telemetry attribute names and metric definitions
Extension Methods: .WithOpenTelemetry() extension method for easy agent wrapping
Comprehensive Test Suite: Full test coverage following Microsoft.Extensions.AI testing patterns
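
The wrapper and extension method listed above could look roughly like the following sketch. It is not the framework's implementation: the Agent base class here is a hypothetical two-member stand-in, the source name "Sketch.Agents" is invented, and the WithOpenTelemetry signature is assumed. Only System.Diagnostics types from the .NET base class library are used.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical stand-in for the framework's base agent type; the real API is richer.
public abstract class Agent
{
    public abstract string Name { get; }
    public abstract Task<string> RunAsync(string input);
}

// Delegating wrapper: forwards every call to the inner agent and records a span around it.
public sealed class OpenTelemetryAgent : Agent
{
    // Assumed name; the ADR only states that a DefaultSourceName constant exists.
    public const string SourceName = "Sketch.Agents";
    private static readonly ActivitySource Source = new(SourceName);
    private readonly Agent _inner;

    public OpenTelemetryAgent(Agent inner) =>
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));

    public override string Name => _inner.Name;

    public override async Task<string> RunAsync(string input)
    {
        // StartActivity returns null when no listener is registered,
        // so an unobserved agent pays almost nothing for the wrapper.
        using Activity? activity = Source.StartActivity("agent.run");
        activity?.SetTag("agent.request.name", Name);
        try
        {
            return await _inner.RunAsync(input).ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            // Mark the span as failed, then let the exception propagate unchanged.
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            throw;
        }
    }
}

public static class AgentTelemetryExtensions
{
    // Mirrors the .WithOpenTelemetry() call from the ADR; the signature is assumed.
    public static OpenTelemetryAgent WithOpenTelemetry(this Agent agent) => new(agent);
}

// Trivial agent used only to exercise the wrapper.
public sealed class EchoAgent : Agent
{
    public override string Name => "echo";
    public override Task<string> RunAsync(string input) => Task.FromResult("echo: " + input);
}
```

With these types, new EchoAgent().WithOpenTelemetry() behaves exactly like the inner agent from the caller's point of view. Telemetry only materializes once something subscribes to the activity source, which is what keeps the feature optional.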

Telemetry Data Captured

Activities/Spans:

agent.operation.name (agent.run, agent.run_streaming)
agent.request.id, agent.request.name, agent.request.instructions
agent.request.message_count, agent.request.thread_id
agent.response.id, agent.response.message_count, agent.response.finish_reason
agent.usage.input_tokens, agent.usage.output_tokens
Error information and activity status codes
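
As an illustration of how a subset of these attributes might be attached to a span, the sketch below uses System.Diagnostics.Activity directly. The attribute names are taken from the list above; the AgentSpanSketch class, its source name, and the error.type tag (borrowed from the general OpenTelemetry conventions) are assumptions, and the in-memory ActivityListener merely plays the role that a TracerProvider would play in production.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class AgentSpanSketch
{
    public const string SourceName = "Sketch.Agents.Spans"; // assumed name
    private static readonly ActivitySource Source = new(SourceName);

    // Starts and completes one span carrying some of the attributes listed above.
    public static void RecordRun(
        string agentName, int requestMessages, int inputTokens, int outputTokens,
        Exception? error = null)
    {
        using Activity? activity = Source.StartActivity("agent.run");
        if (activity is null) return; // no listener registered: nothing to record

        activity.SetTag("agent.operation.name", "agent.run");
        activity.SetTag("agent.request.name", agentName);
        activity.SetTag("agent.request.message_count", requestMessages);
        activity.SetTag("agent.usage.input_tokens", inputTokens);
        activity.SetTag("agent.usage.output_tokens", outputTokens);

        if (error is not null)
        {
            // Error information plus an activity status code, as in the last list item.
            activity.SetTag("error.type", error.GetType().FullName);
            activity.SetStatus(ActivityStatusCode.Error, error.Message);
        }
    }

    // Runs the given work with an in-memory listener and returns the completed spans.
    public static List<Activity> Capture(Action work)
    {
        var captured = new List<Activity>();
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == SourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
                ActivitySamplingResult.AllData,
            ActivityStopped = captured.Add,
        };
        ActivitySource.AddActivityListener(listener);
        work();
        return captured;
    }
}
```

Calling AgentSpanSketch.Capture(() => AgentSpanSketch.RecordRun("echo", 2, 12, 34)) yields a single Activity whose tags can be inspected with GetTagItem. The same capture approach is one way a unit test could assert on emitted telemetry without an exporter.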

Metrics:

Operation duration histogram with explicit bucket boundaries
Token usage histogram (input/output tokens)
Request count counter
All metrics tagged with operation type and agent name
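
The three instruments above can be sketched with System.Diagnostics.Metrics as follows. The instrument names, units, meter name, and the agent.token.type tag are illustrative assumptions, since the ADR does not spell them out. Histogram bucket boundaries are not set here and are left to the metrics pipeline configuration.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

public sealed class AgentMetricsSketch : IDisposable
{
    public const string MeterName = "Sketch.Agents.Metrics"; // assumed name

    private readonly Meter _meter = new(MeterName);
    private readonly Histogram<double> _duration;
    private readonly Histogram<long> _tokens;
    private readonly Counter<long> _requests;

    public AgentMetricsSketch()
    {
        // Instrument names and units are illustrative, not the framework's constants.
        _duration = _meter.CreateHistogram<double>(
            "agent.operation.duration", "s", "Duration of agent operations");
        _tokens = _meter.CreateHistogram<long>(
            "agent.token.usage", "{token}", "Tokens used per operation");
        _requests = _meter.CreateCounter<long>(
            "agent.request.count", "{request}", "Number of agent requests");
    }

    // Records one completed operation: one request, one duration, two token measurements.
    public void Record(
        string operation, string agentName, TimeSpan elapsed,
        long inputTokens, long outputTokens)
    {
        TagList common = Tags(operation, agentName);
        _requests.Add(1, common);
        _duration.Record(elapsed.TotalSeconds, common);

        TagList inputTags = Tags(operation, agentName);
        inputTags.Add("agent.token.type", "input");
        _tokens.Record(inputTokens, inputTags);

        TagList outputTags = Tags(operation, agentName);
        outputTags.Add("agent.token.type", "output");
        _tokens.Record(outputTokens, outputTags);
    }

    // Every measurement carries the operation type and the agent name.
    private static TagList Tags(string operation, string agentName) => new()
    {
        { "agent.operation.name", operation },
        { "agent.request.name", agentName },
    };

    public void Dispose() => _meter.Dispose();
}
```

A MeterListener (or, in production, an OpenTelemetry MeterProvider subscribed to the meter name) receives these measurements. When nothing is listening, the Add and Record calls are effectively no-ops.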

Consequences

Good: Provides comprehensive agent-level observability following established patterns
Good: Non-intrusive and optional implementation that doesn't affect core functionality
Good: Consistent with Microsoft.Extensions.AI telemetry conventions
Good: Easy to integrate with existing OpenTelemetry infrastructure
Good: Supports debugging, monitoring, and performance analysis
Neutral: Requires explicit wrapping of agents with .WithOpenTelemetry()
Neutral: Additional object allocation for telemetry wrapper

Validation

The implementation is validated through:

Comprehensive Unit Tests: 16 test methods covering all scenarios including success, error, streaming, and edge cases
Integration Testing: Step05 telemetry sample demonstrating real-world usage
Pattern Compliance: Exact adherence to Microsoft.Extensions.AI OpenTelemetry patterns
Semantic Convention Compliance: Follows OpenTelemetry semantic conventions for telemetry data

More Information

Usage Example

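// Requires the OpenTelemetry and OpenTelemetry.Exporter.Console packages
using OpenTelemetry;
using OpenTelemetry.Trace;

// Create a TracerProvider that subscribes to the agent activity source
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource(AgentOpenTelemetryConsts.DefaultSourceName)
    .AddConsoleExporter()
    .Build();

// Create and wrap agent with telemetry
// (chatClient, options, and messages are assumed to be defined elsewhere)
var baseAgent = new ChatClientAgent(chatClient, options);
using var telemetryAgent = baseAgent.WithOpenTelemetry();

// Use agent normally - telemetry is captured automatically
var response = await telemetryAgent.RunAsync(messages);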

Relationship to Microsoft.Extensions.AI

This implementation follows the exact patterns established by Microsoft.Extensions.AI's OpenTelemetry instrumentation, ensuring consistency across the AI ecosystem and leveraging proven patterns for telemetry integration.
