@skamenan7 skamenan7 commented Sep 19, 2025

What does this PR do?

Adds workflow metrics tracking to the agent system to monitor performance and usage patterns. The implementation
tracks step execution, workflow completion times, and tool usage with proper telemetry integration.

The metrics provide visibility into agent behavior and can be queried using the telemetry system. Tool names are
normalized for consistency (knowledge_search becomes rag).
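For reviewers, the normalization described above can be pictured as a simple alias lookup. This is a minimal sketch, not the code in this PR: the names `TOOL_NAME_ALIASES` and `normalize_tool_name` are hypothetical, and only the `knowledge_search` to `rag` mapping comes from the description.

```python
# Hypothetical alias table. Only the knowledge_search -> rag pair is taken
# from the PR description; the table name and shape are assumptions.
TOOL_NAME_ALIASES = {
    "knowledge_search": "rag",
}


def normalize_tool_name(tool_name: str) -> str:
    """Return the metric label for a tool, falling back to the raw name."""
    return TOOL_NAME_ALIASES.get(tool_name, tool_name)
```

Tools without an alias keep their original name in this sketch, so a lookup for `web_search` would return `web_search` unchanged.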

Closes #2602

Test Plan

Integration Test:

LLAMA_STACK_CONFIG="http://localhost:8321" python -m pytest tests/integration/agents/test_agent_metrics_integration.py::TestAgentMetricsIntegration::test_agent_metrics_end_to_end -v

validating:

  • Agent workflows generate the expected metrics (steps, tools, duration)
  • Tool calls are tracked with normalized names
  • Metrics can be queried via telemetry.query_metrics()
  • Both web_search and knowledge_search tools appear in results

Verifies metrics are properly collected and queryable, building on query functionality from #3074.

Please note: most of this code was already reviewed in #2993, but we wanted to test the metrics using query_metrics from #3074, and I was focused on higher-priority items. I opened this PR to make review easier, since the earlier one had many commits and was hard to follow.

Add comprehensive OpenTelemetry-based metrics for agent observability:

- Workflow completion/failure tracking with duration measurements
- Step execution counters for performance monitoring
- Tool usage tracking with normalized tool names
- Non-blocking telemetry emission with named async tasks
- Comprehensive unit and integration test coverage
- Graceful handling when telemetry is disabled
- Simplified test to use telemetry.query_metrics for verification
- Test now validates actual queryable metrics data
- Verified by the query metrics functionality added in llamastack#3074
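The "non-blocking telemetry emission with named async tasks" and "graceful handling when telemetry is disabled" items above could look roughly like the following. This is a sketch under assumptions, not the implementation in this PR: `MetricEmitter`, its `sink` callback, and the `metric-` task-name prefix are all hypothetical, and a real sink would forward to the telemetry API rather than take a plain name and value.

```python
import asyncio
from typing import Any, Callable, Coroutine, Optional

# Hypothetical sink signature: an async callable taking (metric_name, value).
Sink = Callable[[str, float], Coroutine[Any, Any, None]]


class MetricEmitter:
    """Schedules metric writes as named background tasks (illustrative only)."""

    def __init__(self, sink: Optional[Sink]) -> None:
        self._sink = sink  # None stands in for "telemetry is disabled"
        self._tasks = set()  # strong references so pending tasks are not GC'd

    def emit(self, name: str, value: float) -> Optional[asyncio.Task]:
        """Schedule the write without awaiting it; no-op when disabled."""
        if self._sink is None:
            return None
        task = asyncio.create_task(self._sink(name, value), name=f"metric-{name}")
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return task

    async def flush(self) -> None:
        """Wait for pending writes; sink errors are collected, not raised."""
        if self._tasks:
            await asyncio.gather(*list(self._tasks), return_exceptions=True)
```

Naming each task makes stuck or failing emissions easier to spot in `asyncio.all_tasks()` output, and keeping a reference set avoids tasks being garbage collected before they finish. `emit` must be called from within a running event loop.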
@meta-cla meta-cla bot added the CLA Signed label Sep 19, 2025
@skamenan7 skamenan7 marked this pull request as ready for review September 19, 2025 15:11
@skamenan7

cc: @cdoern, please review, since you developed query_metrics and we are using it to test these metrics. Thanks!

@skamenan7 skamenan7 changed the title Feature/2602 agent workflow metrics 2 feat: 2602 agent workflow metrics 2 Sep 22, 2025
Successfully merging this pull request may close these issues.

Add additional Agent workflow metrics