
Conversation

@hannesrudolph (Collaborator) commented Sep 2, 2025

Description

This PR fixes an issue where caching information was not being reported when using the OpenAI-Native provider with the Responses API.

Problem

The OpenAI-Native provider was not properly extracting and reporting cache token usage from the Responses API, which meant users couldn't see when their requests were using cached content or how many tokens were being cached.

Solution

Added a normalizeUsage method that properly extracts cache-related token information from the Responses API response, with support for both detailed token shapes and legacy field names for compatibility.

Key Changes:

  • Added normalizeUsage method to extract and normalize usage data from various response formats
  • Support for detailed token shapes (input_tokens_details, output_tokens_details) when available
  • Proper fallback handling for legacy field names to ensure compatibility across different API versions and transport methods
  • Calculate cache read/write tokens with appropriate fallbacks
  • Include reasoning tokens when available in the output details
  • Ensure accurate cost calculation using uncached input tokens to avoid double-counting (a minimal sketch of the method follows this list)
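
A minimal sketch of what normalizeUsage does, assuming a loosely typed usage object from the Responses API. The NormalizedUsage shape and any field name outside the documented detailed shapes (e.g. cache_read_input_tokens, cache_creation_input_tokens) are illustrative assumptions, not the provider's actual code:

```typescript
// Illustrative sketch only. Field names outside the documented detailed shapes
// (e.g. cache_read_input_tokens, cache_creation_input_tokens) are assumptions
// used to show the fallback idea, not the provider's actual code.
interface NormalizedUsage {
	inputTokens: number
	outputTokens: number
	cacheReadTokens: number
	cacheWriteTokens: number
	reasoningTokens?: number
}

function normalizeUsage(usage: any): NormalizedUsage | undefined {
	if (!usage) return undefined

	// Prefer the detailed token shapes when the response provides them.
	const inputDetails = usage.input_tokens_details ?? usage.prompt_tokens_details
	const outputDetails = usage.output_tokens_details ?? usage.completion_tokens_details

	// Totals, with legacy field names as fallbacks.
	const inputTokens = usage.input_tokens ?? usage.prompt_tokens ?? 0
	const outputTokens = usage.output_tokens ?? usage.completion_tokens ?? 0

	// Cache reads come from the details block when present; some gateways report
	// them under a top-level field instead (name assumed here).
	const cacheReadTokens = inputDetails?.cached_tokens ?? usage.cache_read_input_tokens ?? 0
	const cacheWriteTokens = usage.cache_creation_input_tokens ?? 0

	// Reasoning tokens are only reported by models that support them.
	const reasoningTokens = outputDetails?.reasoning_tokens

	return { inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens, reasoningTokens }
}
```

Note that cache miss tokens (uncached input) are intentionally not treated as cache writes, per the review discussion later in this thread.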

Why Fallbacks Are Needed

Even though we exclusively use the Responses API, fallbacks are necessary because:

  1. SDK vs SSE transport differences: When the SDK fails or returns non-AsyncIterable responses, we fall back to raw SSE, which may use different field names
  2. Proxy/gateway variations: Custom base URLs may point to proxies that transform field names
  3. Model/rollout differences: Different models or API versions may omit detailed token info
  4. No-cache responses: Responses without cache usage may omit the detailed blocks entirely (a small fallback sketch follows this list)
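
As one concrete illustration of these fallbacks — hedged, since cache_miss_tokens below is a hypothetical detail field name and this is not the provider's actual code — the input total can be derived from the details block when a transport or proxy omits it:

```typescript
// Illustrative only: cache_miss_tokens is a hypothetical detail field name used
// to show the idea of deriving a missing total from the details block.
function inputTotalWithFallback(usage: any): number {
	const details = usage.input_tokens_details ?? usage.prompt_tokens_details
	const cached = details?.cached_tokens ?? 0
	const miss = details?.cache_miss_tokens ?? 0
	// Prefer the reported total; otherwise reconstruct it from the details block.
	return usage.input_tokens ?? usage.prompt_tokens ?? (cached + miss)
}
```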

Testing

  • Maintains backward compatibility with existing usage tracking
  • Properly handles responses with and without cache usage
  • Correctly calculates costs based on cached vs uncached tokens

Impact

Users will now be able to see:

  • How many tokens were read from cache
  • How many tokens were written to cache
  • Accurate cost calculations that account for cached content
  • Reasoning tokens when using models that support them

Fixes a previously unreported issue with OpenAI-Native provider cache reporting.


Important

Adds cache reporting support for the OpenAI-Native provider by implementing a normalizeUsage method that handles various response formats and cache-related fields.

  • Behavior:
    • Adds normalizeUsage method in openai-native.ts to extract and normalize cache-related token information from Responses API.
    • Supports detailed token shapes (input_tokens_details, output_tokens_details) and legacy field names.
    • Calculates cache read/write tokens and includes reasoning tokens when available.
    • Ensures accurate cost calculation using uncached input tokens.
  • Testing:
    • Adds tests in openai-native-usage.spec.ts to verify handling of detailed token shapes, legacy fields, SSE events, and edge cases.
    • Tests cost calculation with and without cache reads.
  • Impact:
    • Users can see cache read/write tokens and accurate cost calculations, including reasoning tokens when supported.

This description was created by Ellipsis for 3e073f3.

- Add normalizeUsage method to properly extract cache tokens from Responses API
- Support both detailed token shapes (input_tokens_details) and legacy fields
- Calculate cache read/write tokens with proper fallbacks
- Include reasoning tokens when available in output_tokens_details
- Ensure accurate cost calculation using uncached input tokens

This fixes the issue where caching information was not being reported
when using the OpenAI-Native provider with the Responses API.
Copilot AI review requested due to automatic review settings September 2, 2025 20:41
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. bug Something isn't working labels Sep 2, 2025
Copilot AI left a comment

Pull Request Overview

This PR fixes cache reporting functionality for the OpenAI-Native provider by implementing proper extraction and normalization of cache-related token usage from the Responses API.

  • Added normalizeUsage method to properly extract cache token information from various response formats
  • Enhanced fallback handling for compatibility across different API versions and transport methods
  • Improved cost calculation to use uncached input tokens and avoid double-counting


@roomote roomote bot left a comment

Thank you for your contribution! I've reviewed the changes and found that the implementation correctly addresses the cache reporting issue. The fallback patterns and backward compatibility are well handled. I have a few suggestions inline that could improve the implementation.

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Sep 2, 2025
- Add fallback to derive total input tokens from details when totals are missing
- Remove unused convertToOpenAiMessages import
- Add comment explaining cost calculation alignment with Gemini provider
- Add comprehensive test coverage for normalizeUsage method covering:
  - Detailed token shapes with cached/miss tokens
  - Legacy field names and SSE-only events
  - Edge cases including missing totals with details-only
  - Cost calculation with uncached input tokens
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Sep 2, 2025
- Remove incorrect fallback to missFromDetails for cache write tokens
- Fix cost calculation to pass total input tokens (calculateApiCostOpenAI handles subtraction; see the worked example after this list)
- Improve readability by extracting cache detail checks to intermediate variables
- Remove redundant ?? undefined
- Update tests to reflect correct behavior (miss tokens are not cache writes)
- Add clarifying comments about cache miss vs cache write tokens
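
To make the cost change concrete, here is a hand-expanded worked example of the behavior described above, with assumed prices; it is not the actual signature or implementation of calculateApiCostOpenAI:

```typescript
// Assumed example prices (per token) — not real model pricing.
const inputPrice = 2.5 / 1_000_000
const cacheReadPrice = 0.25 / 1_000_000
const outputPrice = 10 / 1_000_000

// Usage as reported by the Responses API: input tokens are the TOTAL, including
// the cached portion.
const totalInputTokens = 1_000
const cacheReadTokens = 600
const outputTokens = 200

// The cost helper (not the caller) removes the cached portion, avoiding double-counting.
const uncachedInputTokens = totalInputTokens - cacheReadTokens // 400
const cost =
	uncachedInputTokens * inputPrice + // 400 uncached input tokens at the full input rate
	cacheReadTokens * cacheReadPrice + // 600 cached tokens at the discounted cache-read rate
	outputTokens * outputPrice // 200 output tokens
```
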
@hannesrudolph hannesrudolph moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Sep 2, 2025
@hannesrudolph hannesrudolph added PR - Needs Preliminary Review and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Sep 2, 2025
@daniel-lxs daniel-lxs (Member) left a comment

Looks good!

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Sep 2, 2025
@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Needs Review] in Roo Code Roadmap Sep 2, 2025
@mrubens mrubens merged commit d1baa6e into main Sep 2, 2025
19 checks passed
@mrubens mrubens deleted the fix/openai-native-cache-reporting branch September 2, 2025 23:59
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Sep 2, 2025