3 changes: 3 additions & 0 deletions articles/ai-foundry/openai/how-to/latency.md
@@ -140,6 +140,9 @@ While prompt size has smaller influence on latency than the generation size it a
### Batching
If you're sending multiple requests to the same endpoint, you can batch them into a single call. This reduces the number of requests you need to make and, depending on the scenario, it might improve overall response time. We recommend testing this method to see if it helps.
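
As a rough sketch of this idea (not an example from the article itself), the Completions API accepts a list of prompts in a single request, so several prompts can share one HTTP call. The endpoint, API version, deployment name, and prompts below are illustrative placeholders:

```python
import os
from openai import AzureOpenAI

# Placeholder endpoint, API version, and deployment name -- substitute your own.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

# One HTTP call carrying several prompts instead of one call per prompt.
response = client.completions.create(
    model="gpt-35-turbo-instruct",  # placeholder deployment name
    prompt=[
        "Summarize the benefits of response caching.",
        "Summarize the benefits of request batching.",
    ],
    max_tokens=100,
)

# Each prompt gets its own choice in the response; match them up by index.
for choice in response.choices:
    print(choice.index, choice.text.strip())
```

Whether this helps depends on your traffic pattern; compare end-to-end latency with and without batching before adopting it.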

### Appropriate client timeout values
Start by setting your client timeout to 15 or 30 seconds. In some cases, an even longer client timeout can help stabilize your solution, since requests that generate many tokens can take longer to complete than a short default timeout allows.
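
As one way to apply this (a sketch rather than the article's own example), the OpenAI Python client lets you set a timeout when the client is constructed and override it for individual calls; the endpoint and deployment name below are placeholders:

```python
import os
from openai import AzureOpenAI

# A 30-second client-wide timeout; raise this if long generations are expected.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
    timeout=30.0,
)

# The timeout can also be overridden for a single call.
response = client.with_options(timeout=60.0).chat.completions.create(
    model="gpt-4o",  # placeholder deployment name
    messages=[{"role": "user", "content": "Write a short note about latency."}],
)
print(response.choices[0].message.content)
```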

## How to measure your throughput
We recommend measuring your overall throughput on a deployment with two measures:
- Calls per minute: The number of API inference calls you're making per minute. This can be measured in Azure Monitor using the Azure OpenAI Requests metric and splitting by the ModelDeploymentName