Implementation Details
Source: ~/workspace/source/clients/agent-runtime/src/providers/openai.rs:10
The OpenAI provider implements:
- ✅ Native tool calling with function definitions
- ✅ Streaming responses
- ✅ Reasoning content fallback for o1/o3 models
- ✅ Multi-turn conversations
- ✅ Connection warmup for reduced latency
Configuration
Basic Setup
In `~/.config/corvus/config.toml`:
API Key Setup
Set your OpenAI API key via the `OPENAI_API_KEY` environment variable. The provider resolves the key in this order (~/workspace/source/clients/agent-runtime/src/providers/mod.rs:301):
- Explicit `api_key` parameter (trimmed)
- `OPENAI_API_KEY` environment variable
- `CORVUS_API_KEY` fallback
- `API_KEY` fallback
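The resolution order above can be sketched as a small function. This is an illustration rather than the code at `mod.rs:301`: the name `resolve_api_key`, the injected `env` lookup, and the choice to skip empty or whitespace-only values are all assumptions made for the example.

```rust
/// Environment variables checked after the explicit parameter, in order.
const ENV_VARS: [&str; 3] = ["OPENAI_API_KEY", "CORVUS_API_KEY", "API_KEY"];

/// Hypothetical helper: resolve an API key using the documented precedence.
/// `env` is any lookup function; in production it could be
/// `|name| std::env::var(name).ok()`.
fn resolve_api_key<F>(explicit: Option<&str>, env: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    // 1. An explicit key wins, after trimming surrounding whitespace.
    //    Assumption: a blank explicit key is treated as absent.
    if let Some(key) = explicit.map(str::trim).filter(|k| !k.is_empty()) {
        return Some(key.to_string());
    }
    // 2-4. Fall through the environment variables in order.
    for name in ENV_VARS.iter() {
        if let Some(value) = env(*name) {
            let value = value.trim();
            if !value.is_empty() {
                return Some(value.to_string());
            }
        }
    }
    None
}
```

Passing the lookup as a closure keeps the function testable without mutating the process environment.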
Supported Models
GPT-4o Series
Best for general-purpose tasks with multimodal capabilities:
- Native tool calling
- Vision (image understanding)
- 128K context window
- Up to 16K output tokens
o1 Series (Reasoning Models)
Advanced reasoning with chain-of-thought:
- Uses the `reasoning_content` field for internal reasoning
- Automatically falls back if `content` is empty
- No streaming support
- Higher token costs
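The fallback described above might look roughly like the following. The `AssistantMessage` struct and `effective_text` function are hypothetical stand-ins, not the provider's actual types, and treating whitespace-only `content` as empty is an assumption.

```rust
/// Hypothetical stand-in for the assistant message in a chat response.
/// Only the two fields discussed above are modelled.
struct AssistantMessage {
    content: Option<String>,
    reasoning_content: Option<String>,
}

/// Return the field's text only if it contains something other than whitespace.
fn non_empty(field: &Option<String>) -> Option<&str> {
    field.as_deref().filter(|s| !s.trim().is_empty())
}

/// Prefer `content`; fall back to `reasoning_content` when `content` is
/// missing or empty.
fn effective_text(msg: &AssistantMessage) -> Option<&str> {
    non_empty(&msg.content).or_else(|| non_empty(&msg.reasoning_content))
}
```

Returning `Option<&str>` leaves the caller to decide what to do when both fields are empty.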
o3-mini (Latest Reasoning Model)
- Superior reasoning capabilities
- Cost-effective compared to o1
- Good for complex problem-solving
GPT-4 Turbo
GPT-3.5 Turbo
Usage Examples
Simple Chat
Chat with System Prompt
Multi-turn Conversation
Tool Calling
The OpenAI provider supports native function calling:
Reasoning Models (o1/o3)
Reasoning models automatically use the `reasoning_content` field:
Advanced Configuration
Connection Warmup
Reduce first-request latency by warming up the connection:
Custom Timeouts
The provider uses these defaults:
- Request timeout: 120 seconds
- Connect timeout: 10 seconds
Source: ~/workspace/source/clients/agent-runtime/src/providers/openai.rs:141
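As a small illustration, the two defaults can be captured in a value type. The `Timeouts` struct below is hypothetical and is not taken from `openai.rs`; only the 120-second and 10-second values come from the text above.

```rust
use std::time::Duration;

/// Hypothetical container for the two timeouts listed above.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Timeouts {
    /// Upper bound on the whole request, including reading the response.
    request: Duration,
    /// Upper bound on establishing the TCP/TLS connection.
    connect: Duration,
}

impl Default for Timeouts {
    fn default() -> Self {
        Self {
            request: Duration::from_secs(120),
            connect: Duration::from_secs(10),
        }
    }
}
```

Keeping the connect timeout much shorter than the request timeout lets an unreachable host fail fast while still allowing long generations to finish.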
With Resilient Provider Chain
Automatic failover to backup providers:
Error Handling
Common errors and solutions:
Missing API Key
Set the `OPENAI_API_KEY` environment variable.
Rate Limiting
- Implement exponential backoff (automatic with `create_resilient_provider`)
- Use fallback providers
- Upgrade to higher rate limits
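For callers that do not use `create_resilient_provider`, exponential backoff can be sketched with the standard library alone. The functions `backoff_delay` and `retry_with_backoff` are hypothetical names for this example and say nothing about how the resilient provider is implemented. The sketch is synchronous and omits jitter for brevity.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`,
/// capped at `max`. Checked arithmetic avoids overflow on large attempts.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Run `op` until it succeeds or `max_attempts` calls have failed,
/// sleeping with exponential backoff between failures.
/// `op` is always called at least once.
fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: F,
) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                std::thread::sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

In practice only retryable failures such as HTTP 429 should be retried, and adding random jitter to each delay helps avoid synchronized retries across clients.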
Invalid Model
Token Limit Exceeded
- Reduce conversation history
- Use a model with a larger context window (e.g., `gpt-4o` has 128K)
- Implement conversation summarization
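One way to reduce conversation history is to keep system messages and then as many of the most recent messages as fit a token budget. The sketch below assumes a hypothetical `Message` type and a rough four-characters-per-token estimate; a real implementation would use a proper tokenizer and the runtime's own message type.

```rust
/// Hypothetical chat message; the runtime's real type may differ.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: String,
}

/// Rough estimate: about 4 characters per token (a heuristic, not a tokenizer).
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Keep every system message, plus the newest other messages that fit
/// within `budget` estimated tokens. Original order is preserved.
fn trim_history(messages: &[Message], budget: usize) -> Vec<Message> {
    // System messages are always kept, even if they alone exceed the budget.
    let mut keep: Vec<bool> = messages.iter().map(|m| m.role == "system").collect();
    let system_cost: usize = messages
        .iter()
        .filter(|m| m.role == "system")
        .map(|m| estimate_tokens(&m.content))
        .sum();
    let mut remaining = budget.saturating_sub(system_cost);

    // Walk from newest to oldest; stop at the first message that does not fit.
    for (i, m) in messages.iter().enumerate().rev() {
        if keep[i] {
            continue;
        }
        let cost = estimate_tokens(&m.content);
        if cost > remaining {
            break;
        }
        remaining -= cost;
        keep[i] = true;
    }

    messages
        .iter()
        .zip(keep)
        .filter(|(_, k)| *k)
        .map(|(m, _)| m.clone())
        .collect()
}
```

When tool calls are in the history, trimming should also avoid separating a tool result from the call that produced it; this sketch does not handle that case.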
Best Practices
- Use environment variables for API keys, never hardcode
- Call `warmup()` during initialization for faster first requests
- Use `gpt-4o-mini` for cost-effective tasks
- Use reasoning models (o1/o3) for complex problem-solving
- Enable fallback providers for production reliability
- Set an appropriate temperature:
  - `0.0-0.3` for factual/deterministic tasks
  - `0.7` (default) for balanced responses
  - `1.0+` for creative/reasoning tasks
- Monitor token usage to control costs
- Implement retry logic with exponential backoff
- Use streaming for real-time user feedback (when available)
- Leverage tool calling for agent workflows
Cost Optimization
Model Selection
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Use Case |
|---|---|---|---|
| gpt-4o | $2.50 | $10.00 | General-purpose, balanced |
| gpt-4o-mini | $0.15 | $0.60 | High-volume, cost-sensitive |
| o1-preview | $15.00 | $60.00 | Complex reasoning |
| o1-mini | $3.00 | $12.00 | Cost-effective reasoning |
| o3-mini | $1.10 | $4.40 | Latest reasoning model |
| gpt-3.5-turbo | $0.50 | $1.50 | Simple tasks, legacy |
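To compare models for a given workload, the table can be turned into a small cost estimator. The function name `estimate_cost` is hypothetical, and the prices are copied from the table above, so they are only as current as the table itself.

```rust
/// Per-1M-token prices in USD from the table above: (model, input, output).
const PRICES: [(&str, f64, f64); 6] = [
    ("gpt-4o", 2.50, 10.00),
    ("gpt-4o-mini", 0.15, 0.60),
    ("o1-preview", 15.00, 60.00),
    ("o1-mini", 3.00, 12.00),
    ("o3-mini", 1.10, 4.40),
    ("gpt-3.5-turbo", 0.50, 1.50),
];

/// Estimated USD cost of a request, or `None` for a model not in the table.
fn estimate_cost(model: &str, input_tokens: u64, output_tokens: u64) -> Option<f64> {
    PRICES
        .iter()
        .find(|entry| entry.0 == model)
        .map(|&(_, input_price, output_price)| {
            // Prices are quoted per one million tokens.
            (input_tokens as f64 * input_price + output_tokens as f64 * output_price)
                / 1_000_000.0
        })
}
```

For example, a request with 10,000 input tokens and 2,000 output tokens on `gpt-4o-mini` works out to (10,000 × 0.15 + 2,000 × 0.60) / 1,000,000 = $0.0027 under these prices.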
Tips
- Use `gpt-4o-mini` by default, upgrade only when needed
- Implement prompt caching (when available)
- Reduce system prompt length where possible
- Trim conversation history to essential messages
- Use function calling to reduce verbose output
- Set `max_tokens` limits in production