Pricing
Understand Oru-el's per-token pricing model, wallet system, and cost controls.
Oru-el uses a straightforward pay-per-use pricing model. You pre-load a wallet with funds, and usage is deducted automatically as you make API calls.
How pricing works
Inference API
Text generation models are billed per token with separate rates for input and output:
- Input tokens — the tokens in your prompt (system message, user messages, conversation history)
- Output tokens — the tokens generated by the model in its response
Prices are expressed as cost per million tokens. For example, if a model charges $0.10 per million input tokens and $0.30 per million output tokens, a request with 1,000 input tokens and 500 output tokens would cost:
```text
Input:  (1,000 / 1,000,000) × $0.10 = $0.0001
Output: (500 / 1,000,000) × $0.30 = $0.00015
Total:  $0.00025
```
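Here is the same arithmetic as a small Python function. This is a sketch, not an Oru-el SDK call. `request_cost` is a hypothetical helper, and it takes rates as strings and uses `Decimal` so that values like $0.10 stay exact instead of picking up float rounding error.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: str, output_rate: str) -> Decimal:
    """Cost in USD of one text-generation request.

    Rates are USD per million tokens, passed as strings (e.g. "0.10")
    so that Decimal stores them exactly.
    """
    input_cost = Decimal(input_tokens) / MILLION * Decimal(input_rate)
    output_cost = Decimal(output_tokens) / MILLION * Decimal(output_rate)
    return input_cost + output_cost


# The example request: 1,000 input tokens and 500 output tokens.
print(request_cost(1_000, 500, "0.10", "0.30"))
```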
Other model types
| Model type | Billing unit |
|---|---|
| Text generation | Per million tokens (input + output) |
| Embeddings | Per million tokens (input) |
| Image generation | Per image |
| Text-to-speech | Per request |
Cached input tokens
Some models support prompt caching. When part of your input matches a previously cached prefix, those tokens are billed at a reduced cached input rate. This happens automatically — no configuration needed.
Pricing tiers
Each model is available in one or both tiers:
| Tier | Description | When to use |
|---|---|---|
| Standard | Default tier. Cost-optimized routing. | Most workloads, development, batch processing |
| Turbo | Low-latency optimized routing. | Real-time applications, chatbots, user-facing features |
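Turbo pricing is typically 1.5x to 3x the standard rate, depending on the model. Not all models have a turbo tier.
To use turbo, pass `"tier": "turbo"` in your request:
```python
# Assumes `client` is an OpenAI-compatible client already configured
# for the Oru-el API. `tier` is not a standard OpenAI parameter, so it
# is sent through `extra_body`.
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"tier": "turbo"},
)
```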
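Not every model has a turbo tier, so a client might check the model's pricing before requesting turbo. The sketch below assumes the `pricing` object has the shape returned by the models endpoint (see Checking model pricing). It also assumes that a model without turbo has no `turbo` entry, or a null one. `choose_tier` and `chat_kwargs` are hypothetical helpers.

```python
def choose_tier(pricing: dict, preferred: str = "turbo") -> str:
    """Return `preferred` if the model has pricing for that tier, else "standard".

    Assumption: a model without a turbo tier has no "turbo" key, or a null one,
    in its `pricing` object.
    """
    return preferred if pricing.get(preferred) else "standard"


def chat_kwargs(model: str, pricing: dict, prompt: str) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {"tier": choose_tier(pricing)},
    }


maverick_pricing = {
    "standard": {"inputPerMToken": "0.10", "outputPerMToken": "0.30"},
    "turbo": {"inputPerMToken": "0.20", "outputPerMToken": "0.60"},
}
kwargs = chat_kwargs("llama-4-maverick", maverick_pricing, "Hello!")
# response = client.chat.completions.create(**kwargs)
```

With this approach, a request for a standard-only model quietly falls back to the standard tier rather than relying on the server to reject an unavailable tier.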
Wallet system
How the wallet works
Your wallet holds a pre-paid USD balance. Every API call deducts the cost from your balance in real time.
- Top up your wallet from Settings > Billing
- View your balance in the dashboard header or on the Wallet page
- The transaction history lists every charge and top-up with its details
Minimum balance
You need a positive wallet balance to make API calls. If your balance is too low to cover the estimated cost of a request, the API returns an error:
```json
{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Insufficient wallet balance ($0.12). Estimated cost: ~$0.0500. Please top up."
  }
}
```
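The exact server-side balance check isn't documented here. The example above rejects a $0.12 balance against a ~$0.05 estimate, so the server may apply some margin. A client can still do a rough pre-flight check. This sketch assumes the worst case, where the response uses the request's entire output-token allowance (`max_output_tokens` is a hypothetical name for whatever cap you set). `worst_case_cost`, `has_headroom`, and the `margin` parameter are all illustrative.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def worst_case_cost(prompt_tokens: int, max_output_tokens: int,
                    input_rate: str, output_rate: str) -> Decimal:
    """Upper bound on cost, assuming every allowed output token is generated."""
    return (Decimal(prompt_tokens) * Decimal(input_rate)
            + Decimal(max_output_tokens) * Decimal(output_rate)) / MILLION


def has_headroom(balance: str, estimate: Decimal, margin: str = "0") -> bool:
    """Client-side guess at whether the wallet can cover a request.

    `margin` is a hypothetical safety buffer; the server's real threshold
    is not documented, so a request can still be rejected when this passes.
    """
    bal = Decimal(balance)
    return bal > 0 and bal >= estimate + Decimal(margin)


estimate = worst_case_cost(2_000, 800, "0.50", "1.50")
if not has_headroom("0.12", estimate, margin="0.10"):
    print("Balance may be too low; consider topping up first.")
```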
Checking your balance
View your current balance and transaction history at any time from the Wallet page in the dashboard.
Checking model pricing
In the dashboard
Browse the Model Catalog to see pricing for every model. Each model card shows input and output rates for each tier the model supports (standard, turbo, or both).
Via the API
```shell
curl https://api.oru-el.com/v1/inference/models
```
The response includes pricing for each model:
```json
{
  "models": [
    {
      "id": "llama-4-maverick",
      "displayName": "Llama 4 Maverick",
      "pricing": {
        "standard": {
          "inputPerMToken": "0.10",
          "outputPerMToken": "0.30",
          "cachedInputPerMToken": "0.05"
        },
        "turbo": {
          "inputPerMToken": "0.20",
          "outputPerMToken": "0.60",
          "cachedInputPerMToken": null
        }
      }
    }
  ]
}
```
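Prices in this response are strings, and a rate can be `null` (here, turbo has no cached input rate). The following is a minimal parsing sketch that assumes the response shape shown above. `parse_pricing` and `turbo_multiplier` are hypothetical helpers, and treating a missing or null tier as "not offered" is an assumption.

```python
import json
from decimal import Decimal
from typing import Optional


def parse_pricing(response_text: str) -> dict:
    """Return {model_id: {tier: {rate_name: Decimal or None}}}."""
    data = json.loads(response_text)
    table = {}
    for model in data["models"]:
        tiers = {}
        for tier, rates in (model.get("pricing") or {}).items():
            if not rates:  # assumption: a missing/null tier means "not offered"
                continue
            tiers[tier] = {
                name: Decimal(value) if value is not None else None
                for name, value in rates.items()
            }
        table[model["id"]] = tiers
    return table


def turbo_multiplier(tiers: dict, rate: str = "inputPerMToken") -> Optional[Decimal]:
    """Turbo rate divided by standard rate, or None if either tier is missing."""
    if "turbo" not in tiers or "standard" not in tiers:
        return None
    return tiers["turbo"][rate] / tiers["standard"][rate]
```

Pass the body of the `curl` response to `parse_pricing`. For the sample above, `turbo_multiplier` gives 2 for both the input and output rates, which is within the typical 1.5x to 3x range.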
Cost calculation example
Suppose you're using a model priced at $0.50/M input tokens and $1.50/M output tokens, and your request uses 2,000 input tokens and generates 800 output tokens:
| Component | Tokens | Rate (per M) | Cost |
|---|---|---|---|
| Input tokens | 2,000 | $0.50 | $0.001000 |
| Output tokens | 800 | $1.50 | $0.001200 |
| Total | | | $0.002200 |
For cached inputs, suppose 1,500 of those 2,000 input tokens hit the cache at $0.25/M:
| Component | Tokens | Rate (per M) | Cost |
|---|---|---|---|
| Non-cached input | 500 | $0.50 | $0.000250 |
| Cached input | 1,500 | $0.25 | $0.000375 |
| Output tokens | 800 | $1.50 | $0.001200 |
| Total | | | $0.001825 |
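Both tables can be reproduced with a helper that splits input tokens into cached and non-cached parts. This is a sketch using a hypothetical `cost_breakdown` function. Like the second table, it assumes cached tokens are counted within the total input tokens.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def cost_breakdown(input_tokens: int, cached_tokens: int, output_tokens: int,
                   input_rate: str, cached_rate: str, output_rate: str) -> dict:
    """Per-component cost in USD; rates are USD per million tokens.

    cached_tokens is treated as a subset of input_tokens.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    parts = {
        "non_cached_input": Decimal(input_tokens - cached_tokens) * Decimal(input_rate) / MILLION,
        "cached_input": Decimal(cached_tokens) * Decimal(cached_rate) / MILLION,
        "output": Decimal(output_tokens) * Decimal(output_rate) / MILLION,
    }
    parts["total"] = sum(parts.values())
    return parts


no_cache = cost_breakdown(2_000, 0, 800, "0.50", "0.25", "1.50")
with_cache = cost_breakdown(2_000, 1_500, 800, "0.50", "0.25", "1.50")
print(no_cache["total"], with_cache["total"])
```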
Budget controls
Oru-el provides built-in budget enforcement to prevent unexpected spending.
Monthly budgets
Set a maximum monthly spend for inference. When you reach the limit, what happens depends on the limit type:
- Hard limit: API calls are rejected with a 429 status code
- Soft limit: you receive alerts, but calls continue

A request rejected by a hard limit returns a body like this:
```json
{
  "error": "BUDGET_EXCEEDED",
  "message": "Monthly inference budget of $50.00 exceeded (current: $50.12)",
  "currentSpend": 50.12,
  "limit": 50,
  "category": "inference"
}
```
Hourly rate limits
Set a maximum cost per hour to catch runaway loops or misconfigured applications. Once the hourly cost limit is exceeded, the API returns an error like this:
```json
{
  "error": "RATE_LIMIT_EXCEEDED",
  "message": "Hourly cost limit of $5.00 exceeded",
  "currentHourlyCost": 5.23,
  "limit": 5
}
```
Configure budgets from Analytics > Budgets in the dashboard.
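The error bodies on this page come in two shapes. The wallet check nests `code` and `message` under `error`, while the budget and hourly limits put the code string directly in `error`. A client that handles both might normalize them first. `classify_error`, `next_step`, and the suggested actions are hypothetical and are not documented Oru-el behavior.

```python
def classify_error(body: dict) -> tuple:
    """Return (code, message) from either error shape shown on this page."""
    err = body.get("error")
    if isinstance(err, dict):
        # Wallet check: {"error": {"code": ..., "message": ...}}
        return err.get("code", ""), err.get("message", "")
    # Budget and hourly limits: {"error": "<CODE>", "message": ..., ...}
    return str(err or ""), body.get("message", "")


# Hypothetical mapping from error code to what a client might do next.
NEXT_STEP = {
    "BUDGET_EXCEEDED": "stop: monthly budget reached; raise the budget or wait",
    "RATE_LIMIT_EXCEEDED": "pause: hourly cost limit hit; check for a runaway loop",
    "VALIDATION_ERROR": "inspect: the message may say the wallet needs a top-up",
}


def next_step(body: dict) -> str:
    code, _ = classify_error(body)
    return NEXT_STEP.get(code, "unhandled: " + code)


print(next_step({"error": "RATE_LIMIT_EXCEEDED", "message": "Hourly cost limit of $5.00 exceeded"}))
```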
GPU compute pricing
GPU machines are billed per hour based on the GPU type and configuration.
| Resource | Billing |
|---|---|
| GPU instance | Per hour while the machine is running |
| Storage | Per GB per month |
| Network | Included |
Pricing varies by GPU type (A100, H100, etc.) and is shown in the GPU marketplace when you create a machine. The clock starts when the machine is provisioned and stops when you terminate it.
View GPU pricing on the GPUs page in the dashboard or on the public Pricing page.
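Here is a rough estimate for a single machine. This sketch assumes runtime is prorated per second between provisioning and termination, and that charges are rounded to the cent. The actual billing granularity and rounding aren't specified here. The rates below are placeholders, not real prices, and `gpu_runtime_cost` and `storage_cost` are hypothetical helpers.

```python
from datetime import datetime, timezone
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def gpu_runtime_cost(provisioned_at: datetime, terminated_at: datetime,
                     hourly_rate: str) -> Decimal:
    """Runtime charge, assuming per-second proration rounded to the cent."""
    if terminated_at < provisioned_at:
        raise ValueError("terminated_at is before provisioned_at")
    seconds = Decimal(int((terminated_at - provisioned_at).total_seconds()))
    cost = seconds / Decimal(3600) * Decimal(hourly_rate)
    return cost.quantize(CENT, rounding=ROUND_HALF_UP)


def storage_cost(gb: str, rate_per_gb_month: str, months: str = "1") -> Decimal:
    """Storage charge for `gb` gigabytes kept for `months` months."""
    cost = Decimal(gb) * Decimal(rate_per_gb_month) * Decimal(months)
    return cost.quantize(CENT, rounding=ROUND_HALF_UP)


# Placeholder rates: $2.00/hour for the GPU, $0.10 per GB per month for storage.
start = datetime(2025, 1, 1, 10, 0, tzinfo=timezone.utc)
end = datetime(2025, 1, 1, 12, 30, tzinfo=timezone.utc)
print(gpu_runtime_cost(start, end, "2.00") + storage_cost("100", "0.10"))  # 15.00
```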