Pricing

Understand Oru-el's per-token pricing model, wallet system, and cost controls.

Oru-el uses a pay-per-use pricing model. You pre-load a wallet with funds, and the cost of each API call is deducted from it automatically.

How pricing works#

Inference API#

Text generation models are billed per token with separate rates for input and output:

Input tokens — the tokens in your prompt (system message, user messages, conversation history)
Output tokens — the tokens generated by the model in its response

Prices are expressed as cost per million tokens. For example, if a model charges $0.10 per million input tokens and $0.30 per million output tokens, a request with 1,000 input tokens and 500 output tokens would cost:

Input:  (1,000 / 1,000,000) × $0.10 = $0.0001
Output: (500 / 1,000,000)   × $0.30 = $0.00015
Total:                                 $0.00025
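
The same arithmetic can be sketched in Python. This is a minimal illustration, not part of any Oru-el SDK: the function name `request_cost` and its parameters are hypothetical, and rates are passed as strings so that `Decimal` avoids floating-point rounding.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: str, output_rate: str) -> Decimal:
    """Return the USD cost of one request, given per-million-token rates.

    Hypothetical helper for illustration; rates are USD per million tokens.
    """
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * Decimal(input_rate)
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * Decimal(output_rate)
    return input_cost + output_cost


# The example above: 1,000 input and 500 output tokens at $0.10 / $0.30 per million.
cost = request_cost(1_000, 500, "0.10", "0.30")
assert cost == Decimal("0.00025")
```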

Other model types#

Model type	Billing unit
Text generation	Per million tokens (input + output)
Embeddings	Per million tokens (input)
Image generation	Per image
Text-to-speech	Per request

Cached input tokens#

Some models support prompt caching. When the start of your input matches a previously cached prefix, the matching tokens are billed at a reduced cached input rate. Caching is applied automatically, with no configuration needed.

Pricing tiers#

Each model is available in one or both tiers:

Tier	Description	When to use
Standard	Default tier. Cost-optimized routing.	Most workloads, development, batch processing
Turbo	Low-latency optimized routing.	Real-time applications, chatbots, user-facing features

Turbo pricing is typically 1.5x to 3x the standard rate, depending on the model. Not every model offers a turbo tier.

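To use turbo, set tier to "turbo" in the request body. The example below assumes an OpenAI-compatible Python client, where extra fields are passed through extra_body:

# Assumes `client` is an OpenAI-compatible Python client already configured for Oru-el.
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"tier": "turbo"},  # merged into the JSON request body
)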

Wallet system#

How the wallet works#

Your wallet holds a pre-paid USD balance. Every API call deducts the cost from your balance in real time.

Top up your wallet from Settings > Billing
View your balance in the dashboard header or the Wallet page
Transaction history shows every charge and top-up with details

Minimum balance#

You need a positive wallet balance to make API calls. If your balance is too low to cover the estimated cost of a request, the API returns an error:

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Insufficient wallet balance ($0.02). Estimated cost: ~$0.0500. Please top up."
  }
}

Checking your balance#

View your current balance and transaction history at any time from the Wallet page in the dashboard.

Checking model pricing#

In the dashboard#

Browse the Model Catalog to see pricing for every model. Each model card shows input and output rates for each tier the model offers (standard, turbo, or both).

Via the API#

curl https://api.oru-el.com/v1/inference/models

The response includes pricing for each model:

{
  "models": [
    {
      "id": "llama-4-maverick",
      "displayName": "Llama 4 Maverick",
      "pricing": {
        "standard": {
          "inputPerMToken": "0.10",
          "outputPerMToken": "0.30",
          "cachedInputPerMToken": "0.05"
        },
        "turbo": {
          "inputPerMToken": "0.20",
          "outputPerMToken": "0.60",
          "cachedInputPerMToken": null
        }
      }
    }
  ]
}
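
As a sketch of how a client might read this response, the snippet below parses a trimmed copy of the sample above and looks up the rates for one model and tier. The helper name `tier_rates` is hypothetical. The only assumptions about the format are the ones visible in the sample: rates are strings, and a rate that does not apply is `null`.

```python
import json
from decimal import Decimal

# Trimmed copy of the sample response shown above.
SAMPLE = """
{"models": [{"id": "llama-4-maverick",
  "pricing": {
    "standard": {"inputPerMToken": "0.10", "outputPerMToken": "0.30",
                 "cachedInputPerMToken": "0.05"},
    "turbo": {"inputPerMToken": "0.20", "outputPerMToken": "0.60",
              "cachedInputPerMToken": null}
  }}]}
"""


def tier_rates(payload: dict, model_id: str, tier: str = "standard") -> dict:
    """Return {field: Decimal or None} for one model and tier.

    Raises KeyError if the model or the tier is not in the payload.
    """
    for model in payload["models"]:
        if model["id"] == model_id:
            raw = model["pricing"][tier]
            # Rates arrive as strings; null means no rate is listed.
            return {k: (Decimal(v) if v is not None else None)
                    for k, v in raw.items()}
    raise KeyError(model_id)


payload = json.loads(SAMPLE)
standard = tier_rates(payload, "llama-4-maverick")
turbo = tier_rates(payload, "llama-4-maverick", "turbo")

# Ratio of turbo to standard input price; 2 for this sample.
turbo_multiplier = turbo["inputPerMToken"] / standard["inputPerMToken"]
```

Converting to `Decimal` rather than `float` keeps later cost arithmetic exact, which is presumably why the sample returns rates as strings.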

Cost calculation example#

Suppose you're using a model priced at $0.50/M input tokens and $1.50/M output tokens, and your request uses 2,000 input tokens and generates 800 output tokens:

Component	Tokens	Rate (per M)	Cost
Input tokens	2,000	$0.50	$0.001000
Output tokens	800	$1.50	$0.001200
Total			$0.002200

For cached inputs, suppose 1,500 of those 2,000 input tokens hit the cache at $0.25/M:

Component	Tokens	Rate (per M)	Cost
Non-cached input	500	$0.50	$0.000250
Cached input	1,500	$0.25	$0.000375
Output tokens	800	$1.50	$0.001200
Total			$0.001825
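
The cached and non-cached split in the table can be sketched as follows. The function name `cost_with_cache` is hypothetical, and falling back to the normal input rate when no cached rate is given is an assumption made for this illustration, not documented billing behavior.

```python
from decimal import Decimal
from typing import Optional

PER_MILLION = Decimal(1_000_000)


def cost_with_cache(input_tokens: int, cached_tokens: int, output_tokens: int,
                    input_rate: str, output_rate: str,
                    cached_rate: Optional[str] = None) -> Decimal:
    """Return the USD cost of a request where some input tokens hit the cache."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    # Assumption: with no cached rate, cached tokens cost the normal input rate.
    effective_cached_rate = Decimal(cached_rate if cached_rate is not None else input_rate)
    uncached = Decimal(input_tokens - cached_tokens) / PER_MILLION * Decimal(input_rate)
    cached = Decimal(cached_tokens) / PER_MILLION * effective_cached_rate
    output = Decimal(output_tokens) / PER_MILLION * Decimal(output_rate)
    return uncached + cached + output


# The two tables above: the same request without and with 1,500 cached tokens.
without_cache = cost_with_cache(2_000, 0, 800, "0.50", "1.50")
with_cache = cost_with_cache(2_000, 1_500, 800, "0.50", "1.50", "0.25")
savings = without_cache - with_cache
assert without_cache == Decimal("0.0022")
assert with_cache == Decimal("0.001825")
```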

Budget controls#

Oru-el provides built-in budget enforcement to prevent unexpected spending.

Monthly budgets#

Set a maximum monthly spend for inference. When you hit the limit:

Hard limit — API calls are rejected with a 429 status code
Soft limit — you receive alerts but calls continue

A call rejected by a hard limit returns an error like this:

{
  "error": "BUDGET_EXCEEDED",
  "message": "Monthly inference budget of $50.00 exceeded (current: $50.12)",
  "currentSpend": 50.12,
  "limit": 50,
  "category": "inference"
}

Hourly rate limits#

Set a maximum cost per hour to catch runaway loops or misconfigured applications. When the limit is exceeded, the API returns an error like this:

{
  "error": "RATE_LIMIT_EXCEEDED",
  "message": "Hourly cost limit of $5.00 exceeded",
  "currentHourlyCost": 5.23,
  "limit": 5
}

Configure budgets from Analytics > Budgets in the dashboard.
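
A client can tell the two limit errors apart by the top-level `error` field shown in the examples above. The sketch below maps each to a coarse action. The function name, the action labels ("stop", "backoff", "other") and the suggested reactions are all assumptions for illustration; this page does not say when either limit resets.

```python
import json


def classify_limit_error(body: str) -> str:
    """Map an error response body to 'stop', 'backoff', or 'other'.

    Hypothetical policy: a monthly budget will not clear by retrying soon,
    while an hourly cost limit may clear after waiting.
    """
    payload = json.loads(body)
    code = payload.get("error")
    if code == "BUDGET_EXCEEDED":
        return "stop"
    if code == "RATE_LIMIT_EXCEEDED":
        return "backoff"
    # Anything else, including errors where "error" is a nested object.
    return "other"


monthly = '{"error": "BUDGET_EXCEEDED", "currentSpend": 50.12, "limit": 50, "category": "inference"}'
hourly = '{"error": "RATE_LIMIT_EXCEEDED", "currentHourlyCost": 5.23, "limit": 5}'
assert classify_limit_error(monthly) == "stop"
assert classify_limit_error(hourly) == "backoff"
```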

GPU compute pricing#

GPU machines are billed per hour based on the GPU type and configuration.

Resource	Billing
GPU instance	Per hour while the machine is running
Storage	Per GB per month
Network	Included

Pricing varies by GPU type (A100, H100, etc.) and is shown in the GPU marketplace when you create a machine. The clock starts when the machine is provisioned and stops when you terminate it.
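
To estimate the instance charge for a machine, multiply the time between provisioning and termination by the hourly rate. The sketch below assumes partial hours are prorated, which this page does not confirm. The function name and the $2.50 hourly rate are hypothetical, and storage charges are not included.

```python
from datetime import datetime
from decimal import Decimal

SECONDS_PER_HOUR = Decimal(3600)


def gpu_instance_cost(provisioned_at: datetime, terminated_at: datetime,
                      hourly_rate: str) -> Decimal:
    """Return the estimated USD instance charge between two timestamps.

    Assumption: partial hours are prorated rather than rounded up.
    """
    if terminated_at < provisioned_at:
        raise ValueError("terminated_at must not be earlier than provisioned_at")
    elapsed = terminated_at - provisioned_at
    elapsed_seconds = Decimal(int(elapsed.total_seconds()))
    return elapsed_seconds / SECONDS_PER_HOUR * Decimal(hourly_rate)


# 1.5 hours at a hypothetical rate of $2.50 per hour.
start = datetime(2025, 1, 1, 9, 0)
end = datetime(2025, 1, 1, 10, 30)
cost = gpu_instance_cost(start, end, "2.50")
assert cost == Decimal("3.75")
```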

View GPU pricing on the GPUs page in the dashboard or on the public Pricing page.