Pricing

Understand Oru-el's per-token pricing model, wallet system, and cost controls.

Oru-el uses a straightforward pay-per-use pricing model. You pre-load a wallet with funds, and usage is deducted automatically as you make API calls.

How pricing works#

Inference API#

Text generation models are billed per token with separate rates for input and output:

  • Input tokens — the tokens in your prompt (system message, user messages, conversation history)
  • Output tokens — the tokens generated by the model in its response

Prices are expressed as cost per million tokens. For example, if a model charges $0.10 per million input tokens and $0.30 per million output tokens, a request with 1,000 input tokens and 500 output tokens would cost:

Input:  (1,000 / 1,000,000) × $0.10 = $0.0001
Output: (500 / 1,000,000)   × $0.30 = $0.00015
Total:                                 $0.00025
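
The arithmetic above can be sketched in Python. The function name `request_cost` and its parameters are illustrative and not part of any Oru-el SDK. Rates are passed as strings so that `Decimal` parses them exactly and the small per-token amounts do not pick up floating-point noise.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def request_cost(input_tokens, output_tokens, input_rate_per_m, output_rate_per_m):
    """Return the USD cost of one request, given per-million-token rates."""
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * Decimal(input_rate_per_m)
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * Decimal(output_rate_per_m)
    return input_cost + output_cost


# Same numbers as the worked example: 1,000 input and 500 output tokens.
cost = request_cost(1_000, 500, "0.10", "0.30")
print(f"${cost}")
```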

Other model types#

| Model type | Billing unit |
| --- | --- |
| Text generation | Per million tokens (input + output) |
| Embeddings | Per million tokens (input) |
| Image generation | Per image |
| Text-to-speech | Per request |

Cached input tokens#

Some models support prompt caching. When part of your input matches a previously cached prefix, those tokens are billed at a reduced cached input rate. This happens automatically — no configuration needed.

Pricing tiers#

Each model is available in one or both tiers:

| Tier | Description | When to use |
| --- | --- | --- |
| Standard | Default tier. Cost-optimized routing. | Most workloads, development, batch processing |
| Turbo | Low-latency optimized routing. | Real-time applications, chatbots, user-facing features |

Turbo pricing is typically 1.5x to 3x the standard rate, depending on the model. Not every model offers a turbo tier.

To use turbo, pass tier: "turbo" in your request:

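# Assumes `client` is an OpenAI-compatible Python client already configured for Oru-el.
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[{"role": "user", "content": "Hello!"}],
    # `tier` is not a standard SDK argument, so it is sent through extra_body.
    extra_body={"tier": "turbo"},
)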

Wallet system#

How the wallet works#

Your wallet holds a pre-paid USD balance. Every API call deducts the cost from your balance in real time.

  • Top up your wallet from Settings > Billing
  • View your balance in the dashboard header or the Wallet page
  • Transaction history shows every charge and top-up with details

Minimum balance#

You need a positive wallet balance to make API calls. If your balance is too low to cover the estimated cost of a request, the API returns an error:

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Insufficient wallet balance ($0.02). Estimated cost: ~$0.0500. Please top up."
  }
}
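
Because the error code here is the generic `VALIDATION_ERROR`, a client that wants to react specifically to a low balance has to look at the message. The following is a minimal sketch, assuming the message wording matches the example above; the helper name `parse_insufficient_balance` is hypothetical, and the real message format may differ.

```python
import json
import re

# Matches the message format in the example above. The wording is an
# assumption and may differ in the real API.
_INSUFFICIENT_RE = re.compile(
    r"Insufficient wallet balance \(\$(?P<balance>\d+(?:\.\d+)?)\)\. "
    r"Estimated cost: ~\$(?P<cost>\d+(?:\.\d+)?)"
)


def parse_insufficient_balance(body):
    """Return (balance, estimated_cost) in USD, or None for any other body."""
    try:
        payload = json.loads(body)
    except ValueError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    match = _INSUFFICIENT_RE.search(str(error.get("message", "")))
    if match is None:
        return None
    return float(match["balance"]), float(match["cost"])
```

A caller could use the returned pair to show the shortfall to a user or to pause a batch job until the wallet is topped up.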

Checking your balance#

View your current balance and transaction history at any time from the Wallet page in the dashboard.

Checking model pricing#

In the dashboard#

Browse the Model Catalog to see pricing for every model. Each model card shows input and output rates for both standard and turbo tiers.

Via the API#

curl https://api.oru-el.com/v1/inference/models

The response includes pricing for each model:

{
  "models": [
    {
      "id": "llama-4-maverick",
      "displayName": "Llama 4 Maverick",
      "pricing": {
        "standard": {
          "inputPerMToken": "0.10",
          "outputPerMToken": "0.30",
          "cachedInputPerMToken": "0.05"
        },
        "turbo": {
          "inputPerMToken": "0.20",
          "outputPerMToken": "0.60",
          "cachedInputPerMToken": null
        }
      }
    }
  ]
}
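
Since rates arrive as strings and some fields can be null, it can help to normalize a model entry before doing arithmetic with it. The sketch below is illustrative: `tier_rates` is a hypothetical helper, and it assumes that a model without a turbo tier either omits the `turbo` key or sets it to null, which the response above does not confirm.

```python
from decimal import Decimal


def tier_rates(model, tier="turbo"):
    """Return (tier_used, rates) for one entry of the "models" list.

    Assumption: a model without the requested tier either omits the key or
    sets it to null; in both cases this sketch falls back to "standard".
    """
    pricing = model["pricing"]
    if pricing.get(tier) is None:
        tier = "standard"
    # Rates are strings in the response; convert them, keeping null as None.
    rates = {
        name: Decimal(value) if value is not None else None
        for name, value in pricing[tier].items()
    }
    return tier, rates


# Entry copied from the sample response above.
model = {
    "id": "llama-4-maverick",
    "pricing": {
        "standard": {"inputPerMToken": "0.10", "outputPerMToken": "0.30", "cachedInputPerMToken": "0.05"},
        "turbo": {"inputPerMToken": "0.20", "outputPerMToken": "0.60", "cachedInputPerMToken": None},
    },
}
tier, rates = tier_rates(model, "turbo")
print(tier, rates["inputPerMToken"], rates["cachedInputPerMToken"])
```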

Cost calculation example#

Suppose you're using a model priced at $0.50/M input tokens and $1.50/M output tokens, and your request uses 2,000 input tokens and generates 800 output tokens:

| Component | Tokens | Rate (per M) | Cost |
| --- | --- | --- | --- |
| Input tokens | 2,000 | $0.50 | $0.001000 |
| Output tokens | 800 | $1.50 | $0.001200 |
| Total | | | $0.002200 |

For cached inputs, suppose 1,500 of those 2,000 input tokens hit the cache at $0.25/M:

| Component | Tokens | Rate (per M) | Cost |
| --- | --- | --- | --- |
| Non-cached input | 500 | $0.50 | $0.000250 |
| Cached input | 1,500 | $0.25 | $0.000375 |
| Output tokens | 800 | $1.50 | $0.001200 |
| Total | | | $0.001825 |
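
The cached-input calculation can be expressed as a small function. This is a sketch under the assumptions of the example above; the name `cost_with_cache` and its parameters are illustrative rather than an Oru-el API.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)


def cost_with_cache(input_tokens, cached_tokens, output_tokens,
                    input_rate, cached_rate, output_rate):
    """Return the USD cost when `cached_tokens` of the input hit the cache."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    return (
        Decimal(uncached_tokens) * Decimal(input_rate)
        + Decimal(cached_tokens) * Decimal(cached_rate)
        + Decimal(output_tokens) * Decimal(output_rate)
    ) / PER_MILLION


# Numbers from the example: 1,500 of 2,000 input tokens are cached.
total = cost_with_cache(2_000, 1_500, 800, "0.50", "0.25", "1.50")
baseline = cost_with_cache(2_000, 0, 800, "0.50", "0.25", "1.50")
print(f"with cache: ${total}, saved: ${baseline - total}")
```

Comparing against the uncached baseline gives the saving attributable to the cache for that request.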

Budget controls#

Oru-el provides built-in budget enforcement to prevent unexpected spending.

Monthly budgets#

Set a maximum monthly spend for inference. When you hit the limit:

  • Hard limit — API calls are rejected with a 429 status code
  • Soft limit — you receive alerts but calls continue

A request rejected by a hard limit returns an error body like this:

{
  "error": "BUDGET_EXCEEDED",
  "message": "Monthly inference budget of $50.00 exceeded (current: $50.12)",
  "currentSpend": 50.12,
  "limit": 50,
  "category": "inference"
}

Hourly rate limits#

Set a maximum cost per hour to catch runaway loops or misconfigured applications:

{
  "error": "RATE_LIMIT_EXCEEDED",
  "message": "Hourly cost limit of $5.00 exceeded",
  "currentHourlyCost": 5.23,
  "limit": 5
}

Configure budgets from Analytics > Budgets in the dashboard.
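
A client can tell the two limit errors apart by the `error` field and report how far over the limit it is. The sketch below assumes the payload shapes shown above; `describe_limit_error` is a hypothetical helper, not part of an Oru-el SDK.

```python
def describe_limit_error(payload):
    """Summarize a monthly-budget or hourly-limit error payload, else return None."""
    kind = payload.get("error")
    if kind == "BUDGET_EXCEEDED":
        over = payload["currentSpend"] - payload["limit"]
        category = payload.get("category", "unknown")
        return f"monthly {category} budget exceeded by ${over:.2f}"
    if kind == "RATE_LIMIT_EXCEEDED":
        over = payload["currentHourlyCost"] - payload["limit"]
        return f"hourly cost limit exceeded by ${over:.2f}"
    # Other errors (for example the wallet error, where "error" is an object)
    # are not budget errors and are left to the caller.
    return None


monthly = {
    "error": "BUDGET_EXCEEDED",
    "currentSpend": 50.12,
    "limit": 50,
    "category": "inference",
}
print(describe_limit_error(monthly))
```

An hourly limit is likely to clear on its own as the hour rolls over, whereas a monthly hard limit generally needs the budget raised, so callers may want to treat the two cases differently when deciding whether to retry.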

GPU compute pricing#

GPU machines are billed per hour based on the GPU type and configuration.

| Resource | Billing |
| --- | --- |
| GPU instance | Per hour while the machine is running |
| Storage | Per GB per month |
| Network | Included |

Pricing varies by GPU type (A100, H100, etc.) and is shown in the GPU marketplace when you create a machine. The clock starts when the machine is provisioned and stops when you terminate it.
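
For a rough estimate of a GPU bill, the billing rules above can be sketched as follows. The rates in the usage lines are placeholders, not real Oru-el prices, and the sketch assumes partial hours are prorated; the source does not say whether they are rounded up instead.

```python
from datetime import datetime, timezone


def gpu_compute_cost(provisioned_at, terminated_at, hourly_rate):
    """Return the compute cost for the time between provisioning and termination."""
    if terminated_at < provisioned_at:
        raise ValueError("terminated_at must not be earlier than provisioned_at")
    hours = (terminated_at - provisioned_at).total_seconds() / 3600
    # Assumption: fractional hours are billed proportionally.
    return hours * hourly_rate


def storage_cost(size_gb, months, rate_per_gb_month):
    """Return the storage cost for `size_gb` held for `months` months."""
    return size_gb * months * rate_per_gb_month


# Placeholder rates for illustration only.
start = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
end = datetime(2025, 1, 1, 6, 30, tzinfo=timezone.utc)
estimate = gpu_compute_cost(start, end, hourly_rate=2.00) + storage_cost(100, 1, 0.10)
print(f"estimated cost: ${estimate:.2f}")
```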

View GPU pricing on the GPUs page in the dashboard or on the public Pricing page.