Models

Browse available models, understand categories, and choose the right model for your task.

Oru-el provides access to 100+ leading open-source and commercial models. All models are available through the same API — just change the model parameter.

Listing models#

Via the API#

Fetch the full model catalog (no authentication required):

curl https://api.oru-el.com/v1/inference/models

Response:

{
  "models": [
    {
      "id": "llama-4-maverick",
      "displayName": "Llama 4 Maverick",
      "description": "Meta's latest open-source model...",
      "modelType": "TEXT_GENERATION",
      "category": "CHAT",
      "creator": "Meta",
      "parameterCount": "17B",
      "contextWindow": 131072,
      "quantization": null,
      "isFeatured": true,
      "capabilities": ["chat", "tool_calling", "json_mode"],
      "pricing": {
        "standard": {
          "inputPerMToken": "0.10",
          "outputPerMToken": "0.30",
          "cachedInputPerMToken": "0.05"
        },
        "turbo": {
          "inputPerMToken": "0.20",
          "outputPerMToken": "0.60",
          "cachedInputPerMToken": null
        }
      }
    }
  ]
}
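As a rough Python equivalent of the curl call, the sketch below downloads the catalog with the standard library and builds one summary line per model. The helper names `fetch_catalog` and `summarize` are illustrative and not part of any Oru-el SDK, and the code assumes the response has the shape shown above.

```python
import json
from urllib.request import urlopen

# Endpoint taken from the curl example; no API key is sent.
CATALOG_URL = "https://api.oru-el.com/v1/inference/models"


def fetch_catalog(url: str = CATALOG_URL) -> dict:
    """Download the catalog and decode the JSON body (performs a network call)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize(catalog: dict) -> list:
    """Return one 'id (creator, N tokens)' line per model in the catalog."""
    return [
        f"{m['id']} ({m['creator']}, {m['contextWindow']} tokens)"
        for m in catalog.get("models", [])
    ]


# Typical use (commented out so that importing this file makes no request):
# for line in summarize(fetch_catalog()):
#     print(line)
```

Keeping the download and the formatting in separate functions makes it easy to try `summarize` against a saved response without touching the network.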

Get a specific model#

curl https://api.oru-el.com/v1/inference/models/llama-4-maverick

This returns additional details including benchmarks, architecture info, and links.

Browse the Model Catalog in the dashboard or visit the public catalog at oru-el.com/catalog to explore models with a visual interface. Each model card shows capabilities, pricing, context window, and a link to try it in the Playground.

Model fields#

Field	Type	Description
`id`	string	Model identifier — use this in API calls
`displayName`	string	Human-readable name
`description`	string	What the model is good at
`modelType`	string	Type of model (see below)
`category`	string	Capability category (see below)
`creator`	string	Organization that created the model
`parameterCount`	string	Model size (e.g., "7B", "70B", "405B")
`contextWindow`	integer	Maximum context length in tokens
`quantization`	string or null	Quantization level (e.g., "FP8", "INT4")
`isFeatured`	boolean	Whether this model is featured/recommended
`capabilities`	array	Supported features (chat, tool_calling, json_mode, vision, etc.)
`pricing`	object	Standard and turbo prices per million tokens, returned as decimal strings; `turbo` is null when the model has no turbo tier
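If you prefer typed access over raw dictionaries, the fields in the table can be mapped onto a small dataclass. This is a minimal sketch: `ModelInfo` and its methods are hypothetical names rather than an official client type, and `description` and `pricing` are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ModelInfo:
    """Typed view of one catalog entry (hypothetical helper, not an official type)."""

    id: str
    display_name: str
    model_type: str
    category: str
    creator: str
    parameter_count: str
    context_window: int
    quantization: Optional[str]
    is_featured: bool
    capabilities: Tuple[str, ...]

    @classmethod
    def from_api(cls, raw: dict) -> "ModelInfo":
        """Map the camelCase API fields onto snake_case attributes."""
        return cls(
            id=raw["id"],
            display_name=raw["displayName"],
            model_type=raw["modelType"],
            category=raw["category"],
            creator=raw["creator"],
            parameter_count=raw["parameterCount"],
            context_window=int(raw["contextWindow"]),
            quantization=raw.get("quantization"),
            is_featured=bool(raw.get("isFeatured", False)),
            capabilities=tuple(raw.get("capabilities", [])),
        )

    def supports(self, capability: str) -> bool:
        """Check whether a capability such as 'json_mode' is listed."""
        return capability in self.capabilities
```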

Model types#

Type	Description	Endpoint
`TEXT_GENERATION`	Chat completions and text generation	`/chat/completions`
`EMBEDDINGS`	Convert text to vector representations	`/embeddings`
`TEXT_TO_IMAGE`	Generate images from text prompts	`/images/generations`
`TEXT_TO_SPEECH`	Convert text to audio	`/audio/speech`
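The type-to-endpoint table can be expressed as a lookup so that client code picks the right path from a catalog entry. The sketch assumes every endpoint sits under the same base URL that the OpenAI client example on this page uses; `endpoint_url` is an illustrative helper name.

```python
# Endpoint paths copied from the table above.
ENDPOINT_BY_MODEL_TYPE = {
    "TEXT_GENERATION": "/chat/completions",
    "EMBEDDINGS": "/embeddings",
    "TEXT_TO_IMAGE": "/images/generations",
    "TEXT_TO_SPEECH": "/audio/speech",
}

# Assumption: all endpoints share the inference base URL.
BASE_URL = "https://api.oru-el.com/v1/inference"


def endpoint_url(model: dict) -> str:
    """Build the full URL for a catalog entry based on its modelType."""
    model_type = model["modelType"]
    try:
        return BASE_URL + ENDPOINT_BY_MODEL_TYPE[model_type]
    except KeyError:
        raise ValueError(f"Unknown modelType: {model_type!r}") from None
```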

Model categories#

Categories describe what a model is optimized for:

Category	Description	Example models
Chat	General-purpose conversation and instruction following	Llama 4 Maverick, Qwen 2.5
Reasoning	Complex problem-solving, math, logic, and multi-step thinking	DeepSeek R1, QwQ
Code	Code generation, debugging, and code understanding	DeepSeek Coder, CodeLlama
Vision	Multimodal models that can process images and text	Llama 3.2 Vision, InternVL
Multilingual	Strong performance across many languages	Qwen, NLLB
Efficient	Small, fast models for cost-sensitive or latency-sensitive use	Llama 3.2 1B/3B, Gemma 2B

Choosing the right model#

By task#

Task	Recommended approach
General Q&A	Large chat model (70B+)
Code generation	Code-specialized model or large chat model with temperature 0
Creative writing	Large chat model with higher temperature (0.8-1.2)
Data extraction	Any chat model with JSON mode and temperature 0
Classification	Efficient model with temperature 0
Embeddings	Embedding model matched to your use case
Real-time chat	Turbo tier with a fast model
Batch processing	Standard tier with a cost-effective model

By priority#

Priority	Recommendation
Best quality	Largest model available (405B, 70B); reasoning models for complex tasks
Lowest cost	Efficient models (7B, 3B, 1B); standard tier
Fastest response	Turbo tier; efficient models
Longest context	Models with 128K+ context window
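These recommendations can be applied programmatically to the catalog response. The sketch below filters models by capability, context window, and turbo availability, then sorts the matches by standard input price. `pick_models` is a hypothetical helper, and it assumes every model carries a `pricing.standard` object with string prices as in the sample response.

```python
from decimal import Decimal


def pick_models(models, capability=None, min_context=0, turbo_only=False):
    """Filter catalog entries, then sort by standard input price, cheapest first."""
    matches = []
    for m in models:
        if capability and capability not in m.get("capabilities", []):
            continue  # missing the required feature
        if m.get("contextWindow", 0) < min_context:
            continue  # context window too small
        if turbo_only and m.get("pricing", {}).get("turbo") is None:
            continue  # no turbo tier offered
        matches.append(m)
    # Prices arrive as strings, so convert to Decimal before comparing.
    return sorted(
        matches,
        key=lambda m: Decimal(m["pricing"]["standard"]["inputPerMToken"]),
    )
```

For example, `pick_models(catalog["models"], capability="json_mode", min_context=128_000)` would list long-context models suitable for data extraction, cheapest first.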

Cost vs. quality tradeoff#

Larger models generally produce higher-quality output but cost more per token. For many tasks a smaller model is sufficient:

1B-3B models: Classification, simple extraction, routing
7B-8B models: General chat, summarization, translation
27B-70B models: Complex reasoning, nuanced writing, code generation
Models above 70B (such as 405B): State-of-the-art quality for the hardest tasks
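To put numbers on the tradeoff, you can estimate the cost of a request from the per-million-token prices in the catalog. This is a minimal sketch: `estimate_cost` is an illustrative helper, it ignores the cached-input price, and the token counts are made up for the example. The prices are the ones from the sample response above.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def estimate_cost(pricing_tier: dict, input_tokens: int, output_tokens: int) -> Decimal:
    """Price one request from a tier object such as pricing['standard']."""
    input_cost = Decimal(pricing_tier["inputPerMToken"]) * input_tokens / MILLION
    output_cost = Decimal(pricing_tier["outputPerMToken"]) * output_tokens / MILLION
    return input_cost + output_cost


# Prices from the sample llama-4-maverick response.
standard = {"inputPerMToken": "0.10", "outputPerMToken": "0.30"}
turbo = {"inputPerMToken": "0.20", "outputPerMToken": "0.60"}

# Hypothetical request: 2,000 input tokens and 500 output tokens.
print("standard:", estimate_cost(standard, 2000, 500))
print("turbo:   ", estimate_cost(turbo, 2000, 500))
```

`Decimal` is used because the API returns prices as strings, which avoids floating-point rounding when many small per-request costs are summed.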

Pricing tiers#

Each model has one or two pricing tiers: standard, which is the default, and turbo, which only some models offer.

Standard tier#

The default tier. Requests are routed for the lowest cost.

Turbo tier#

Routing optimized for low latency: responses are faster but cost more per token. Enable it by passing tier in the request body:

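from openai import OpenAI

client = OpenAI(base_url="https://api.oru-el.com/v1/inference", api_key="oruel_your_api_key_here")

# "tier" is not a standard OpenAI parameter, so it is passed through extra_body
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"tier": "turbo"},
)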

Not all models have a turbo tier. Check the pricing.turbo field in the model response — if it's null, only standard routing is available.
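The null check described above can be wrapped in a small helper so that a request asks for turbo only when the model offers it. Both function names are hypothetical; the only API behavior assumed is the `pricing.turbo` field and the `tier` body parameter already shown on this page.

```python
def tier_for(model: dict, prefer_turbo: bool = True) -> str:
    """Return 'turbo' when it is wanted and offered, otherwise 'standard'."""
    has_turbo = (model.get("pricing") or {}).get("turbo") is not None
    return "turbo" if prefer_turbo and has_turbo else "standard"


def tier_kwargs(model: dict, prefer_turbo: bool = True) -> dict:
    """Build extra keyword arguments for chat.completions.create."""
    if tier_for(model, prefer_turbo) == "turbo":
        return {"extra_body": {"tier": "turbo"}}
    # Standard is the default tier, so nothing extra needs to be sent.
    return {}
```

A call would then look like `client.chat.completions.create(model=model["id"], messages=messages, **tier_kwargs(model))`, where `model` is one entry from the catalog response.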

Context windows#

The context window determines the maximum number of tokens (input + output) a model can handle in a single request. Context windows vary by model:

Context size	Approximate capacity (English text, about 0.75 words per token)
4,096 tokens	~3,000 words — short conversations
8,192 tokens	~6,000 words — medium conversations
32,768 tokens	~24,000 words — long documents
65,536 tokens	~49,000 words — very long documents
131,072 tokens	~98,000 words — book-length content

If your input exceeds the context window, the API will return an error. Plan your message history and system prompts accordingly.
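One way to plan message history is to trim the oldest turns before sending a request. The sketch below is an assumption-laden illustration: `estimate_tokens` is a crude word-based estimate using the roughly 0.75 words per token ratio implied by the table, not the model's real tokenizer, and `trim_history` is a hypothetical helper. Pass a real token counter through `count` if you have one.

```python
import math


def estimate_tokens(text: str) -> int:
    """Crude estimate: about 0.75 words per token (not a real tokenizer)."""
    return math.ceil(len(text.split()) / 0.75)


def trim_history(messages, context_window, max_output_tokens, count=estimate_tokens):
    """Drop the oldest turns until the input fits beside the output budget."""
    budget = context_window - max_output_tokens
    kept = list(messages)
    while sum(count(m["content"]) for m in kept) > budget:
        # Oldest droppable message: not a system prompt and not the latest turn.
        idx = next(
            (i for i, m in enumerate(kept[:-1]) if m["role"] != "system"),
            None,
        )
        if idx is None:
            raise ValueError("Conversation cannot fit in the available context")
        del kept[idx]
    return kept
```

System prompts and the most recent message are always kept, so the function raises an error instead of silently sending a request that has lost its instructions or the user's latest question.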

Using a model in code#

Every text generation model is called through the same chat completions endpoint, so switching models only means changing the model parameter:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.oru-el.com/v1/inference",
    api_key="oruel_your_api_key_here",
)

# Use any model by its ID
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Switch to a different model — no other code changes needed
response = client.chat.completions.create(
    model="deepseek-r1-0528",
    messages=[{"role": "user", "content": "Solve this step by step: what is 15% of 340?"}],
)

Featured models#

Featured models are highlighted in the catalog for their quality, popularity, and reliability. They're a good starting point if you're not sure which model to choose. Look for models with isFeatured: true in the API response.