Models

Browse available models, understand categories, and choose the right model for your task.

Oru-el provides access to 100+ leading open-source and commercial models. All models are available through the same API — just change the model parameter.

Listing models#

Via the API#

Fetch the full model catalog (no authentication required):

curl https://api.oru-el.com/v1/inference/models

Response:

{
  "models": [
    {
      "id": "llama-4-maverick",
      "displayName": "Llama 4 Maverick",
      "description": "Meta's latest open-source model...",
      "modelType": "TEXT_GENERATION",
      "category": "CHAT",
      "creator": "Meta",
      "parameterCount": "17B",
      "contextWindow": 131072,
      "quantization": null,
      "isFeatured": true,
      "capabilities": ["chat", "tool_calling", "json_mode"],
      "pricing": {
        "standard": {
          "inputPerMToken": "0.10",
          "outputPerMToken": "0.30",
          "cachedInputPerMToken": "0.05"
        },
        "turbo": {
          "inputPerMToken": "0.20",
          "outputPerMToken": "0.60",
          "cachedInputPerMToken": null
        }
      }
    }
  ]
}
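As a rough sketch of consuming this response from Python, the snippet below fetches the catalog with the standard library and filters it by capability. The helper names `fetch_models` and `models_with_capability` are illustrative and not part of any Oru-el SDK, and the code assumes the response has the shape shown above.

```python
import json
import urllib.request

CATALOG_URL = "https://api.oru-el.com/v1/inference/models"


def fetch_models(url: str = CATALOG_URL) -> list[dict]:
    """Fetch the public catalog and return its "models" array."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return payload.get("models", [])


def models_with_capability(models: list[dict], capability: str) -> list[str]:
    """Return the IDs of models whose "capabilities" list includes `capability`."""
    return [m["id"] for m in models if capability in m.get("capabilities", [])]
```

For example, `models_with_capability(fetch_models(), "tool_calling")` would list the IDs of every model that advertises tool calling.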

Get a specific model#

curl https://api.oru-el.com/v1/inference/models/llama-4-maverick

The single-model endpoint returns additional details, including benchmarks, architecture information, and links.

In the dashboard#

Browse the Model Catalog in the dashboard or visit the public catalog at oru-el.com/catalog to explore models with a visual interface. Each model card shows capabilities, pricing, context window, and a link to try it in the Playground.

Model fields#

| Field | Type | Description |
| --- | --- | --- |
| id | string | Model identifier; use this in API calls |
| displayName | string | Human-readable name |
| description | string | What the model is good at |
| modelType | string | Type of model (see below) |
| category | string | Capability category (see below) |
| creator | string | Organization that created the model |
| parameterCount | string | Model size (e.g., "7B", "70B", "405B") |
| contextWindow | integer | Maximum context length in tokens |
| quantization | string or null | Quantization level (e.g., "FP8", "INT4") |
| isFeatured | boolean | Whether this model is featured/recommended |
| capabilities | array | Supported features (chat, tool_calling, json_mode, vision, etc.) |
| pricing | object | Standard and turbo pricing per million tokens |
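Because prices are quoted per million tokens and returned as strings, a cost estimate is a small piece of decimal arithmetic. The sketch below is one way to do it. `estimate_cost` is a hypothetical helper, and it assumes billing is linear in token count and ignores cached-input pricing.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def estimate_cost(pricing: dict, input_tokens: int, output_tokens: int,
                  tier: str = "standard") -> Decimal:
    """Estimate the price of one request from a model's "pricing" object."""
    rates = pricing.get(tier)
    if rates is None:
        # e.g. "turbo" is null for models without a turbo tier
        raise ValueError(f"tier {tier!r} is not available for this model")
    # Parse the string prices with Decimal to avoid float rounding errors.
    input_cost = Decimal(rates["inputPerMToken"]) * input_tokens / MILLION
    output_cost = Decimal(rates["outputPerMToken"]) * output_tokens / MILLION
    return input_cost + output_cost
```

With the sample pricing above, 2,000 input tokens and 500 output tokens on the standard tier come to 0.10 × 2,000 / 1,000,000 + 0.30 × 500 / 1,000,000 = 0.00035 in the catalog's pricing currency.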

Model types#

| Type | Description | Endpoint |
| --- | --- | --- |
| TEXT_GENERATION | Chat completions and text generation | /chat/completions |
| EMBEDDINGS | Convert text to vector representations | /embeddings |
| TEXT_TO_IMAGE | Generate images from text prompts | /images/generations |
| TEXT_TO_SPEECH | Convert text to audio | /audio/speech |
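The type-to-endpoint mapping can be encoded as a lookup table so that client code picks the right path from a model's `modelType`. This is a minimal sketch. `endpoint_url` is a hypothetical helper, and it assumes every endpoint path is relative to the `https://api.oru-el.com/v1/inference` base URL used in the Python examples on this page.

```python
BASE_URL = "https://api.oru-el.com/v1/inference"  # assumed common base for all endpoints

ENDPOINT_BY_MODEL_TYPE = {
    "TEXT_GENERATION": "/chat/completions",
    "EMBEDDINGS": "/embeddings",
    "TEXT_TO_IMAGE": "/images/generations",
    "TEXT_TO_SPEECH": "/audio/speech",
}


def endpoint_url(model: dict, base_url: str = BASE_URL) -> str:
    """Return the full endpoint URL for a model object from the catalog."""
    model_type = model["modelType"]
    try:
        return base_url + ENDPOINT_BY_MODEL_TYPE[model_type]
    except KeyError:
        raise ValueError(f"unknown modelType: {model_type!r}") from None
```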

Model categories#

Categories describe what a model is optimized for:

| Category | Description | Example models |
| --- | --- | --- |
| Chat | General-purpose conversation and instruction following | Llama 4 Maverick, Qwen 2.5 |
| Reasoning | Complex problem-solving, math, logic, and multi-step thinking | DeepSeek R1, QwQ |
| Code | Code generation, debugging, and code understanding | DeepSeek Coder, CodeLlama |
| Vision | Multimodal models that can process images and text | Llama 3.2 Vision, InternVL |
| Multilingual | Strong performance across many languages | Qwen, NLLB |
| Efficient | Small, fast models for cost-sensitive or latency-sensitive use | Llama 3.2 1B/3B, Gemma 2B |

Choosing the right model#

By task#

| Task | Recommended approach |
| --- | --- |
| General Q&A | Large chat model (70B+) |
| Code generation | Code-specialized model or large chat model with temperature 0 |
| Creative writing | Large chat model with higher temperature (0.8-1.2) |
| Data extraction | Any chat model with JSON mode and temperature 0 |
| Classification | Efficient model with temperature 0 |
| Embeddings | Embedding model matched to your use case |
| Real-time chat | Turbo tier with a fast model |
| Batch processing | Standard tier with a cost-effective model |

By priority#

| Priority | Recommendation |
| --- | --- |
| Best quality | Largest model available (405B, 70B); reasoning models for complex tasks |
| Lowest cost | Efficient models (7B, 3B, 1B); standard tier |
| Fastest response | Turbo tier; efficient models |
| Longest context | Models with 128K+ context window |
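These priorities can also be applied programmatically to the catalog. The sketch below picks the cheapest model that meets a set of hard requirements. `cheapest_model` is a hypothetical helper, and ranking by the sum of the standard input and output prices is a simplifying assumption. A real workload would weight the two prices by its own input/output ratio.

```python
from decimal import Decimal


def cheapest_model(models, required_capabilities=(), min_context=0):
    """Return the ID of the cheapest model meeting the requirements, or None."""

    def standard_price(model):
        # Simplification: rank by input price + output price on the standard tier.
        rates = model["pricing"]["standard"]
        return Decimal(rates["inputPerMToken"]) + Decimal(rates["outputPerMToken"])

    candidates = [
        m for m in models
        if set(required_capabilities) <= set(m.get("capabilities", []))
        and m.get("contextWindow", 0) >= min_context
    ]
    if not candidates:
        return None
    return min(candidates, key=standard_price)["id"]
```

For instance, `cheapest_model(models, ["json_mode"], min_context=32768)` would return the lowest-priced model that supports JSON mode and has at least a 32K context window.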

Cost vs. quality tradeoff#

Larger models produce higher-quality output but cost more per token. For many tasks, smaller models are sufficient:

  • 1B-3B models: classification, simple extraction, routing
  • 7B-8B models: general chat, summarization, translation
  • 27B-70B models: complex reasoning, nuanced writing, code generation
  • Models larger than 70B (for example, 405B): state-of-the-art quality for the hardest tasks

Pricing tiers#

Each model has one or two pricing tiers: standard and, for some models, turbo.

Standard tier#

The default tier. Requests are routed for the lowest cost.

Turbo tier#

Low-latency optimized routing for faster responses at a higher price. Enable with:

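# Assumes `client` is an OpenAI client configured for Oru-el
# (see "Using a model in code" below).
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"tier": "turbo"},  # extra field sent in the request body
)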

Not all models have a turbo tier. Check the pricing.turbo field in the model response — if it's null, only standard routing is available.
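The null check described above can be wrapped in a small helper that falls back to standard routing when a model has no turbo tier. This is a sketch only. `tier_extra_body` is a hypothetical name, and `model` is assumed to be one object from the catalog response.

```python
from typing import Optional


def tier_extra_body(model: dict, prefer_turbo: bool = True) -> Optional[dict]:
    """Return an `extra_body` value requesting turbo only when it is offered."""
    turbo = (model.get("pricing") or {}).get("turbo")
    if prefer_turbo and turbo is not None:
        return {"tier": "turbo"}
    # None leaves the request on the default (standard) tier.
    return None
```

The result can be passed directly as `extra_body=tier_extra_body(model)` in the `client.chat.completions.create(...)` call.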

Context windows#

The context window determines the maximum number of tokens (input + output) a model can handle in a single request. Context windows vary by model:

| Context size | What it means |
| --- | --- |
| 4,096 tokens | ~3,000 words (short conversations) |
| 8,192 tokens | ~6,000 words (medium conversations) |
| 32,768 tokens | ~24,000 words (long documents) |
| 65,536 tokens | ~49,000 words (very long documents) |
| 131,072 tokens | ~98,000 words (book-length content) |

If a request exceeds the context window, the API will return an error. Budget your message history, system prompt, and expected output length accordingly.
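A rough pre-flight check can catch oversized requests before they are sent. The sketch below uses the ratio implied by the table above (about 0.75 words per token), which is only a heuristic: real token counts depend on the model's tokenizer, and per-message formatting overhead is ignored. Both function names are hypothetical.

```python
import math

WORDS_PER_TOKEN = 0.75  # rough ratio implied by the table: 4,096 tokens ~ 3,000 words


def estimate_tokens(text: str) -> int:
    """Approximate the token count of `text` from its whitespace-separated words."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def fits_context(messages: list, context_window: int, max_output_tokens: int) -> bool:
    """Check that estimated input plus reserved output fits in the context window."""
    input_tokens = sum(estimate_tokens(m["content"]) for m in messages)
    return input_tokens + max_output_tokens <= context_window
```

If `fits_context` returns False, one option is to drop the oldest messages from the history until the request fits.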

Using a model in code#

Every model uses the same API — just swap the model parameter:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.oru-el.com/v1/inference",
    api_key="oruel_your_api_key_here",
)

# Use any model by its ID
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Switch to a different model — no other code changes needed
response = client.chat.completions.create(
    model="deepseek-r1-0528",
    messages=[{"role": "user", "content": "Solve this step by step: what is 15% of 340?"}],
)

Featured models are highlighted in the catalog for their quality, popularity, and reliability. They're a good starting point if you're not sure which model to choose. Look for models with isFeatured: true in the API response.