Models
Browse available models, understand categories, and choose the right model for your task.
Oru-el provides access to 100+ leading open-source and commercial models. All models are available through the same API — just change the model parameter.
Listing models
Via the API
Fetch the full model catalog (no authentication required):
```bash
curl https://api.oru-el.com/v1/inference/models
```
Example response (abridged to one model):
```json
{
  "models": [
    {
      "id": "llama-4-maverick",
      "displayName": "Llama 4 Maverick",
      "description": "Meta's latest open-source model...",
      "modelType": "TEXT_GENERATION",
      "category": "CHAT",
      "creator": "Meta",
      "parameterCount": "17B",
      "contextWindow": 131072,
      "quantization": null,
      "isFeatured": true,
      "capabilities": ["chat", "tool_calling", "json_mode"],
      "pricing": {
        "standard": {
          "inputPerMToken": "0.10",
          "outputPerMToken": "0.30",
          "cachedInputPerMToken": "0.05"
        },
        "turbo": {
          "inputPerMToken": "0.20",
          "outputPerMToken": "0.60",
          "cachedInputPerMToken": null
        }
      }
    }
  ]
}
```
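A minimal sketch of fetching the catalog with Python's standard library and indexing it by id for quick lookups. It assumes the response shape shown above. `fetch_catalog` and `index_by_id` are hypothetical helper names, and the example runs against an abridged copy of the sample response rather than a live call.

```python
import json
import urllib.request

CATALOG_URL = "https://api.oru-el.com/v1/inference/models"


def fetch_catalog(url: str = CATALOG_URL) -> dict:
    """Download and parse the model catalog (no API key is needed for this endpoint)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def index_by_id(catalog: dict) -> dict:
    """Map each model's id to its full record."""
    return {model["id"]: model for model in catalog.get("models", [])}


# Abridged copy of the example response, used instead of a network request.
sample = {
    "models": [
        {
            "id": "llama-4-maverick",
            "contextWindow": 131072,
            "capabilities": ["chat", "tool_calling", "json_mode"],
        }
    ]
}

models = index_by_id(sample)
print(models["llama-4-maverick"]["contextWindow"])  # 131072
```

For live data, pass `fetch_catalog()` to `index_by_id` instead of `sample`.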
Get a specific model
```bash
curl https://api.oru-el.com/v1/inference/models/llama-4-maverick
```
This returns additional details including benchmarks, architecture info, and links.
In the dashboard
Browse the Model Catalog in the dashboard or visit the public catalog at oru-el.com/catalog to explore models with a visual interface. Each model card shows capabilities, pricing, context window, and a link to try it in the Playground.
Model fields
| Field | Type | Description |
|---|---|---|
| id | string | Model identifier; use this in API calls |
| displayName | string | Human-readable name |
| description | string | What the model is good at |
| modelType | string | Type of model (see below) |
| category | string | Capability category (see below) |
| creator | string | Organization that created the model |
| parameterCount | string | Model size (e.g., "7B", "70B", "405B") |
| contextWindow | integer | Maximum context length in tokens |
| quantization | string or null | Quantization level (e.g., "FP8", "INT4") |
| isFeatured | boolean | Whether this model is featured/recommended |
| capabilities | array | Supported features (chat, tool_calling, json_mode, vision, etc.) |
| pricing | object | Per-million-token prices (as decimal strings) for the standard and turbo tiers; turbo is null when the model has no turbo tier |
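If you work with catalog entries in Python, a typed wrapper can make the fields above easier to use. This is a sketch only: `ModelInfo`, `supports`, and `has_turbo` are hypothetical names, and the snake_case attributes are a local convention mapped from the API's camelCase keys.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelInfo:
    """Typed view of one entry in the "models" array."""

    id: str
    display_name: str
    description: str
    model_type: str
    category: str
    creator: str
    parameter_count: str
    context_window: int
    quantization: Optional[str] = None
    is_featured: bool = False
    capabilities: list = field(default_factory=list)
    pricing: dict = field(default_factory=dict)

    @classmethod
    def from_json(cls, raw: dict) -> "ModelInfo":
        """Build a ModelInfo from the camelCase JSON returned by the API."""
        return cls(
            id=raw["id"],
            display_name=raw["displayName"],
            description=raw.get("description", ""),
            model_type=raw["modelType"],
            category=raw["category"],
            creator=raw["creator"],
            parameter_count=raw["parameterCount"],
            context_window=int(raw["contextWindow"]),
            quantization=raw.get("quantization"),
            is_featured=bool(raw.get("isFeatured", False)),
            capabilities=list(raw.get("capabilities", [])),
            pricing=dict(raw.get("pricing", {})),
        )

    def supports(self, capability: str) -> bool:
        """True if the model lists the capability, e.g. "tool_calling"."""
        return capability in self.capabilities

    def has_turbo(self) -> bool:
        """pricing.turbo is null (None in Python) when only standard routing exists."""
        return self.pricing.get("turbo") is not None


example = {
    "id": "llama-4-maverick",
    "displayName": "Llama 4 Maverick",
    "modelType": "TEXT_GENERATION",
    "category": "CHAT",
    "creator": "Meta",
    "parameterCount": "17B",
    "contextWindow": 131072,
    "quantization": None,
    "isFeatured": True,
    "capabilities": ["chat", "tool_calling", "json_mode"],
    "pricing": {
        "standard": {"inputPerMToken": "0.10", "outputPerMToken": "0.30"},
        "turbo": {"inputPerMToken": "0.20", "outputPerMToken": "0.60"},
    },
}
info = ModelInfo.from_json(example)
```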
Model types
| Type | Description | Endpoint |
|---|---|---|
| TEXT_GENERATION | Chat completions and text generation | /chat/completions |
| EMBEDDINGS | Convert text to vector representations | /embeddings |
| TEXT_TO_IMAGE | Generate images from text prompts | /images/generations |
| TEXT_TO_SPEECH | Convert text to audio | /audio/speech |
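The table can be encoded as a lookup so code can route a model to the right endpoint based on its modelType. A minimal sketch, assuming the endpoint paths are relative to the same base URL used by the SDK client (https://api.oru-el.com/v1/inference); `ENDPOINTS` and `endpoint_url` are hypothetical names.

```python
BASE_URL = "https://api.oru-el.com/v1/inference"

# modelType -> endpoint path, copied from the table above.
ENDPOINTS = {
    "TEXT_GENERATION": "/chat/completions",
    "EMBEDDINGS": "/embeddings",
    "TEXT_TO_IMAGE": "/images/generations",
    "TEXT_TO_SPEECH": "/audio/speech",
}


def endpoint_url(model_type: str) -> str:
    """Return the full endpoint URL for a modelType, or raise for unknown types."""
    try:
        path = ENDPOINTS[model_type]
    except KeyError:
        raise ValueError(f"Unknown modelType: {model_type!r}") from None
    return BASE_URL + path


print(endpoint_url("EMBEDDINGS"))
```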
Model categories
Categories describe what a model is optimized for:
| Category | Description | Example models |
|---|---|---|
| Chat | General-purpose conversation and instruction following | Llama 4 Maverick, Qwen 2.5 |
| Reasoning | Complex problem-solving, math, logic, and multi-step thinking | DeepSeek R1, QwQ |
| Code | Code generation, debugging, and code understanding | DeepSeek Coder, CodeLlama |
| Vision | Multimodal models that can process images and text | Llama 3.2 Vision, InternVL |
| Multilingual | Strong performance across many languages | Qwen, NLLB |
| Efficient | Small, fast models for cost-sensitive or latency-sensitive use | Llama 3.2 1B/3B, Gemma 2B |
Choosing the right model
By task
| Task | Recommended approach |
|---|---|
| General Q&A | Large chat model (70B+) |
| Code generation | Code-specialized model or large chat model with temperature 0 |
| Creative writing | Large chat model with higher temperature (0.8-1.2) |
| Data extraction | Any chat model with JSON mode and temperature 0 |
| Classification | Efficient model with temperature 0 |
| Embeddings | Embedding model (EMBEDDINGS type) suited to your content's language and domain |
| Real-time chat | Turbo tier with a fast model |
| Batch processing | Standard tier with a cost-effective model |
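One way to apply this table in code is a small set of per-task presets that you merge into each request. This is a hypothetical sketch: `TASK_PRESETS` and `request_kwargs` are illustrative names, 1.0 is one pick from the suggested 0.8-1.2 creative range, and the `response_format={"type": "json_object"}` argument is the OpenAI SDK's JSON-mode parameter, which is assumed (not confirmed here) to be accepted for models listing json_mode.

```python
import copy

# Hypothetical presets distilled from the task table; treat the values as starting points.
TASK_PRESETS = {
    "code": {"temperature": 0},
    "creative": {"temperature": 1.0},  # within the suggested 0.8-1.2 range
    # Assumes the OpenAI-style JSON-mode parameter is accepted.
    "extraction": {"temperature": 0, "response_format": {"type": "json_object"}},
    "classification": {"temperature": 0},
    "realtime": {"extra_body": {"tier": "turbo"}},
    "batch": {},  # standard is the default tier, so nothing to add
}


def request_kwargs(task: str, model: str, prompt: str) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    if task not in TASK_PRESETS:
        raise ValueError(f"Unknown task: {task!r}")
    kwargs = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    # Deep-copy so callers can modify the result without altering the presets.
    kwargs.update(copy.deepcopy(TASK_PRESETS[task]))
    return kwargs


k = request_kwargs("extraction", "llama-4-maverick", "Extract the invoice date as JSON.")
```

With a configured client, the result would be passed as `client.chat.completions.create(**k)`.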
By priority
| Priority | Recommendation |
|---|---|
| Best quality | Largest model available (405B, 70B); reasoning models for complex tasks |
| Lowest cost | Efficient models (7B, 3B, 1B); standard tier |
| Fastest response | Turbo tier; efficient models |
| Longest context | Models with 128K+ context window |
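When cost is the priority but a task still needs a particular capability or context length, you can filter the catalog and pick the cheapest match. A minimal sketch under two assumptions: entries follow the catalog shape shown earlier, and "cheapest" is simplified to the sum of the input and output prices per million tokens. `cheapest_model` is a hypothetical helper, and `example-small-model` is a made-up entry with illustrative prices.

```python
from decimal import Decimal
from typing import Optional


def cheapest_model(models: list, capability: Optional[str] = None,
                   min_context: int = 0, tier: str = "standard") -> Optional[str]:
    """Return the id of the cheapest model meeting the constraints, or None."""
    best = None
    for m in models:
        rates = m.get("pricing", {}).get(tier)
        if rates is None:
            continue  # tier missing or null for this model
        if capability is not None and capability not in m.get("capabilities", []):
            continue
        if m.get("contextWindow", 0) < min_context:
            continue
        # Simplification: rank by input + output price per million tokens.
        score = Decimal(rates["inputPerMToken"]) + Decimal(rates["outputPerMToken"])
        if best is None or score < best[0]:
            best = (score, m["id"])
    return best[1] if best else None


catalog = [
    {
        "id": "llama-4-maverick",
        "contextWindow": 131072,
        "capabilities": ["chat", "tool_calling", "json_mode"],
        "pricing": {"standard": {"inputPerMToken": "0.10", "outputPerMToken": "0.30"}},
    },
    {
        "id": "example-small-model",  # hypothetical entry, illustrative prices
        "contextWindow": 8192,
        "capabilities": ["chat"],
        "pricing": {"standard": {"inputPerMToken": "0.02", "outputPerMToken": "0.04"}},
    },
]

print(cheapest_model(catalog, min_context=128_000))
```

Prices are parsed with `Decimal` because the API returns them as strings, which avoids float rounding when comparing.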
Cost vs. quality tradeoff
Larger models generally produce higher-quality output but cost more per token. For many tasks, a smaller model is sufficient:
- 1B-3B models: classification, simple extraction, routing
- 7B-8B models: general chat, summarization, translation
- 27B-70B models: complex reasoning, nuanced writing, code generation
- Models above 70B (e.g., 405B): state-of-the-art quality for the hardest tasks
Pricing tiers
Every model has a standard tier, and some models also offer a turbo tier:
Standard tier
The default tier. Requests are routed for the lowest cost.
Turbo tier
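Routing optimized for low latency, giving faster responses at a higher price. To enable it, set tier to "turbo" in the request body (with the OpenAI Python SDK, pass it through extra_body):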
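```python
# Assumes `client` is the OpenAI client configured as in "Using a model in code".
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"tier": "turbo"},  # request the turbo tier
)
```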
Not all models have a turbo tier. Check the pricing.turbo field in the model response — if it's null, only standard routing is available.
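To compare tiers before switching, you can estimate a request's cost from the per-million-token rates in the pricing object. A minimal sketch: `estimate_cost` is a hypothetical helper, it ignores the cachedInputPerMToken discount, and results are in the same price units as the catalog. The rates below are copied from the example llama-4-maverick response; for 2,000 input and 500 output tokens, standard is (0.10 × 2,000 + 0.30 × 500) / 1,000,000 = 0.00035.

```python
from decimal import Decimal
from typing import Optional

ONE_MILLION = Decimal(1_000_000)


def estimate_cost(pricing: dict, tier: str, input_tokens: int,
                  output_tokens: int) -> Optional[Decimal]:
    """Estimate one request's cost; returns None when the tier is null (not offered).

    Cached-input pricing is ignored in this sketch.
    """
    rates = pricing.get(tier)
    if rates is None:
        return None
    input_cost = Decimal(rates["inputPerMToken"]) * input_tokens
    output_cost = Decimal(rates["outputPerMToken"]) * output_tokens
    return (input_cost + output_cost) / ONE_MILLION


# Rates from the example llama-4-maverick response.
pricing = {
    "standard": {"inputPerMToken": "0.10", "outputPerMToken": "0.30"},
    "turbo": {"inputPerMToken": "0.20", "outputPerMToken": "0.60"},
}
standard = estimate_cost(pricing, "standard", 2_000, 500)  # 0.00035
turbo = estimate_cost(pricing, "turbo", 2_000, 500)  # 0.0007
```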
Context windows
The context window determines the maximum number of tokens (input + output) a model can handle in a single request. Context windows vary by model:
| Context size | Approximate capacity (at ~0.75 English words per token) |
|---|---|
| 4,096 tokens | ~3,000 words — short conversations |
| 8,192 tokens | ~6,000 words — medium conversations |
| 32,768 tokens | ~24,000 words — long documents |
| 65,536 tokens | ~49,000 words — very long documents |
| 131,072 tokens | ~98,000 words — book-length content |
If your input exceeds the context window, the API returns an error. Because the window also has to hold the output, budget your system prompt, message history, and expected response length so the total fits.
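A rough pre-flight check can catch oversized requests before they are sent. This sketch assumes about 4 characters per token for English text, which is only a heuristic; exact counts depend on each model's tokenizer, and per-message formatting overhead is ignored. `estimate_tokens` and `fits_context` are hypothetical helpers, and messages are assumed to have plain-string content.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; real counts depend on the model's tokenizer."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(messages: list, context_window: int, max_output_tokens: int) -> bool:
    """Check that estimated input tokens plus reserved output tokens fit the window."""
    input_tokens = sum(estimate_tokens(m["content"]) for m in messages)
    return input_tokens + max_output_tokens <= context_window


history = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize this report. " * 100},
]
print(fits_context(history, context_window=4096, max_output_tokens=1024))
```

If the check fails, drop or summarize the oldest messages, or choose a model with a larger contextWindow.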
Using a model in code
Every model uses the same API — just swap the model parameter:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.oru-el.com/v1/inference",
    api_key="oruel_your_api_key_here",
)

# Use any model by its ID
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Switch to a different model; no other code changes needed
response = client.chat.completions.create(
    model="deepseek-r1-0528",
    messages=[{"role": "user", "content": "Solve this step by step: what is 15% of 340?"}],
)
```
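Because only the model parameter changes, comparing several models on the same prompt is a simple loop. A minimal sketch: `compare_models` is a hypothetical helper that works with the OpenAI client configured above. To keep it runnable without an API key, it is exercised here with a hypothetical stub that mimics the response shape (`choices[0].message.content`) and echoes the prompt instead of calling the API.

```python
from types import SimpleNamespace


def compare_models(client, models: list, prompt: str) -> dict:
    """Send one prompt to several models and return {model_id: reply_text}."""
    replies = {}
    for model in models:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        replies[model] = response.choices[0].message.content
    return replies


class FakeCompletions:
    """Hypothetical offline stub: echoes the model id and prompt."""

    def create(self, model, messages):
        text = f"{model}: {messages[-1]['content']}"
        message = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))
replies = compare_models(fake_client, ["llama-4-maverick", "deepseek-r1-0528"],
                         "What is 15% of 340?")
```

With a real client, call `compare_models(client, [...], prompt)` instead of passing the stub.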
Featured models
Featured models are highlighted in the catalog for their quality, popularity, and reliability. They're a good starting point if you're not sure which model to choose. Look for models with isFeatured: true in the API response.
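Selecting featured models from the catalog response is a one-line filter on isFeatured. A minimal sketch: `featured_model_ids` is a hypothetical helper that takes the parsed JSON from GET /v1/inference/models, and `example-unfeatured-model` is a made-up entry for illustration.

```python
def featured_model_ids(catalog: dict) -> list:
    """Return ids of models flagged isFeatured, sorted alphabetically."""
    return sorted(m["id"] for m in catalog.get("models", []) if m.get("isFeatured"))


catalog = {
    "models": [
        {"id": "llama-4-maverick", "isFeatured": True},
        {"id": "example-unfeatured-model", "isFeatured": False},  # hypothetical entry
    ]
}
print(featured_model_ids(catalog))  # ['llama-4-maverick']
```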