Chat Completions
Complete reference for the Oru-el chat completions API endpoint.
The chat completions endpoint generates text responses from a conversation history. It supports single-turn questions, multi-turn conversations, tool calling, JSON mode, and streaming.
POST https://api.oru-el.com/v1/inference/chat/completions
Request body#
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | Model ID (e.g., llama-4-maverick) |
| messages | array | Yes | - | Conversation messages (1-256 messages) |
| temperature | number | No | 0.7 | Sampling temperature (0-2) |
| top_p | number | No | 1.0 | Nucleus sampling threshold (0-1) |
| top_k | integer | No | - | Top-k sampling (0-200) |
| min_p | number | No | - | Minimum probability filter (0-1) |
| max_tokens | integer | No | - | Maximum tokens to generate (1-131072) |
| stop | string or array | No | - | Stop sequence(s): up to 4 strings, max 256 chars each |
| stream | boolean | No | false | Enable SSE streaming |
| frequency_penalty | number | No | 0 | Penalize tokens by frequency (-2 to 2) |
| presence_penalty | number | No | 0 | Penalize tokens by presence (-2 to 2) |
| repetition_penalty | number | No | 1.0 | Repetition penalty multiplier (0.1-3) |
| seed | integer | No | - | Seed for reproducible sampling |
| response_format | object | No | - | Set {"type": "json_object"} for JSON mode |
| tools | array | No | - | Tool definitions for function calling (max 128) |
| tool_choice | string or object | No | - | Controls tool selection behavior |
| n | integer | No | 1 | Number of completions (only 1 is supported) |
| tier | string | No | "standard" | Pricing tier: "standard" or "turbo" |
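The limits in the table can be checked on the client before a request is sent, which avoids a round trip that would end in a 400. The following is a minimal sketch, not part of the API: `validate_request` and `NUMERIC_RANGES` are hypothetical names, and the sketch assumes that every range in the table is inclusive at both ends.

```python
from typing import Any

# Numeric ranges copied from the request body table: field -> (min, max).
# Assumption: both ends of each range are inclusive.
NUMERIC_RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "top_k": (0, 200),
    "min_p": (0, 1),
    "max_tokens": (1, 131072),
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
    "repetition_penalty": (0.1, 3),
}


def validate_request(body: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a request body; empty means none found."""
    problems = []

    if not isinstance(body.get("model"), str) or not body["model"]:
        problems.append("model is required and must be a non-empty string")

    messages = body.get("messages")
    if not isinstance(messages, list) or not 1 <= len(messages) <= 256:
        problems.append("messages must be an array of 1-256 items")

    for field, (low, high) in NUMERIC_RANGES.items():
        value = body.get(field)
        if value is not None and not low <= value <= high:
            problems.append(f"{field} must be between {low} and {high}")

    stop = body.get("stop")
    if stop is not None:
        # stop may be a single string or an array of strings.
        sequences = [stop] if isinstance(stop, str) else stop
        if len(sequences) > 4 or any(len(s) > 256 for s in sequences):
            problems.append("stop allows up to 4 strings of at most 256 chars each")

    if len(body.get("tools") or []) > 128:
        problems.append("tools allows at most 128 definitions")
    if body.get("n", 1) != 1:
        problems.append("n must be 1")
    if body.get("tier", "standard") not in ("standard", "turbo"):
        problems.append('tier must be "standard" or "turbo"')

    return problems
```

The server remains the source of truth; a check like this only catches the mistakes that the table makes explicit.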
Message format#
Each message in the messages array has a role and content:
System message#
Sets the behavior and instructions for the assistant. Place it first in the messages array.
{
  "role": "system",
  "content": "You are a helpful coding assistant. Always include code examples."
}
User message#
A message from the user. The content can be a string or, for vision models, a multimodal content array.
{
  "role": "user",
  "content": "Write a Python function to reverse a string."
}
Assistant message#
A previous response from the assistant. Used to provide conversation history.
{
  "role": "assistant",
  "content": "Here's a function to reverse a string:\n\n```python\ndef reverse_string(s):\n    return s[::-1]\n```"
}
Tool message#
A response to a tool call made by the assistant. Must include tool_call_id.
{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "{\"temperature\": 72, \"unit\": \"fahrenheit\"}"
}
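In the example above, the tool message's content is a JSON document encoded as a string, so structured results need to be serialized before they are sent. A minimal sketch of a helper that does this follows; `tool_result_message` is a hypothetical name, and the sketch assumes that `tool_call_id` is the ID of the tool call being answered.

```python
import json
from typing import Any


def tool_result_message(tool_call_id: str, result: Any) -> dict[str, str]:
    """Build a role="tool" message that carries a tool result as a string."""
    # Strings pass through unchanged; anything else is serialized to JSON.
    content = result if isinstance(result, str) else json.dumps(result)
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}


message = tool_result_message("call_abc123", {"temperature": 72, "unit": "fahrenheit"})
print(message["content"])
```

The resulting dictionary can be appended to the messages array after the assistant message that requested the tool call.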
Response format#
Non-streaming response#
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "llama-4-maverick",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
Response fields#
| Field | Type | Description |
|---|---|---|
| id | string | Unique completion ID |
| object | string | Always "chat.completion" |
| created | integer | Unix timestamp |
| model | string | The model that generated the response |
| choices | array | Array of completions (always 1 element) |
| choices[].index | integer | Always 0 |
| choices[].message | object | The generated message |
| choices[].message.role | string | Always "assistant" |
| choices[].message.content | string or null | The generated text (null when tool calls are present) |
| choices[].message.tool_calls | array | Tool calls requested by the model (if any) |
| choices[].finish_reason | string | Why generation stopped: "stop", "length", or "tool_calls" |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Tokens in the input |
| usage.completion_tokens | integer | Tokens generated |
| usage.total_tokens | integer | Sum of prompt + completion tokens |
Finish reasons#
| Reason | Meaning |
|---|---|
| stop | Model finished naturally or hit a stop sequence |
| length | Hit max_tokens limit |
| tool_calls | Model is requesting one or more tool calls |
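Client code usually branches on the finish reason, because each value calls for a different next step. The sketch below works on the raw JSON shape shown in the non-streaming response; `interpret_choice` and the action labels it returns are hypothetical names chosen for illustration.

```python
from typing import Any


def interpret_choice(choice: dict[str, Any]) -> tuple[str, Any]:
    """Map one element of choices to an (action, payload) pair."""
    message = choice["message"]
    reason = choice["finish_reason"]

    if reason == "tool_calls":
        # content is null in this case; the work to do is in tool_calls.
        return "run_tools", message.get("tool_calls", [])
    if reason == "length":
        # Generation stopped at max_tokens, so the text may be cut off.
        return "truncated", message.get("content") or ""
    if reason == "stop":
        return "done", message.get("content") or ""
    raise ValueError(f"unexpected finish_reason: {reason!r}")


choice = {
    "index": 0,
    "message": {"role": "assistant", "content": "The capital of France is Paris."},
    "finish_reason": "stop",
}
print(interpret_choice(choice))
```

A caller that receives "truncated" might raise max_tokens or ask the model to continue; which is appropriate depends on the application.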
Multi-turn conversations#
To have a multi-turn conversation, include the full message history in each request:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.oru-el.com/v1/inference",
    api_key="oruel_your_api_key_here",
)

messages = [
    {"role": "system", "content": "You are a helpful math tutor."},
    {"role": "user", "content": "What is the derivative of x^2?"},
]

# First turn
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=messages,
)
assistant_message = response.choices[0].message
messages.append({"role": "assistant", "content": assistant_message.content})

# Second turn: include the full history
messages.append({"role": "user", "content": "What about x^3?"})
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=messages,
)
print(response.choices[0].message.content)
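Because the full history is resent on every turn, a long conversation eventually reaches the 256-message limit on the messages array. One way to stay under it is to drop the oldest turns while keeping the system message. This is a minimal sketch under that assumption; `trim_history` and `MAX_MESSAGES` are hypothetical names, not part of the API or the SDK.

```python
MAX_MESSAGES = 256  # upper bound on the messages array from the request body table


def trim_history(messages: list[dict], limit: int = MAX_MESSAGES) -> list[dict]:
    """Return at most `limit` messages, keeping a leading system message."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if len(messages) <= limit:
        return list(messages)
    if messages[0]["role"] == "system":
        # Keep the system message plus the most recent (limit - 1) messages.
        return [messages[0]] + messages[len(messages) - (limit - 1):]
    return messages[len(messages) - limit:]
```

This sketch counts messages only. It does not account for the model's token limit, and it can separate a tool message from the assistant message that requested the call, so conversations that use tools may need a trimming rule that keeps those pairs together.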
Complete examples#
Python#
from openai import OpenAI

client = OpenAI(
    base_url="https://api.oru-el.com/v1/inference",
    api_key="oruel_your_api_key_here",
)

response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[
        {"role": "system", "content": "You are a concise technical writer."},
        {"role": "user", "content": "Explain what a REST API is in 2 sentences."},
    ],
    temperature=0.3,
    max_tokens=100,
)

print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")
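Some fields in the request body table (top_k, min_p, repetition_penalty, tier) are not named arguments of the OpenAI Python SDK's `create()` method, so passing them directly raises a TypeError. The SDK's `extra_body` argument merges additional fields into the JSON body. The sketch below separates the two groups; `split_params` and `SDK_PARAMS` are hypothetical names, and it is an assumption that the endpoint accepts these fields when sent this way.

```python
# Fields from the request body table that the SDK's create() takes by name.
SDK_PARAMS = {
    "model", "messages", "temperature", "top_p", "max_tokens", "stop", "stream",
    "frequency_penalty", "presence_penalty", "seed", "response_format",
    "tools", "tool_choice", "n",
}


def split_params(params: dict) -> dict:
    """Return keyword arguments for create(), moving other fields to extra_body."""
    kwargs = {key: value for key, value in params.items() if key in SDK_PARAMS}
    extra = {key: value for key, value in params.items() if key not in SDK_PARAMS}
    if extra:
        kwargs["extra_body"] = extra
    return kwargs


kwargs = split_params({
    "model": "llama-4-maverick",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40,
    "tier": "turbo",
})
print(kwargs["extra_body"])
```

The result can be passed as `client.chat.completions.create(**kwargs)`.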
JavaScript#
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.oru-el.com/v1/inference",
  apiKey: "oruel_your_api_key_here",
});

const response = await client.chat.completions.create({
  model: "llama-4-maverick",
  messages: [
    { role: "system", content: "You are a concise technical writer." },
    { role: "user", content: "Explain what a REST API is in 2 sentences." },
  ],
  temperature: 0.3,
  max_tokens: 100,
});

console.log(response.choices[0].message.content);
console.log(`Tokens used: ${response.usage.total_tokens}`);
cURL#
curl https://api.oru-el.com/v1/inference/chat/completions \
  -H "Authorization: Bearer oruel_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-maverick",
    "messages": [
      {"role": "system", "content": "You are a concise technical writer."},
      {"role": "user", "content": "Explain what a REST API is in 2 sentences."}
    ],
    "temperature": 0.3,
    "max_tokens": 100
  }'
Error handling#
Common errors#
| HTTP Status | Code | Cause |
|---|---|---|
| 400 | VALIDATION_ERROR | Invalid request body (missing model, bad parameters) |
| 401 | UNAUTHORIZED | Missing or invalid API key |
| 404 | NOT_FOUND | Model not found or inactive |
| 429 | BUDGET_EXCEEDED | Monthly or hourly budget limit reached |
| 429 | Rate limited | More than 60 requests per minute |
| 502 | SERVICE_ERROR | Upstream service error |
Handling errors in code#
from openai import OpenAI, APIConnectionError, APIStatusError, RateLimitError

client = OpenAI(
    base_url="https://api.oru-el.com/v1/inference",
    api_key="oruel_your_api_key_here",
)

try:
    response = client.chat.completions.create(
        model="llama-4-maverick",
        messages=[{"role": "user", "content": "Hello"}],
    )
except RateLimitError:
    # HTTP 429: rate limit or budget limit reached
    print("Rate limited: wait and retry")
except APIConnectionError:
    # The request never reached the API
    print("Network error: check your connection")
except APIStatusError as e:
    # Any other non-2xx response; status_code is only set on status errors
    print(f"API error {e.status_code}: {e.message}")
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.oru-el.com/v1/inference",
  apiKey: "oruel_your_api_key_here",
});

try {
  const response = await client.chat.completions.create({
    model: "llama-4-maverick",
    messages: [{ role: "user", content: "Hello" }],
  });
} catch (error) {
  if (error instanceof OpenAI.APIError) {
    console.error(`API error ${error.status}: ${error.message}`);
  }
}
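Of the errors listed under Common errors, a 429 from rate limiting and a 502 are the ones most likely to succeed on a later attempt. A common way to handle them is to retry with exponential backoff. The sketch below is one possible approach, not a prescribed one: `call_with_retry` and `RETRYABLE_STATUSES` are hypothetical names, and it assumes the raised exception exposes the HTTP status as a `status_code` attribute, as the OpenAI Python SDK's status errors do.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Statuses from the error table that are treated as transient in this sketch.
RETRYABLE_STATUSES = {429, 502}


def call_with_retry(
    request: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying transient HTTP errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception as error:
            status = getattr(error, "status_code", None)
            last_attempt = attempt == max_attempts - 1
            if status not in RETRYABLE_STATUSES or last_attempt:
                raise
            # The delay doubles on each attempt; jitter spreads out retries.
            sleep(base_delay * 2**attempt + random.uniform(0, 0.5))
    raise RuntimeError("max_attempts must be at least 1")
```

A call would look like `call_with_retry(lambda: client.chat.completions.create(model="llama-4-maverick", messages=messages))`. Note that BUDGET_EXCEEDED also returns 429, and retrying does not help until the budget resets, so production code may want to inspect the error code before retrying. The OpenAI SDKs also retry some failures on their own through the `max_retries` client option, which is worth taking into account when choosing `max_attempts`.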