Chat Completions

Complete reference for the Oru-el chat completions API endpoint.

The chat completions endpoint generates text responses from a conversation history. It supports single-turn questions, multi-turn conversations, tool calling, JSON mode, and streaming.

POST https://api.oru-el.com/v1/inference/chat/completions

Request body#

Field	Type	Required	Default	Description
`model`	string	Yes	—	Model ID (e.g., `llama-4-maverick`)
`messages`	array	Yes	—	Conversation messages (1-256 messages)
`temperature`	number	No	0.7	Sampling temperature (0-2)
`top_p`	number	No	1.0	Nucleus sampling threshold (0-1)
`top_k`	integer	No	—	Top-k sampling (0-200)
`min_p`	number	No	—	Minimum probability filter (0-1)
`max_tokens`	integer	No	—	Maximum tokens to generate (1-131072)
`stop`	string or array	No	—	Stop sequence(s) — up to 4 strings, max 256 chars each
`stream`	boolean	No	false	Enable SSE streaming
`frequency_penalty`	number	No	0	Penalize tokens by frequency (-2 to 2)
`presence_penalty`	number	No	0	Penalize tokens by presence (-2 to 2)
`repetition_penalty`	number	No	1.0	Repetition penalty multiplier (0.1-3)
`seed`	integer	No	—	Seed for reproducible sampling
`response_format`	object	No	—	Set `{"type": "json_object"}` for JSON mode
`tools`	array	No	—	Tool definitions for function calling (max 128)
`tool_choice`	string or object	No	—	Controls tool selection behavior
`n`	integer	No	1	Number of completions (only 1 is supported)
`tier`	string	No	`"standard"`	Pricing tier: `"standard"` or `"turbo"`
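
The ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Oru-el API: `validate_request` and `NUMERIC_RANGES` are hypothetical names, and the bounds are copied from the table above. The server remains the authority on what it accepts.

```python
# Hypothetical client-side check; bounds are copied from the request body table.
NUMERIC_RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "top_k": (0, 200),
    "min_p": (0, 1),
    "max_tokens": (1, 131072),
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
    "repetition_penalty": (0.1, 3),
}


def validate_request(body: dict) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []

    if not isinstance(body.get("model"), str) or not body["model"]:
        problems.append("model is required and must be a non-empty string")

    messages = body.get("messages")
    if not isinstance(messages, list) or not 1 <= len(messages) <= 256:
        problems.append("messages must be an array of 1-256 messages")

    for field, (low, high) in NUMERIC_RANGES.items():
        value = body.get(field)
        if value is not None and not low <= value <= high:
            problems.append(f"{field} must be between {low} and {high}")

    stop = body.get("stop")
    if stop is not None:
        sequences = [stop] if isinstance(stop, str) else stop
        if len(sequences) > 4 or any(len(s) > 256 for s in sequences):
            problems.append("stop allows up to 4 strings of at most 256 chars")

    if body.get("n", 1) != 1:
        problems.append("n must be 1")

    if body.get("tier", "standard") not in ("standard", "turbo"):
        problems.append('tier must be "standard" or "turbo"')

    if len(body.get("tools") or []) > 128:
        problems.append("tools allows at most 128 definitions")

    return problems
```

An empty list means the body passed every check in this sketch; it does not guarantee the request will succeed, since the model ID and other server-side conditions are not verified here.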

Message format#

Each message in the `messages` array has a `role` and `content`. The four roles are described below.

System message#

Sets the behavior and instructions for the assistant. Place it first in the messages array.

{
  "role": "system",
  "content": "You are a helpful coding assistant. Always include code examples."
}

User message#

A message from the user. The `content` can be a string or, for vision models, a multimodal content array.

{
  "role": "user",
  "content": "Write a Python function to reverse a string."
}

Assistant message#

A previous response from the assistant. Used to provide conversation history.

{
  "role": "assistant",
  "content": "Here's a function to reverse a string:\n\n```python\ndef reverse_string(s):\n    return s[::-1]\n```"
}

Tool message#

The result of a tool call made by the assistant. Must include the `tool_call_id` of the call it answers.

{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "{\"temperature\": 72, \"unit\": \"fahrenheit\"}"
}
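
In the example above the `content` is JSON encoded as a string, so structured tool results need to be serialized before they are sent back. A minimal sketch of a helper that does this; `tool_result_message` is a hypothetical name, not part of any SDK.

```python
import json


def tool_result_message(tool_call_id: str, result) -> dict:
    """Build a tool message that answers the tool call with the given ID."""
    # Strings pass through unchanged; anything else is serialized to JSON text.
    content = result if isinstance(result, str) else json.dumps(result)
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}


message = tool_result_message("call_abc123", {"temperature": 72, "unit": "fahrenheit"})
print(json.dumps(message, indent=2))
```

The returned dictionary can be appended to the `messages` array after the assistant message that requested the tool call.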

Response format#

Non-streaming response#

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "llama-4-maverick",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}

Response fields#

Field	Type	Description
`id`	string	Unique completion ID
`object`	string	Always `"chat.completion"`
`created`	integer	Unix timestamp
`model`	string	The model that generated the response
`choices`	array	Array of completions (always 1 element)
`choices[].index`	integer	Always `0`
`choices[].message`	object	The generated message
`choices[].message.role`	string	Always `"assistant"`
`choices[].message.content`	string or null	The generated text (null when tool calls are present)
`choices[].message.tool_calls`	array	Tool calls requested by the model (if any)
`choices[].finish_reason`	string	Why generation stopped: `"stop"`, `"length"`, or `"tool_calls"`
`usage`	object	Token usage statistics
`usage.prompt_tokens`	integer	Tokens in the input
`usage.completion_tokens`	integer	Tokens generated
`usage.total_tokens`	integer	Sum of prompt + completion tokens

Finish reasons#

Reason	Meaning
`stop`	Model finished naturally or hit a stop sequence
`length`	Hit `max_tokens` limit
`tool_calls`	Model is requesting one or more tool calls
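
Client code typically branches on `finish_reason`. The sketch below works on the parsed JSON response as a plain dictionary; `describe_completion` is a hypothetical helper and the returned labels are only illustrative.

```python
def describe_completion(response: dict) -> str:
    """Summarize what the caller should do next, based on finish_reason."""
    choice = response["choices"][0]  # the choices array always has one element
    reason = choice["finish_reason"]
    message = choice["message"]

    if reason == "stop":
        return f"complete: {message['content']}"
    if reason == "length":
        # Output was cut off at max_tokens, so the text is partial.
        return f"truncated: {message['content']}"
    if reason == "tool_calls":
        # content may be null here; the tool calls carry the request.
        count = len(message.get("tool_calls") or [])
        return f"tool calls requested: {count}"
    raise ValueError(f"unexpected finish_reason: {reason!r}")


example = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The capital of France is Paris."},
            "finish_reason": "stop",
        }
    ]
}
print(describe_completion(example))  # prints "complete: The capital of France is Paris."
```

A `length` result usually means the request should be repeated with a larger `max_tokens` or the partial text handled explicitly.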

Multi-turn conversations#

To have a multi-turn conversation, include the full message history in each request:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.oru-el.com/v1/inference",
    api_key="oruel_your_api_key_here",
)

messages = [
    {"role": "system", "content": "You are a helpful math tutor."},
    {"role": "user", "content": "What is the derivative of x^2?"},
]

# First turn
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=messages,
)
assistant_message = response.choices[0].message
messages.append({"role": "assistant", "content": assistant_message.content})

# Second turn — include full history
messages.append({"role": "user", "content": "What about x^3?"})
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=messages,
)
print(response.choices[0].message.content)
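
Because the full history is resent on every turn and `messages` accepts at most 256 entries, long conversations eventually need trimming. One possible approach, sketched below, keeps a leading system message and the most recent messages. `trim_history` is a hypothetical helper, and dropping the oldest turns is an assumption about what is acceptable for your application.

```python
MAX_MESSAGES = 256  # documented limit for the messages array


def trim_history(messages: list[dict], limit: int = MAX_MESSAGES) -> list[dict]:
    """Keep a leading system message plus the most recent messages, within limit."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if len(messages) <= limit:
        return list(messages)

    if messages[0]["role"] == "system":
        # Reserve one slot for the system message, fill the rest with recent turns.
        recent = messages[1:][-(limit - 1):] if limit > 1 else []
        return [messages[0]] + recent

    return messages[-limit:]
```

Calling `trim_history(messages)` before each request keeps the array within the limit. A cut made this way can separate a tool message from the assistant message that requested it, so applications that use tools may need to trim on turn boundaries instead.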

Complete examples#

Python#

from openai import OpenAI

client = OpenAI(
    base_url="https://api.oru-el.com/v1/inference",
    api_key="oruel_your_api_key_here",
)

response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[
        {"role": "system", "content": "You are a concise technical writer."},
        {"role": "user", "content": "Explain what a REST API is in 2 sentences."},
    ],
    temperature=0.3,
    max_tokens=100,
)

print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")

JavaScript#

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.oru-el.com/v1/inference",
  apiKey: "oruel_your_api_key_here",
});

const response = await client.chat.completions.create({
  model: "llama-4-maverick",
  messages: [
    { role: "system", content: "You are a concise technical writer." },
    { role: "user", content: "Explain what a REST API is in 2 sentences." },
  ],
  temperature: 0.3,
  max_tokens: 100,
});

console.log(response.choices[0].message.content);
console.log(`Tokens used: ${response.usage.total_tokens}`);

cURL#

curl https://api.oru-el.com/v1/inference/chat/completions \
  -H "Authorization: Bearer oruel_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-maverick",
    "messages": [
      {"role": "system", "content": "You are a concise technical writer."},
      {"role": "user", "content": "Explain what a REST API is in 2 sentences."}
    ],
    "temperature": 0.3,
    "max_tokens": 100
  }'

Error handling#

Common errors#

HTTP Status	Code	Cause
400	`VALIDATION_ERROR`	Invalid request body (missing model, bad parameters)
401	`UNAUTHORIZED`	Missing or invalid API key
404	`NOT_FOUND`	Model not found or inactive
429	`BUDGET_EXCEEDED`	Monthly or hourly budget limit reached
429	Rate limited	More than 60 requests per minute
502	`SERVICE_ERROR`	Upstream service error

Handling errors in code#

from openai import OpenAI, APIConnectionError, APIStatusError, RateLimitError

client = OpenAI(
    base_url="https://api.oru-el.com/v1/inference",
    api_key="oruel_your_api_key_here",
)

try:
    response = client.chat.completions.create(
        model="llama-4-maverick",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)
except RateLimitError:
    # HTTP 429: either rate limited or budget exceeded
    print("Rate limited or over budget: wait and retry, or check your budget")
except APIConnectionError:
    print("Network error: check your connection")
except APIStatusError as e:
    # Any other non-2xx response; status_code is only defined on APIStatusError
    print(f"API error {e.status_code}: {e.message}")

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.oru-el.com/v1/inference",
  apiKey: "oruel_your_api_key_here",
});

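try {
  const response = await client.chat.completions.create({
    model: "llama-4-maverick",
    messages: [{ role: "user", content: "Hello" }],
  });
  console.log(response.choices[0].message.content);
} catch (error) {
  if (error instanceof OpenAI.APIError) {
    // status is undefined for connection errors
    console.error(`API error ${error.status}: ${error.message}`);
  } else {
    throw error; // not an API error: let it propagate
  }
}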
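
The handlers above only report the failure. For transient errors such as rate limiting (429) or an upstream service error (502), callers often retry with exponential backoff. The sketch below is independent of any SDK: `call_with_retry`, `TransientError`, and the delay values are hypothetical, and which errors count as retryable is an assumption. In particular, a 429 caused by `BUDGET_EXCEEDED` will not clear after a few seconds of waiting.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. rate limit, 502)."""


def call_with_retry(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on TransientError wait base_delay * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

With the OpenAI Python SDK, `fn` could be a small function that calls `client.chat.completions.create(...)` and re-raises `RateLimitError` as `TransientError`. The `sleep` parameter exists so the delay can be replaced in tests.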