Chat Completions

Complete reference for the Oru-el chat completions API endpoint.

The chat completions endpoint generates text responses from a conversation history. It supports single-turn questions, multi-turn conversations, tool calling, JSON mode, and streaming.

POST https://api.oru-el.com/v1/inference/chat/completions

Request body#

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | | Model ID (e.g., llama-4-maverick) |
| messages | array | Yes | | Conversation messages (1-256 messages) |
| temperature | number | No | 0.7 | Sampling temperature (0-2) |
| top_p | number | No | 1.0 | Nucleus sampling threshold (0-1) |
| top_k | integer | No | | Top-k sampling (0-200) |
| min_p | number | No | | Minimum probability filter (0-1) |
| max_tokens | integer | No | | Maximum tokens to generate (1-131072) |
| stop | string or array | No | | Stop sequence(s): up to 4 strings, max 256 chars each |
| stream | boolean | No | false | Enable SSE streaming |
| frequency_penalty | number | No | 0 | Penalize tokens by frequency (-2 to 2) |
| presence_penalty | number | No | 0 | Penalize tokens by presence (-2 to 2) |
| repetition_penalty | number | No | 1.0 | Repetition penalty multiplier (0.1-3) |
| seed | integer | No | | Seed for reproducible sampling |
| response_format | object | No | | Set {"type": "json_object"} for JSON mode |
| tools | array | No | | Tool definitions for function calling (max 128) |
| tool_choice | string or object | No | | Controls tool selection behavior |
| n | integer | No | 1 | Number of completions (only 1 is supported) |
| tier | string | No | "standard" | Pricing tier: "standard" or "turbo" |
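
The limits in the table above can be checked client-side before a request is sent. The following is a minimal sketch, not part of any Oru-el SDK: `build_chat_request` and `LIMITS` are hypothetical names, the ranges are copied from the table, and treating the bounds as inclusive is an assumption.

```python
from typing import Any

# Numeric ranges copied from the request-body table.
# Treating both bounds as inclusive is an assumption.
LIMITS: dict[str, tuple[float, float]] = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "top_k": (0, 200),
    "min_p": (0, 1),
    "max_tokens": (1, 131072),
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
    "repetition_penalty": (0.1, 3),
}


def build_chat_request(
    model: str, messages: list[dict[str, Any]], **params: Any
) -> dict[str, Any]:
    """Return a request body, raising ValueError for out-of-range values."""
    if not model:
        raise ValueError("model is required")
    if not 1 <= len(messages) <= 256:
        raise ValueError("messages must contain 1-256 items")
    for name, value in params.items():
        if name in LIMITS:
            low, high = LIMITS[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
    if params.get("n", 1) != 1:
        raise ValueError("only n=1 is supported")
    if params.get("tier", "standard") not in ("standard", "turbo"):
        raise ValueError('tier must be "standard" or "turbo"')
    stop = params.get("stop")
    if stop is not None:
        # stop may be a single string or a list of strings
        sequences = [stop] if isinstance(stop, str) else list(stop)
        if len(sequences) > 4 or any(len(s) > 256 for s in sequences):
            raise ValueError("stop allows up to 4 strings of at most 256 chars")
    return {"model": model, "messages": messages, **params}
```

When calling through the OpenAI Python SDK, fields that the SDK does not accept as keyword arguments (here top_k, min_p, repetition_penalty, and tier) can be passed through the SDK's `extra_body` argument instead.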

Message format#

Each message in the messages array has a role (system, user, assistant, or tool) and content.

System message#

Sets the behavior and instructions for the assistant. Place it first in the messages array.

{
  "role": "system",
  "content": "You are a helpful coding assistant. Always include code examples."
}

User message#

A message from the user. The content can be a string or, for vision models, a multimodal content array.

{
  "role": "user",
  "content": "Write a Python function to reverse a string."
}
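
This page does not spell out the layout of the multimodal content array. The sketch below assumes the OpenAI-compatible layout of typed parts ("text" and "image_url"), with the image passed inline as a data URL. Both the part shape and the helper name `image_user_message` are assumptions to verify against a vision model before relying on them.

```python
import base64


def image_user_message(
    prompt: str, image_bytes: bytes, mime_type: str = "image/png"
) -> dict:
    """Build a user message whose content is a list of typed parts."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Assumed part shape: the image is embedded as a base64 data URL.
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```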

Assistant message#

A previous response from the assistant, included to provide conversation history.

{
  "role": "assistant",
  "content": "Here's a function to reverse a string:\n\n```python\ndef reverse_string(s):\n    return s[::-1]\n```"
}

Tool message#

A response to a tool call made by the assistant. Must include tool_call_id.

{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "{\"temperature\": 72, \"unit\": \"fahrenheit\"}"
}
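
To show where tool messages come from, the sketch below runs each tool call requested by the assistant and builds one tool message per call. It assumes the OpenAI-compatible tool call shape (an `id` plus a `function` object with `name` and a JSON-string `arguments`), which this page does not confirm. `get_weather`, `TOOLS`, and `tool_messages_for` are hypothetical names.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> dict[str, Any]:
    """Hypothetical local tool; returns a fixed reading for illustration."""
    return {"temperature": 72, "unit": "fahrenheit"}


# Maps tool names the model may request to local Python callables.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def tool_messages_for(assistant_message: dict[str, Any]) -> list[dict[str, Any]]:
    """Run each requested tool and return one tool message per call."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        # Assumption: arguments arrive as a JSON-encoded string.
        arguments = json.loads(function["arguments"])
        output = TOOLS[function["name"]](**arguments)
        results.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],  # ties the result to its call
                "content": json.dumps(output),
            }
        )
    return results
```

Before the next request, append the assistant message that contained the tool calls to the history, followed by the tool messages returned here.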

Response format#

Non-streaming response#

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "llama-4-maverick",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}

Response fields#

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique completion ID |
| object | string | Always "chat.completion" |
| created | integer | Unix timestamp |
| model | string | The model that generated the response |
| choices | array | Array of completions (always 1 element) |
| choices[].index | integer | Always 0 |
| choices[].message | object | The generated message |
| choices[].message.role | string | Always "assistant" |
| choices[].message.content | string or null | The generated text (null when tool calls are present) |
| choices[].message.tool_calls | array | Tool calls requested by the model (if any) |
| choices[].finish_reason | string | Why generation stopped: "stop", "length", or "tool_calls" |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Tokens in the input |
| usage.completion_tokens | integer | Tokens generated |
| usage.total_tokens | integer | Sum of prompt + completion tokens |

Finish reasons#

| Reason | Meaning |
| --- | --- |
| stop | Model finished naturally or hit a stop sequence |
| length | Hit max_tokens limit |
| tool_calls | Model is requesting one or more tool calls |
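
A caller typically branches on finish_reason before using the message. The following sketch works on the parsed JSON of a non-streaming response. `interpret_completion` and its action labels ("done", "truncated", "run_tools") are hypothetical names chosen for illustration.

```python
from typing import Any


def interpret_completion(response: dict[str, Any]) -> tuple[str, Any]:
    """Return (action, payload) for a non-streaming response."""
    choice = response["choices"][0]  # choices always has one element
    message = choice["message"]
    reason = choice["finish_reason"]
    if reason == "tool_calls":
        # content is null here; the payload is the list of requested calls
        return "run_tools", message.get("tool_calls") or []
    if reason == "length":
        # generation stopped at max_tokens, so the text may be incomplete
        return "truncated", message.get("content") or ""
    if reason == "stop":
        return "done", message.get("content") or ""
    raise ValueError(f"unexpected finish_reason: {reason!r}")
```

On "truncated", one option is to resend the request with a larger max_tokens value.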

Multi-turn conversations#

To have a multi-turn conversation, include the full message history in each request:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.oru-el.com/v1/inference",
    api_key="oruel_your_api_key_here",
)

messages = [
    {"role": "system", "content": "You are a helpful math tutor."},
    {"role": "user", "content": "What is the derivative of x^2?"},
]

# First turn
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=messages,
)
assistant_message = response.choices[0].message
messages.append({"role": "assistant", "content": assistant_message.content})

# Second turn — include full history
messages.append({"role": "user", "content": "What about x^3?"})
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=messages,
)
print(response.choices[0].message.content)
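
Because the full history is resent on every turn, a long conversation eventually reaches the 256-message limit on the messages array. One possible approach, sketched here with a hypothetical helper `trim_history`, keeps a leading system message and the most recent messages. It counts messages only; token limits are a separate concern that it does not address.

```python
def trim_history(messages: list[dict], max_messages: int = 256) -> list[dict]:
    """Keep a leading system message plus the most recent messages."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    if len(messages) <= max_messages:
        return list(messages)
    # Preserve the system message if it is first, as recommended above.
    has_system = messages[0].get("role") == "system"
    head = messages[:1] if has_system else []
    keep = max_messages - len(head)
    tail = messages[len(messages) - keep:] if keep > 0 else []
    return head + tail
```

Dropping old messages this way can separate a tool message from the assistant message that requested it, so conversations that use tools may need to trim in whole call-and-result groups.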

Complete examples#

Python#

from openai import OpenAI

client = OpenAI(
    base_url="https://api.oru-el.com/v1/inference",
    api_key="oruel_your_api_key_here",
)

response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[
        {"role": "system", "content": "You are a concise technical writer."},
        {"role": "user", "content": "Explain what a REST API is in 2 sentences."},
    ],
    temperature=0.3,
    max_tokens=100,
)

print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")

JavaScript#

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.oru-el.com/v1/inference",
  apiKey: "oruel_your_api_key_here",
});

const response = await client.chat.completions.create({
  model: "llama-4-maverick",
  messages: [
    { role: "system", content: "You are a concise technical writer." },
    { role: "user", content: "Explain what a REST API is in 2 sentences." },
  ],
  temperature: 0.3,
  max_tokens: 100,
});

console.log(response.choices[0].message.content);
console.log(`Tokens used: ${response.usage.total_tokens}`);

cURL#

curl https://api.oru-el.com/v1/inference/chat/completions \
  -H "Authorization: Bearer oruel_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-maverick",
    "messages": [
      {"role": "system", "content": "You are a concise technical writer."},
      {"role": "user", "content": "Explain what a REST API is in 2 sentences."}
    ],
    "temperature": 0.3,
    "max_tokens": 100
  }'

Error handling#

Common errors#

| HTTP Status | Code | Cause |
| --- | --- | --- |
| 400 | VALIDATION_ERROR | Invalid request body (missing model, bad parameters) |
| 401 | UNAUTHORIZED | Missing or invalid API key |
| 404 | NOT_FOUND | Model not found or inactive |
| 429 | BUDGET_EXCEEDED | Monthly or hourly budget limit reached |
| 429 | Rate limited | More than 60 requests per minute |
| 502 | SERVICE_ERROR | Upstream service error |
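
Status 429 covers two different situations, so a client may want to tell them apart before deciding to retry. The sketch below is one possible policy and is not prescribed by the API. `is_retryable` is a hypothetical helper that takes the status and error code as plain arguments, because this page does not specify how the code appears in the response body.

```python
from typing import Optional

# Statuses from the table above that can clear up without changing the request.
RETRYABLE_STATUSES = {429, 502}


def is_retryable(status: int, code: Optional[str] = None) -> bool:
    """Return True when retrying the same request later may succeed."""
    if status == 429 and code == "BUDGET_EXCEEDED":
        # Assumption: a budget limit will not clear within a short backoff.
        return False
    return status in RETRYABLE_STATUSES
```

Under this policy, 400, 401, and 404 are never retried, since the request itself has to change first.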

Handling errors in code#

from openai import OpenAI, APIError, APIConnectionError, RateLimitError

client = OpenAI(
    base_url="https://api.oru-el.com/v1/inference",
    api_key="oruel_your_api_key_here",
)

try:
    response = client.chat.completions.create(
        model="llama-4-maverick",
        messages=[{"role": "user", "content": "Hello"}],
    )
except RateLimitError:
    print("Rate limited — wait and retry")
except APIConnectionError:
    print("Network error — check your connection")
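except APIError as e:
    # Not every APIError carries status_code, so read it defensively
    print(f"API error {getattr(e, 'status_code', 'unknown')}: {e.message}")

The same pattern in JavaScript:

import OpenAI from "openai";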

const client = new OpenAI({
  baseURL: "https://api.oru-el.com/v1/inference",
  apiKey: "oruel_your_api_key_here",
});

try {
  const response = await client.chat.completions.create({
    model: "llama-4-maverick",
    messages: [{ role: "user", content: "Hello" }],
  });
} catch (error) {
  if (error instanceof OpenAI.APIError) {
    console.error(`API error ${error.status}: ${error.message}`);
  }
}
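
The Python example above prints "wait and retry" for rate limits. A minimal sketch of that retry, using exponential backoff with jitter, might look like the following. `with_backoff` is a hypothetical helper and the delays are illustrative defaults, not values recommended by Oru-el.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    should_retry: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not should_retry(exc):
                raise
            # Base delay doubles each attempt (1s, 2s, 4s, ...),
            # plus up to 25% random jitter to spread out retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.25))
    raise RuntimeError("max_attempts must be at least 1")
```

With the OpenAI Python SDK, `should_retry` could be `lambda e: isinstance(e, (RateLimitError, APIConnectionError))`. The SDK client also retries some failures on its own through its `max_retries` setting, so the two mechanisms should be configured together to avoid stacking retries.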