One OpenAI-compatible endpoint for Claude, GPT, Gemini, DeepSeek and more. This is the full Zylo API reference — authentication, base URL, endpoints, streaming, tool calling, web-extract, rate limits per plan and error codes — with copy-paste examples in Python, JavaScript, Go, Ruby and PHP.
Every Zylo API request is authenticated with your API key.
Create a key at console.zyloai.net — it is issued on the free Basic plan, no credit card. Send it on every request, either as an Authorization: Bearer header (recommended — this is what the OpenAI SDKs send) or as an X-API-Key header. Keep keys server-side; you can rotate or replace a key from the console at any time.
# Recommended: Bearer header (works with every OpenAI SDK)
curl https://api.zyloai.net/v1/chat/completions \
  -H "Authorization: Bearer YOUR_ZYLO_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4.8",
    "messages": [{"role":"user","content":"Hi"}]
  }'

# Also accepted: X-API-Key header
curl https://api.zyloai.net/v1/chat/completions \
  -H "X-API-Key: YOUR_ZYLO_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role":"user","content":"Hi"}]
  }'
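The two header styles above can be wrapped in a small helper so the rest of a codebase never builds them by hand. This is a minimal sketch, not part of any Zylo SDK: the function name `auth_headers` and the `scheme` argument are illustrative assumptions.

```python
def auth_headers(api_key: str, scheme: str = "bearer") -> dict:
    """Return HTTP headers for a Zylo request.

    scheme="bearer" sends "Authorization: Bearer <key>" (the recommended form).
    scheme="x-api-key" sends the "X-API-Key" header instead.
    Both the function name and the scheme labels are hypothetical.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    headers = {"Content-Type": "application/json"}
    if scheme == "bearer":
        headers["Authorization"] = f"Bearer {api_key}"
    elif scheme == "x-api-key":
        headers["X-API-Key"] = api_key
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return headers


bearer = auth_headers("YOUR_ZYLO_KEY")
alternate = auth_headers("YOUR_ZYLO_KEY", scheme="x-api-key")
```

In production the key would normally come from server-side configuration rather than a literal in source code.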
All paths are relative to the base URL https://api.zyloai.net/v1.
| Method | Path | What it does |
|---|---|---|
| POST | /v1/chat/completions | OpenAI-compatible chat completions. Supports stream, tools and multimodal input. The main endpoint for most apps. |
| POST | /generate | Native generation endpoint with attachments and tools, returning a flat message plus usage, latency and credits. |
| POST | /v1/web-extract | Scrape and extract clean text from a URL for RAG. 10 requests/minute per key. |
| POST | /v1/images/generations | Generate or edit images with supported standard and premium models. |
| GET | /v1/models | List the models your key can call on its current plan. |
| GET | /stats | Your current credit balance, request limits and usage history. |
| GET | /validate | Check whether a key is valid and return its plan capabilities. |
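For clients that do not use an OpenAI SDK, requests to these endpoints can be assembled with the standard library. The sketch below only builds the request objects and does not send them; the helper name `build_request` is an assumption, and it is limited to the two paths whose full URLs are unambiguous from this page (`/models` and `/chat/completions` under the base URL).

```python
import json
import urllib.request

BASE_URL = "https://api.zyloai.net/v1"


def build_request(method, path, api_key, payload=None):
    """Build (but do not send) a urllib Request for a path under BASE_URL."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        url=BASE_URL + path,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# GET https://api.zyloai.net/v1/models
models_req = build_request("GET", "/models", "YOUR_ZYLO_KEY")

# POST https://api.zyloai.net/v1/chat/completions
chat_req = build_request(
    "POST",
    "/chat/completions",
    "YOUR_ZYLO_KEY",
    {"model": "gpt-5.5", "messages": [{"role": "user", "content": "Hi"}]},
)

# To actually send one of them:
#   with urllib.request.urlopen(models_req) as resp:
#       print(json.load(resp))
```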
A chat completions response includes id, choices[].message, finish_reason and a usage block (prompt_tokens, completion_tokens, total_tokens). See the full documentation for every field.
The parameters below are sent in the JSON body of a chat completions or generate request.
| Parameter | Type | Notes |
|---|---|---|
| model | string | Required. A bare model id, e.g. claude-opus-4.8, gpt-5.5, gemini-3.1-pro-preview. |
| messages | array | Required. Conversation history of {role, content} objects. |
| temperature | float | Sampling temperature, 0–1. Default 0.7. |
| max_tokens | integer | Max new tokens to generate. Alias of max_new_tokens. |
| top_p | float | Nucleus sampling, 0–1. Default 1.0. |
| frequency_penalty | float | Reduce repetition, 0–2. Default 0. |
| presence_penalty | float | Encourage new topics, 0–2. Default 0. |
| response_format | string | Set to "json_object" to force valid JSON output. |
| stream | boolean | Stream tokens as server-sent events. Default false. |
| tools | array | Function definitions and/or native tools the model may call. Paid plans only. |
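Checking a request body against the ranges in this table before sending it can turn a 400 response into a local error. The following is a rough sketch under that assumption: `build_chat_body` is a hypothetical helper, and the ranges are copied from the table rather than from any server-side schema.

```python
# Numeric ranges taken from the parameter table above.
RANGES = {
    "temperature": (0.0, 1.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (0.0, 2.0),
    "presence_penalty": (0.0, 2.0),
}


def build_chat_body(model, messages, **options):
    """Return a JSON-serialisable request body, validating documented ranges."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    body = {"model": model, "messages": messages}
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}-{high}")
        body[name] = value
    return body


body = build_chat_body(
    "gpt-5.5",
    [{"role": "user", "content": "Hi"}],
    temperature=0.2,
    max_tokens=256,
)
```

Options that are not in `RANGES` (such as `stream` or `tools`) pass through unchanged, so the helper does not need updating when a new parameter is used.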
Set stream: true to receive tokens as server-sent events in the OpenAI delta format, ending with a data: [DONE] line.
# pip install openai
from openai import OpenAI

client = OpenAI(api_key="YOUR_ZYLO_KEY", base_url="https://api.zyloai.net/v1")

stream = client.chat.completions.create(
    model="claude-opus-4.8",
    messages=[{"role": "user", "content": "Write a haiku about streaming."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
// npm install openai
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_ZYLO_KEY",
  baseURL: "https://api.zyloai.net/v1",
});

const stream = await client.chat.completions.create({
  model: "claude-opus-4.8",
  messages: [{ role: "user", content: "Write a haiku about streaming." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
curl https://api.zyloai.net/v1/chat/completions \
  -H "Authorization: Bearer YOUR_ZYLO_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4.8",
    "messages": [{"role": "user", "content": "Write a haiku about streaming."}],
    "stream": true
  }'
# -> data: {"choices":[{"delta":{"content":"..."}}]} ... ends with data: [DONE]
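When no SDK is available, the event stream can be decoded by hand. The sketch below parses lines in the `data: {...}` shape shown in the curl output and stops at `data: [DONE]`; the function name `iter_deltas` and the sample lines are illustrative, not captured from a real response.

```python
import json


def iter_deltas(lines):
    """Yield content strings from an iterable of server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Hand-written sample in the OpenAI delta format.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
    'data: {"choices":[{"delta":{"content":"ignored"}}]}',
]
text = "".join(iter_deltas(sample))
print(text)
```

The same generator can be fed from any HTTP client that exposes the response body line by line.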
Pass a tools array so the model can call your functions, or attach Zylo's native tools. Available on paid plans (not Basic).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
resp = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "What's the weather in Lima?"}],
    tools=tools,
)
# resp.choices[0].message.tool_calls -> call get_weather, then send the result back
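The "send the result back" step can be sketched as a small dispatcher that runs each requested function and builds the `tool` role messages for the follow-up request, following the OpenAI tool-calling message format. The `get_weather` body, the `LOCAL_TOOLS` registry and the example call below are hypothetical stand-ins; tool calls are shown as plain dicts, whereas the SDK returns objects with the same fields as attributes.

```python
import json


def get_weather(city):
    # Hypothetical local implementation; a real one would query a weather service.
    return {"city": city, "forecast": "sunny"}


# Maps tool names the model may request to local Python callables.
LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_calls(tool_calls):
    """Execute each tool call and return the "tool" messages to send back."""
    results = []
    for call in tool_calls:
        fn = LOCAL_TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results


# Example tool call in the OpenAI response shape (made up for illustration).
example_calls = [{
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Lima"}'},
}]
tool_messages = run_tool_calls(example_calls)
```

The follow-up request then carries the original messages, the assistant message containing the tool calls, and these `tool` messages, so the model can write its final answer.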
{
"model": "gemini-3.1-pro-preview",
"messages": [{"role": "user", "content": "Summarise today's AI news."}],
"tools": [
{ "type": "function", "function": { "name": "google_search" } },
{ "type": "function", "function": { "name": "url_context" } },
{ "type": "function", "function": { "name": "code_execution" } }
]
}
# Native tools share platform-wide limits (see Rate limits below).
The /v1/web-extract endpoint pulls clean text from a URL so you can ground answers in real sources — no separate scraping stack. Limited to 10 requests/minute per key.
import requests
from openai import OpenAI

client = OpenAI(api_key="YOUR_ZYLO_KEY", base_url="https://api.zyloai.net/v1")

# 1) Extract clean content from the web
extract = requests.post(
    "https://api.zyloai.net/v1/web-extract",
    headers={"Authorization": "Bearer YOUR_ZYLO_KEY"},
    json={"url": "https://example.com/article"},
).json()

# 2) Ground a normal chat completion in the extracted text
answer = client.chat.completions.create(
    model="gemini-3.1-pro-preview",
    messages=[
        {"role": "system", "content": "Answer using only the provided sources."},
        {"role": "user", "content": f"Sources:\n{extract}\n\nQuestion: What changed?"},
    ],
)
print(answer.choices[0].message.content)
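Because /v1/web-extract is capped at 10 requests/minute per key, a batch job that extracts many URLs may want to pace itself instead of relying on 429 responses. Below is a minimal sliding-window sketch; the class name `MinuteLimiter` is an assumption, and how the server actually counts its window is not documented here, so treat the client-side window as an approximation.

```python
import time
from collections import deque


class MinuteLimiter:
    """Client-side sliding-window limiter: at most max_calls per period seconds."""

    def __init__(self, max_calls=10, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._stamps = deque()  # timestamps of recent calls, oldest first

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.period - (now - self._stamps[0])

    def record(self):
        """Register that a call was just made."""
        self._stamps.append(self.clock())


limiter = MinuteLimiter(max_calls=10, period=60.0)

# Typical use around each extract call:
#   time.sleep(limiter.wait_time())
#   limiter.record()
#   requests.post("https://api.zyloai.net/v1/web-extract", ...)
```

The `clock` argument exists so the limiter can be exercised with a fake clock in tests rather than real sleeps.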
Send images alongside text using the OpenAI content-parts format. Multimodal input requires a paid plan (Basic is text-only).
resp = client.chat.completions.create(
    model="gemini-3.1-pro-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url",
             "image_url": {"url": "data:image/jpeg;base64,..."}},
        ],
    }],
)
print(resp.choices[0].message.content)
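The `data:image/jpeg;base64,...` value in the example is a base64 data URL. One way to build it from raw image bytes is sketched below; the helper names `image_part` and `user_message` are illustrative, and the three sample bytes are just the start of a JPEG header, not a real image.

```python
import base64


def image_part(image_bytes, mime_type="image/jpeg"):
    """Return an OpenAI-style image_url content part holding a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def user_message(text, image_bytes, mime_type="image/jpeg"):
    """Combine a text part and an image part into one user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            image_part(image_bytes, mime_type),
        ],
    }


# In practice: image_bytes = open("photo.jpg", "rb").read()
msg = user_message("What's in this image?", b"\xff\xd8\xff")
```

The resulting `msg` can be passed directly in the `messages` list of a chat completions call.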
The Zylo API uses standard HTTP status codes.
| Status | Meaning | What to do |
|---|---|---|
| 200 | OK | Request completed successfully. |
| 400 | Bad Request | Missing parameters or invalid JSON — fix the body. |
| 401 | Unauthorized | Invalid or missing API key — check the header. |
| 402 | Payment Required | Out of credits — top up to keep using premium models. |
| 403 | Forbidden | Plan limit reached or model not on your plan — upgrade. |
| 429 | Too Many Requests | Rate limit exceeded — back off and retry (see below). |
| 500 | Server Error | Transient internal error — retry with backoff. |
Retry 429 and 5xx responses with exponential backoff. Do not retry 400/401/403 — fix the request instead.
import time
from openai import OpenAI, APIStatusError

client = OpenAI(api_key="YOUR_ZYLO_KEY", base_url="https://api.zyloai.net/v1")

def chat_with_retry(messages, model="claude-opus-4.8", retries=4):
    for attempt in range(retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except APIStatusError as e:
            # Retry only on rate limits / server errors
            if e.status_code in (429, 500, 502, 503) and attempt < retries - 1:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s with the default retries=4
                continue
            raise  # non-retryable status, or the last attempt failed
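The status table and the retry rule can also be expressed as two pure functions, which are easy to unit-test without network access. This is a sketch: the action labels returned by `next_action` and the 30-second cap in `backoff_delays` are assumptions, not values defined by the API.

```python
def next_action(status):
    """Map an HTTP status code to a suggested client action."""
    if status == 200:
        return "ok"
    if status == 429 or 500 <= status <= 599:
        return "retry"        # back off, then try again
    if status == 402:
        return "top_up"       # out of credits
    if status == 403:
        return "upgrade"      # plan limit reached or model not on the plan
    if status in (400, 401):
        return "fix_request"  # bad body or bad key; retrying will not help
    return "inspect"          # status not covered by the table


def backoff_delays(retries, base=1.0, cap=30.0):
    """Delays in seconds between attempts: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** i) for i in range(retries - 1)]


print(next_action(429))    # retry
print(backoff_delays(4))   # [1.0, 2.0, 4.0]
```

With `retries=4` there are three waits between the four attempts, which matches the loop in `chat_with_retry`.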
Because the Zylo API is OpenAI-compatible, any OpenAI SDK or plain HTTP client works — just repoint the base URL. Here it is in Go, Ruby and PHP.
package main

import (
	"bytes"
	"log"
	"net/http"
)

func main() {
	body := []byte(`{"model":"claude-opus-4.8","messages":[{"role":"user","content":"Hello from Zylo!"}]}`)
	req, err := http.NewRequest("POST", "https://api.zyloai.net/v1/chat/completions", bytes.NewBuffer(body))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer YOUR_ZYLO_KEY")
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err) // resp is nil on error, so check before touching resp.Body
	}
	defer resp.Body.Close()
	// decode resp.Body into your struct
}
require "net/http"
require "json"

uri = URI("https://api.zyloai.net/v1/chat/completions")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true

req = Net::HTTP::Post.new(uri)
req["Authorization"] = "Bearer YOUR_ZYLO_KEY"
req["Content-Type"] = "application/json"
req.body = {
  model: "claude-opus-4.8",
  messages: [{ role: "user", content: "Hello from Zylo!" }]
}.to_json

res = http.request(req)
puts JSON.parse(res.body).dig("choices", 0, "message", "content")
<?php
$ch = curl_init("https://api.zyloai.net/v1/chat/completions");
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER => [
        "Authorization: Bearer YOUR_ZYLO_KEY",
        "Content-Type: application/json",
    ],
    CURLOPT_POSTFIELDS => json_encode([
        "model" => "claude-opus-4.8",
        "messages" => [["role" => "user", "content" => "Hello from Zylo!"]],
    ]),
]);
$data = json_decode(curl_exec($ch), true);
echo $data["choices"][0]["message"]["content"];
Plans set your daily request and token limits; usage is billed separately from prepaid credits at base per-token rates.
| Plan | Requests | Daily tokens | Models & extras |
|---|---|---|---|
| Basic · $0 | 7.2k/day · 10/min | 200k | Basic models only · text only · no credits |
| Go · $10 | 28.8k/day | 512k | Premium models · web search · $10 credits |
| Pro · $50 | 43.2k/day | 1M | Code execution · $50 credits |
| Mega · $200 | 86.4k/day | 5M | Priority access · $200 credits |
| Enterprise · $400 | Unlimited | Unlimited | Dedicated GPU · $400 credits |
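The daily request figures in the table translate into round per-minute averages, which can help when sizing a steady workload. The arithmetic below assumes the daily quota is a plain 24-hour cap that a client chooses to spread evenly; the table itself does not say how the quota is enforced over the day.

```python
MINUTES_PER_DAY = 24 * 60  # 1440

# Daily request limits copied from the plan table (Enterprise is unlimited).
DAILY_REQUESTS = {"Basic": 7_200, "Go": 28_800, "Pro": 43_200, "Mega": 86_400}


def average_per_minute(daily_requests):
    """Sustained requests/minute if the daily quota is spread over 24 hours."""
    return daily_requests / MINUTES_PER_DAY


for plan, daily in DAILY_REQUESTS.items():
    print(f"{plan}: {average_per_minute(daily):g} requests/minute on average")
# Basic: 5 requests/minute on average
# Go: 20 requests/minute on average
# Pro: 30 requests/minute on average
# Mega: 60 requests/minute on average
```

On Basic, for example, the 10/min figure allows short bursts, but a client that runs around the clock would exhaust 7.2k requests/day at a sustained 5 requests/minute.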
web-extract is 10 req/min per key; the native web search and URL context tools share a combined 10 req/min budget; code execution has its own 20 req/min budget. Exceeding any limit returns 429. Full breakdown on the pricing page.
You keep your OpenAI-compatible client. Repoint the base URL, use your Zylo key, and pick a model id. Here is the before/after.
# Before: OpenAI
client = OpenAI(
    api_key="OPENAI_KEY",
)
model = "gpt-5.5"

# After: Zylo
client = OpenAI(
    api_key="YOUR_ZYLO_KEY",
    base_url="https://api.zyloai.net/v1",
)
model = "gpt-5.5" # or claude-opus-4.8, gemini-3.1-pro...

# Before: OpenRouter
client = OpenAI(
    api_key="OPENROUTER_KEY",
    base_url="https://openrouter.ai/api/v1",
)
model = "anthropic/claude-opus-4.8"

# After: Zylo
client = OpenAI(
    api_key="YOUR_ZYLO_KEY",
    base_url="https://api.zyloai.net/v1",
)
model = "claude-opus-4.8" # drop the vendor/ prefix
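If model ids are stored in configuration, the "drop the vendor/ prefix" rule can be applied mechanically during migration. The helper below is a sketch with a hypothetical name, `to_zylo_model_id`; it only strips the prefix and does not confirm that the resulting id exists, so the output is worth checking against GET /v1/models.

```python
def to_zylo_model_id(model_id):
    """Drop a leading "vendor/" prefix, if present, to get a bare model id."""
    if "/" in model_id:
        return model_id.split("/", 1)[1]
    return model_id


print(to_zylo_model_id("anthropic/claude-opus-4.8"))  # claude-opus-4.8
print(to_zylo_model_id("gpt-5.5"))                    # gpt-5.5
```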
Recent additions to the Zylo API catalogue. The models page is always the live, complete list.
For the current list, call GET /v1/models or browse all models.
Operational facts you can verify, not marketing numbers.
Per-model availability and incident history are published at status.zyloai.net, updated by an automated prober.
Every generation returns a real latency value and a usage token count, so you measure throughput from your own traffic — no guessing.
Prompts and completions are never stored. Zylo is a passthrough to the providers; only numeric usage is logged for billing.
Sustained capacity scales with your plan — up to 5M tokens/day on Mega, and unlimited requests and tokens on Enterprise.
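Measuring throughput from your own traffic comes down to dividing generated tokens by latency. The sketch below assumes latency is available in seconds and that `completion_tokens` is read from the usage block; the exact name and unit of the latency field are not specified on this page, and the sample numbers are invented for illustration.

```python
def tokens_per_second(completion_tokens, latency_seconds):
    """Generation throughput for a single request."""
    if latency_seconds <= 0:
        raise ValueError("latency_seconds must be positive")
    return completion_tokens / latency_seconds


def overall_throughput(samples):
    """Aggregate throughput over (completion_tokens, latency_seconds) pairs."""
    total_tokens = sum(tokens for tokens, _ in samples)
    total_seconds = sum(seconds for _, seconds in samples)
    return tokens_per_second(total_tokens, total_seconds)


# Invented measurements: (completion_tokens, latency in seconds).
samples = [(120, 1.5), (300, 4.0), (80, 1.0)]
print(tokens_per_second(120, 1.5))            # 80.0
print(round(overall_throughput(samples), 2))
```

Aggregating by total tokens over total time weights long generations more heavily than averaging the per-request rates would.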
The questions developers ask most about the Zylo API.
Sign up at console.zyloai.net and your API key is created on the Basic plan with no credit card. Copy it from the console, send it as an Authorization: Bearer header, and you can call any Basic-tier model immediately. Upgrade to a paid plan to unlock premium models and credits. You can rotate or replace the key from the console at any time.
The Zylo API base URL is https://api.zyloai.net/v1. Point any OpenAI-compatible SDK at that base URL, use your Zylo API key, and call /chat/completions with a bare model id such as claude-opus-4.8 or gpt-5.5.
Yes. The Zylo API implements the OpenAI Chat Completions schema, so the official OpenAI SDKs work unchanged — set base_url to https://api.zyloai.net/v1 and use your Zylo key. Request and response shapes, streaming and tool calling all match.
Yes. Set "stream": true on a chat completions request and the Zylo API returns tokens as server-sent events in the OpenAI delta format, terminated by a data: [DONE] line. Any OpenAI-compatible streaming client works.
Yes. Pass a tools array of function definitions and the model can return tool calls, exactly like the OpenAI API. Zylo also exposes native tools — web search, URL context and a code-execution sandbox. Tool calling is available on paid plans, not the free Basic plan.
Rate limits depend on your plan: Basic allows 7,200 requests/day and 10 requests/minute with 200k daily tokens; Go 28,800/day with 512k tokens; Pro 43,200/day with 1M tokens; Mega 86,400/day with 5M tokens; Enterprise is unlimited. Exceeding a limit returns 429. The web-extract endpoint is limited to 10 requests/minute per key, and native tools share platform-wide limits.
The Basic plan is free with no credit card and includes a daily token and request allowance on Basic-tier models. Premium models require a paid plan and prepaid credits; usage is billed at each model's base per-token rate with no markup, and a flat 25% platform fee applies only when you add credits.
Frontier and cost-efficient models from seven providers — Anthropic (Claude), OpenAI (GPT), Google (Gemini), DeepSeek, Qwen, MiniMax and Moonshot (Kimi). Call GET /v1/models for the live list your key can access, or see the models page for pricing.
Keep your existing OpenAI-compatible client. Change base_url to https://api.zyloai.net/v1 and use your Zylo key. Coming from OpenAI, just pick a Zylo model id; coming from OpenRouter, drop the vendor/ prefix so anthropic/claude-opus-4.8 becomes claude-opus-4.8. See the OpenAI and OpenRouter guides.
Standard HTTP status codes: 200 OK, 400 bad request, 401 invalid or missing API key, 402 out of credits, 403 plan limit reached or model restricted, 429 rate limited, and 500 internal error. Retry 429 and 5xx with backoff; fix the request for the others.
Free API key, OpenAI-compatible, 40+ models behind one base URL. No card required to start.