llm.ftrz.de

Self-hosted LLM API — OpenAI-compatible

Authentication

All /v1/* endpoints require a Bearer token:

Authorization: Bearer YOUR_API_KEY

The /health endpoint is public.

Endpoints

POST /v1/chat/completions

Chat completion (messages format). Supports streaming via "stream": true.

curl https://llm.ftrz.de/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-coder",
    "messages": [
      {"role": "system", "content": "You are a helpful coding assistant."},
      {"role": "user", "content": "Write a fibonacci function in Python"}
    ],
    "max_tokens": 512,
    "temperature": 0.7
  }'
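
With `"stream": true`, the response arrives as Server-Sent Events instead of one JSON object. A minimal Python sketch of assembling the streamed text, assuming the standard OpenAI event format (`data:`-prefixed JSON chunks, terminated by `data: [DONE]`):

```python
import json

def extract_delta(sse_line: str):
    """Parse one SSE line from a streaming /v1/chat/completions response
    and return its text delta, or None for non-content lines."""
    if not sse_line.startswith("data: "):
        return None
    payload = sse_line[len("data: "):].strip()
    if payload == "[DONE]":  # end-of-stream sentinel
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0].get("delta", {}).get("content")

# Example: assemble the full reply from streamed lines
# (illustrative chunks, not actual server output):
lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": "!"}}]}',
    "data: [DONE]",
]
text = "".join(d for line in lines if (d := extract_delta(line)))
```
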

POST /v1/completions

Text completion (prompt format).

curl https://llm.ftrz.de/v1/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-coder",
    "prompt": "def binary_search(arr, target):",
    "max_tokens": 256
  }'

GET /v1/models

List available models.
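
The response uses the standard OpenAI list envelope; extracting the available model IDs might look like this (the payload below is illustrative, not actual server output):

```python
import json

# A /v1/models response in the standard OpenAI list shape:
sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "qwen3-coder", "object": "model", "owned_by": "llm.ftrz.de"}
  ]
}
""")

# The "data" array holds one entry per loaded model.
model_ids = [m["id"] for m in sample["data"]]
```
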

GET /health

Health check (no auth required). Returns server status and loaded model info.
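
Since no token is needed, a liveness probe can stay dependency-free. This sketch only checks the HTTP status code (the response body's fields are not specified here, and the timeout value is an assumption):

```python
import urllib.request

def is_healthy(base_url: str = "https://llm.ftrz.de", timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200 (no auth header needed)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, timeouts, connection refused
        return False
```
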

Client Configuration

OpenAI Python SDK

from openai import OpenAI

client = OpenAI(
    base_url="https://llm.ftrz.de/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="qwen3-coder",
    messages=[{"role": "user", "content": "Hello!"}]
)
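
The SDK also supports streaming: passing `stream=True` to `chat.completions.create` yields chunk objects whose `choices[0].delta.content` carries incremental text. A small helper to assemble the reply (the stand-in chunks below mimic the SDK's objects for illustration only):

```python
from types import SimpleNamespace

def collect_stream(chunks) -> str:
    """Join the incremental text deltas of a chat-completions stream.
    Each chunk exposes .choices[0].delta.content, as in the OpenAI SDK."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # role-only or empty chunks carry no content
            parts.append(delta)
    return "".join(parts)

# Stand-in chunks shaped like the SDK's stream objects:
fake = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
    for c in ("Hel", "lo!", None)
]
```

With the real client, `collect_stream(client.chat.completions.create(model="qwen3-coder", messages=..., stream=True))` returns the assembled reply.
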

Specs

Hardware: AMD Ryzen AI Max 395 — 96 GB unified memory
Backend: llama.cpp (ROCm HIP)
Speed: ~35 tok/s generation, ~50 tok/s prompt processing
Rate limit: 30 req/min per IP
Timeout: 600 s per request
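
Clients that burst past the 30 req/min limit will likely see HTTP 429 responses. A simple retry-with-exponential-backoff wrapper (a sketch; the 429 status is an assumption about how the limiter responds, and `send` stands in for whatever function performs the request):

```python
import time

def backoff_delays(retries: int = 4, base: float = 2.0, cap: float = 60.0):
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at cap."""
    return [min(base * 2 ** i, cap) for i in range(retries)]

def call_with_retries(send, delays=None):
    """Invoke send() -> (status, body); on HTTP 429 wait and retry,
    giving up after the backoff schedule is exhausted."""
    for delay in delays if delays is not None else backoff_delays():
        status, body = send()
        if status != 429:
            return status, body
        time.sleep(delay)
    return send()
```
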