A hosted AI chat endpoint with streaming, retrieval-augmented generation, web search, and bring-your-own-key management — running on OpenAI, Anthropic, and Google AI.
Every capability from auth to streaming to retrieval — production-ready and self-hostable.
Argon2id password hashing, plus JWT access tokens with a 15-minute expiry and refresh tokens to renew them. Role-based access control with a seeded admin account.
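The sketch below shows what a 15-minute access token amounts to. It uses only the standard library. It assumes HS256 signing and hypothetical `sub`/`role` claims, and it does not show Argon2id hashing, which needs a third-party library such as `argon2-cffi`. Treat it as an illustration under those assumptions, not the service's actual token code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

ACCESS_TOKEN_TTL_SECONDS = 15 * 60  # the 15-minute access-token lifetime described above


def _b64url(data: bytes) -> str:
    # JWT segments use base64url encoding with the "=" padding stripped
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the stripped padding before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_access_token(user_id: str, role: str, secret: bytes, now: Optional[float] = None) -> str:
    """Issue an HS256 JWT whose exp claim is 15 minutes after iat (assumed claim names)."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "role": role, "iat": issued_at,
               "exp": issued_at + ACCESS_TOKEN_TTL_SECONDS}
    segments = [_b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
                for part in (header, payload)]
    signing_input = ".".join(segments)
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_access_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the payload if the signature is valid and the token has not expired."""
    signing_input, _, signature = token.rpartition(".")
    # Constant-time comparison avoids leaking signature bytes through timing
    if not hmac.compare_digest(_sign(signing_input, secret), _b64url_decode(signature)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if (time.time() if now is None else now) >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

A client whose access token has expired would then present its longer-lived refresh token to obtain a new one. The refresh token's lifetime is not stated here.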
Server-sent events over HTTP. Switch providers and models on any request. Full conversation history is persisted in SQLite.
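Persisting history in SQLite could look roughly like the sketch below, which uses Python's built-in `sqlite3` module. The table and column names are hypothetical. Recording `provider` and `model` on each message is one way to support switching providers partway through a single conversation.

```python
import sqlite3
from typing import List, Optional

# Hypothetical schema: one row per conversation, one row per message
SCHEMA = """
CREATE TABLE IF NOT EXISTS conversations (
    id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    conversation_id TEXT NOT NULL REFERENCES conversations(id),
    role TEXT NOT NULL CHECK (role IN ('user', 'assistant', 'system')),
    content TEXT NOT NULL,
    provider TEXT,
    model TEXT
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(SCHEMA)
    return conn


def append_message(conn: sqlite3.Connection, conversation_id: str, user_id: str, role: str,
                   content: str, provider: Optional[str] = None, model: Optional[str] = None) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO conversations (id, user_id) VALUES (?, ?)",
                     (conversation_id, user_id))
        conn.execute(
            "INSERT INTO messages (conversation_id, role, content, provider, model) "
            "VALUES (?, ?, ?, ?, ?)",
            (conversation_id, role, content, provider, model),
        )


def load_history(conn: sqlite3.Connection, conversation_id: str) -> List[dict]:
    # Insertion order (autoincrement id) is the conversation order
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE conversation_id = ? ORDER BY id",
        (conversation_id,),
    )
    return [{"role": role, "content": content} for role, content in rows]
```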
Upload TXT, Markdown, HTML, or PDF files. Documents are chunked, embedded, and stored in LanceDB, and the most relevant chunks from a vector search are injected into every chat context.
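The retrieval flow can be illustrated in memory. The flow is: chunk a document, embed each chunk, rank chunks by cosine similarity to the query, and prepend the top hits to the prompt. This is a toy sketch. A bag-of-words count vector stands in for a real embedding model, a sorted list stands in for LanceDB, and the chunk size, overlap, and prompt wording are all assumptions.

```python
import math
import re
from collections import Counter
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
    return [c for c in chunks if c.strip()]


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a term-count vector over lowercase words
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    # Brute-force ranking; a vector database would use an index instead
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_context(query: str, chunks: List[str], k: int = 2) -> str:
    hits = retrieve(query, chunks, k)
    return "Use the following context to answer.\n" + "\n---\n".join(hits) + f"\n\nQuestion: {query}"
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=1)` yields `["abcd", "defg", "ghij"]`. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.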
Tavily-powered live search as a tool call. The model decides when to search, fetches results, and cites sources inline.
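Search as a tool call means the server runs a loop. It asks the model for its next step, runs the search whenever the model requests one, feeds the results back, and stops once the model answers in text. The sketch below shows that loop with every external piece stubbed out. `web_search` is a hypothetical stand-in for a Tavily call, `model_step` is a hypothetical interface rather than any provider's real API, and `scripted_model` fakes a model that searches once and then cites its first result.

```python
import json
from typing import Callable, Dict, List


def web_search(query: str) -> List[dict]:
    # Hypothetical stub: a real implementation would call the Tavily search API here
    return [{"title": "Example result", "url": "https://example.com/article",
             "content": f"Snippet about {query}"}]


TOOLS: Dict[str, Callable[..., object]] = {"web_search": web_search}


def run_turn(model_step: Callable[[List[dict]], dict], messages: List[dict],
             max_tool_calls: int = 3) -> str:
    """Let the model call tools until it returns text.

    model_step(messages) returns either {"tool": name, "args": {...}} or {"text": "..."}.
    """
    tool_calls = 0
    while True:
        reply = model_step(messages)
        if "tool" not in reply:
            return reply["text"]  # the model decided no (further) search was needed
        if tool_calls >= max_tool_calls:
            raise RuntimeError("tool-call limit reached")
        tool_calls += 1
        result = TOOLS[reply["tool"]](**reply["args"])
        # Feed results back so the model can cite them in its next step
        messages.append({"role": "tool", "name": reply["tool"], "content": json.dumps(result)})


def scripted_model(messages: List[dict]) -> dict:
    # Stand-in for a real LLM: search once, then answer citing the first result
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool": "web_search", "args": {"query": messages[-1]["content"]}}
    first = json.loads(tool_results[-1]["content"])[0]
    return {"text": f"{first['content']} [source: {first['url']}]"}
```

The `max_tool_calls` cap is a design choice. It keeps a model that repeatedly requests searches from looping forever.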
Users store their own provider API keys, encrypted at rest with AES-256-GCM. When a user has stored a key for the requested provider, it is used automatically in place of the operator key.
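The key-selection rule (a user's own key wins, with the operator key as fallback) reduces to a small lookup, sketched below. The function and parameter names are hypothetical. `decrypt` is injected so the sketch stays self-contained; in a real deployment it would perform AES-256-GCM decryption, for example with the `cryptography` package's `AESGCM` class, with the unique per-encryption nonce stored alongside the ciphertext.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ResolvedKey:
    api_key: str
    source: str  # "user" or "operator", useful for usage logging and billing


def resolve_api_key(
    provider: str,
    user_id: str,
    load_user_key: Callable[[str, str], Optional[bytes]],  # returns stored ciphertext or None
    operator_keys: Dict[str, str],
    decrypt: Callable[[bytes], str],  # AES-256-GCM decryption in a real deployment
) -> ResolvedKey:
    encrypted = load_user_key(user_id, provider)
    if encrypted is not None:
        # A stored user key always takes precedence over the operator key
        return ResolvedKey(decrypt(encrypted), "user")
    if provider in operator_keys:
        return ResolvedKey(operator_keys[provider], "operator")
    raise LookupError(f"no API key configured for provider {provider!r}")
```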
Every request is logged with its provider, model, token counts, latency, and estimated cost. A full dashboard is included.
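A cost estimate is typically token counts multiplied by per-million-token prices, billed separately for input and output. For example, 1,200 input tokens at $3 per million plus 350 output tokens at $15 per million comes to $0.0036 + $0.00525 = $0.00885. The sketch below assumes that pricing model. The price table is a placeholder, not real provider pricing, and the log fields simply mirror the list above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Placeholder prices (USD per million tokens, input/output); not real provider pricing
EXAMPLE_PRICES_PER_MTOK: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("example-provider", "example-model"): (3.00, 15.00),
}


@dataclass
class RequestLog:
    provider: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    cost_usd: float


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    # Input and output tokens are usually priced differently, so weight them separately
    return (input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok) / 1_000_000


def build_log(provider: str, model: str, input_tokens: int, output_tokens: int,
              latency_ms: float, prices: Dict[Tuple[str, str], Tuple[float, float]]) -> RequestLog:
    input_price, output_price = prices[(provider, model)]
    cost = estimate_cost(input_tokens, output_tokens, input_price, output_price)
    return RequestLog(provider, model, input_tokens, output_tokens, latency_ms, cost)
```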
All routes exchange JSON over HTTPS, and chat responses stream as server-sent events. Authentication uses standard Bearer tokens.
# Stream a chat message
curl -N -X POST https://api.spellmansapi.com/chat \
  -H "Authorization: Bearer <token>" \
  -H "Accept: text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Explain quantum entanglement",
    "provider": "anthropic",
    "model": "claude-3-5-sonnet-20241022"
  }'
# SSE response:
data: {"type":"meta","conversationId":"..."}

data: {"type":"chunk","text":"Quantum "}

data: {"type":"done","usage":{...}}
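On the client, the stream can be consumed by reading `data:` lines, treating a blank line as the end of an event (per the server-sent events format), and then acting on each event's `type`. The sketch below assumes each event's data is one JSON object shaped like the `meta`, `chunk`, and `done` events shown above. The contents of `usage` are not specified here, so they are passed through untouched.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one JSON object per SSE event, assuming every event's data field is JSON."""
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event accumulated so far
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is not part of the value
        if field == "data":
            data_lines.append(value)
    if data_lines:
        # Lenient: flush a final event even if the trailing blank line is missing
        yield json.loads("\n".join(data_lines))


def collect_reply(events: Iterable[dict]) -> Tuple[Optional[str], str, Optional[dict]]:
    """Return (conversationId, full text, usage) from a stream of parsed events."""
    conversation_id, usage, parts = None, None, []
    for event in events:
        if event["type"] == "meta":
            conversation_id = event.get("conversationId")
        elif event["type"] == "chunk":
            parts.append(event["text"])  # render incrementally in a real UI
        elif event["type"] == "done":
            usage = event.get("usage")
    return conversation_id, "".join(parts), usage
```

Against the live endpoint, the same parser can be fed from an HTTP client that yields lines as they arrive. One example is the `requests` library with `stream=True` and `response.iter_lines(decode_unicode=True)`.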
Switch providers and models on any request. The API normalises streaming, tool calling, and token accounting across all providers.
OpenAI: GPT-4o · GPT-4o mini · o3-mini · text-embedding-3
Anthropic: Claude 3.5 Sonnet · Claude 3.5 Haiku · Claude 3 Opus
Google AI: Gemini 2.0 Flash · Gemini 1.5 Pro · Gemini 1.5 Flash
Local models: llama3 · mistral · phi3
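Token accounting is a concrete case of the normalisation described above, because each provider reports usage under different field names. OpenAI uses `prompt_tokens`/`completion_tokens`, Anthropic uses `input_tokens`/`output_tokens`, and Gemini uses `promptTokenCount`/`candidatesTokenCount`. The sketch below maps all three onto one shape. The provider identifiers and the output keys are assumptions, since the `usage` object above is elided. Local models are not covered, and in streaming mode some providers split these counts across several events.

```python
from typing import Dict

# Provider-specific usage field names: (input-token field, output-token field)
USAGE_FIELDS: Dict[str, tuple] = {
    "openai": ("prompt_tokens", "completion_tokens"),
    "anthropic": ("input_tokens", "output_tokens"),
    "google": ("promptTokenCount", "candidatesTokenCount"),
}


def normalize_usage(provider: str, raw: dict) -> Dict[str, int]:
    """Map a provider's raw usage object onto one hypothetical common shape."""
    try:
        input_field, output_field = USAGE_FIELDS[provider]
    except KeyError:
        raise ValueError(f"unknown provider {provider!r}") from None
    # Missing counts default to zero rather than failing the whole request log
    input_tokens = int(raw.get(input_field, 0))
    output_tokens = int(raw.get(output_field, 0))
    return {
        "inputTokens": input_tokens,
        "outputTokens": output_tokens,
        "totalTokens": input_tokens + output_tokens,
    }
```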