Client Development
Guide for building new Omni clients that connect to the backend.
Protocol Overview
All Omni clients communicate with the backend via raw WebSocket (not Socket.IO).
Connection Flow
- Connect to `wss://your-backend/ws/live/{session_id}`
- Send an `auth` message with a Firebase JWT token
- Receive `auth_response` with user info and capabilities
- Start sending/receiving audio (binary) and control (JSON) messages
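The first three steps can be sketched in a transport-agnostic way. This is a minimal sketch, not the reference client: `live_url`, `handshake`, and the `send`/`recv` callables are hypothetical names, and it assumes the reply identifies itself with `"type": "auth_response"`, which the flow implies but does not spell out. Any raw WebSocket library can supply `send` and `recv`.

```python
import json
from typing import Awaitable, Callable, Union
from urllib.parse import quote

Frame = Union[str, bytes]


def live_url(backend_host: str, session_id: str) -> str:
    """Step 1: build the WebSocket URL for a session."""
    return f"wss://{backend_host}/ws/live/{quote(session_id, safe='')}"


async def handshake(
    send: Callable[[Frame], Awaitable[None]],
    recv: Callable[[], Awaitable[Frame]],
    auth_message: dict,
) -> dict:
    """Steps 2 and 3: send the auth message, then wait for the auth_response."""
    await send(json.dumps(auth_message))
    reply = await recv()
    if isinstance(reply, bytes):
        raise RuntimeError("expected a JSON auth_response, got a binary frame")
    message = json.loads(reply)
    # Assumption: the response is tagged with "type": "auth_response".
    if message.get("type") != "auth_response":
        raise RuntimeError(f"unexpected message before auth: {message.get('type')!r}")
    return message
```

Keeping the transport behind two callables means the handshake logic can be exercised with an in-memory fake before it is wired to a real socket.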
Auth Message
{
  "type": "auth",
  "token": "firebase-jwt-token",
  "client_type": "desktop",
  "capabilities": ["screen_capture", "file_system", "execute_command"],
  "local_tools": [
    {
      "name": "capture_screen",
      "description": "Capture a screenshot",
      "parameters": { "type": "object", "properties": {} }
    }
  ]
}
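A client would typically assemble this message from its own configuration rather than hard-code it. The sketch below shows one way to do so; `LocalTool` and `build_auth_message` are hypothetical helper names, and only the field names come from the example above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Iterable


@dataclass
class LocalTool:
    name: str
    description: str
    # JSON Schema describing the tool's arguments; defaults to "no arguments".
    parameters: dict = field(
        default_factory=lambda: {"type": "object", "properties": {}}
    )


def build_auth_message(
    token: str,
    client_type: str,
    capabilities: Iterable[str],
    local_tools: Iterable[LocalTool],
) -> str:
    """Serialize an auth message with the same fields as the example above."""
    return json.dumps(
        {
            "type": "auth",
            "token": token,
            "client_type": client_type,
            "capabilities": list(capabilities),
            "local_tools": [asdict(tool) for tool in local_tools],
        }
    )
```

For example, `build_auth_message(token, "desktop", ["screen_capture"], [LocalTool("capture_screen", "Capture a screenshot")])` produces a message with the same shape as the one shown.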
Message Types
| Direction | Type | Format | Description |
|---|---|---|---|
| Client → Server | Audio | Binary (PCM16) | 16kHz mono audio frames |
| Client → Server | Text | JSON `{"type": "text", "text": "..."}` | Text input |
| Server → Client | Audio | Binary (PCM24) | 24kHz audio response |
| Server → Client | Transcript | JSON | Agent text response |
| Server → Client | Tool invocation | JSON | T3 reverse-RPC call |
| Client → Server | Tool result | JSON | T3 tool execution result |
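Because audio travels as binary frames and everything else as JSON text frames, a client can route incoming traffic on the frame type alone. The sketch below illustrates that split and the outgoing audio framing. The 20 ms frame duration is an assumption (the table fixes only the sample rate and format), and `pcm16_frames`, `text_message`, and `classify` are hypothetical helpers.

```python
import json
from typing import Iterator, Tuple, Union

SAMPLE_RATE_HZ = 16_000  # client → server sample rate from the table
BYTES_PER_SAMPLE = 2     # PCM16, mono


def pcm16_frames(pcm: bytes, frame_ms: int = 20) -> Iterator[bytes]:
    """Split raw PCM16 mono audio into fixed-duration binary frames.

    Assumption: 20 ms frames; the last frame may be shorter.
    """
    frame_bytes = SAMPLE_RATE_HZ * frame_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


def text_message(text: str) -> str:
    """Build the JSON text-input message listed in the table."""
    return json.dumps({"type": "text", "text": text})


def classify(frame: Union[str, bytes, bytearray]) -> Tuple[str, object]:
    """Route an incoming frame: binary is audio, text is a JSON control message."""
    if isinstance(frame, (bytes, bytearray)):
        return "audio", bytes(frame)
    message = json.loads(frame)
    # The "type" values for transcripts and tool calls are not specified here,
    # so the message's own type is passed through unchanged.
    return message.get("type", "unknown"), message
```

With these defaults each full frame is 640 bytes (320 samples), so a 50 ms buffer of 1600 bytes yields two full frames and one 320-byte remainder.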
T3 Tool Registration
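Clients can advertise local tools in the `local_tools` field of the auth message; the AI agent can then invoke them remotely through T3 reverse-RPC calls, and the client returns each result as a JSON message. See the Desktop Client Architecture for a complete example.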
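As a rough illustration of the client side, the sketch below keeps tool declarations and handlers in one registry. The wire format of invocations and results is not documented in this guide, so the `tool_invocation` / `tool_result` type names and the `call_id`, `name`, `arguments`, `result`, and `error` fields are all hypothetical placeholders; `ToolRegistry` is likewise an illustrative name, not part of any Omni SDK.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class ToolRegistry:
    """Maps advertised tool names to local handler functions."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., Any]] = {}
        self._declarations: List[dict] = []

    def register(self, name: str, description: str,
                 parameters: Optional[dict] = None) -> Callable:
        """Decorator that records a handler and its declaration."""
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            self._handlers[name] = func
            self._declarations.append({
                "name": name,
                "description": description,
                "parameters": parameters or {"type": "object", "properties": {}},
            })
            return func
        return decorator

    def declarations(self) -> List[dict]:
        """Entries suitable for the auth message's local_tools field."""
        return list(self._declarations)

    def handle(self, invocation: dict) -> str:
        """Run the requested tool and build a result message (hypothetical shape)."""
        reply: Dict[str, Any] = {
            "type": "tool_result",
            "call_id": invocation.get("call_id"),
        }
        handler = self._handlers.get(invocation.get("name", ""))
        if handler is None:
            reply["error"] = f"unknown tool: {invocation.get('name')!r}"
        else:
            try:
                reply["result"] = handler(**invocation.get("arguments", {}))
            except Exception as exc:  # report failures instead of dropping the call
                reply["error"] = str(exc)
        return json.dumps(reply)


registry = ToolRegistry()


@registry.register("capture_screen", "Capture a screenshot")
def capture_screen() -> dict:
    # Placeholder body; a real client would grab the screen here.
    return {"status": "not implemented in this sketch"}
```

Registering handlers and declarations together keeps the list sent at auth time in step with what the client can actually execute.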