WebSocket Communication Protocol
Real-time sign language recognition relies on a persistent, low-latency communication channel between the web client and the inference server. The WebSocket protocol provides this channel.
Protocol Overview
The communication is bidirectional but primarily driven by the client sending video frames and the server responding with inference results.
Sequence Diagram
sequenceDiagram
    participant C as Client (Browser)
    participant S as Server (FastAPI)
    participant M as Model (ONNX)
    C->>S: WebSocket Connect
    S-->>C: Accept
    loop Video Stream
        C->>S: Binary Frame (JPEG)
        S->>S: Decode & Analyze Motion
        alt Motion Detected
            S->>S: Buffer Frame
            alt Buffer Full / Sequence Ready
                S->>M: Run Inference (Async)
                M-->>S: Sign ID
                S-->>C: JSON Prediction "SignName"
            end
        end
    end
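The gating logic in the diagram (buffer a frame only when motion is detected, run inference only once a full sequence is available) might be sketched as follows. This is a minimal illustration, not the server's actual code: `SEQUENCE_LENGTH`, `buffer_frame`, and the choice to clear the buffer after each full window are all assumptions.

```python
from collections import deque
from typing import Deque, List, Optional

SEQUENCE_LENGTH = 16  # hypothetical number of frames per inference window


def buffer_frame(
    frames: Deque[bytes], frame: bytes, motion: bool
) -> Optional[List[bytes]]:
    """Buffer a frame when motion is present; return a full sequence when ready."""
    if not motion:
        return None  # frames without motion are skipped, as in the diagram
    frames.append(frame)
    if len(frames) < SEQUENCE_LENGTH:
        return None  # still collecting frames for the temporal model
    sequence = list(frames)
    frames.clear()  # assumption: start a fresh window after each inference
    return sequence
```

A caller would pass the returned sequence to the model only when the result is not `None`, which corresponds to the inner `alt` branch of the diagram.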
Connection Lifecycle
- Handshake: The client initiates a connection to /live-signs.
- State Initialization: The server initializes a session-specific state, including:
  - last_inference_time: Timestamp of the last processed frame.
  - sign_frames: A ring buffer to store incoming frames for the temporal model.
  - motion_detected: Boolean flag to trigger inference.
- Data Loop: The server enters an infinite loop waiting for messages (frames).
- Termination: The connection is closed when the client disconnects or an error occurs.
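The lifecycle above can be sketched without any web framework, assuming a `receive` coroutine that yields one frame per call. The state field names come from the list above; their types and defaults, the buffer size of 16, and the `ClientDisconnected` exception and `run_session` function are hypothetical stand-ins.

```python
import asyncio
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Deque


class ClientDisconnected(Exception):
    """Hypothetical signal that the client closed the connection."""


@dataclass
class SessionState:
    # Field names follow the lifecycle list; types and defaults are assumptions.
    last_inference_time: float = 0.0
    sign_frames: Deque[bytes] = field(default_factory=lambda: deque(maxlen=16))
    motion_detected: bool = False


async def run_session(receive: Callable[[], Awaitable[bytes]]) -> SessionState:
    """Initialize per-connection state, then loop until the client leaves."""
    state = SessionState()
    try:
        while True:
            frame = await receive()  # data loop: wait for the next frame
            state.sign_frames.append(frame)  # a full deque drops its oldest item
            state.last_inference_time = time.time()
    except ClientDisconnected:
        pass  # termination: the client went away
    return state
```

In a FastAPI handler, `receive` would typically be `websocket.receive_bytes`, which raises `WebSocketDisconnect` when the client disconnects; that exception plays the role of `ClientDisconnected` here.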
Message Format
Client → Server
Messages sent by the client are binary blobs containing JPEG-encoded images.
- Format: binary (bytes)
- Content: Compressed image frame from the user’s camera.
- Frequency: Frame rate of the client’s camera (e.g., 30 FPS).
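Because every incoming message is expected to be a JPEG image, a server could run a cheap sanity check before attempting a full decode. The sketch below relies only on the JPEG start-of-image (`FF D8`) and end-of-image (`FF D9`) markers; the function name `looks_like_jpeg` is hypothetical, and the check is a heuristic, since some encoders append trailing bytes after the end marker.

```python
JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def looks_like_jpeg(message: bytes) -> bool:
    """Heuristic check: a complete JPEG starts with SOI and ends with EOI."""
    return (
        len(message) >= 4
        and message.startswith(JPEG_SOI)
        and message.endswith(JPEG_EOI)
    )
```

Frames that fail the check can be dropped early, which avoids spending decode time on truncated or non-image messages.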
Server → Client
Messages sent by the server are JSON text strings containing predictions or status updates.
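As a rough sketch, both message kinds could be serialized with the standard `json` module. The field names match the example payloads in this section; the helper names `prediction_message` and `status_message` are assumptions made for illustration.

```python
import json


def prediction_message(sign: str, probability: float, inference_time: float) -> str:
    """Serialize a prediction using the fields of the prediction payload."""
    return json.dumps({
        "type": "prediction",
        "payload": sign,
        "probability": probability,
        "inference_time": inference_time,
    })


def status_message(message: str, code: str) -> str:
    """Serialize a status update using the fields of the status payload."""
    return json.dumps({"type": "status", "message": message, "code": code})
```

A client can branch on the `type` field after parsing each text message to tell predictions from status updates.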
Prediction Payload
{
"type": "prediction",
"payload": "HELLO",
"probability": 0.95,
"inference_time": 0.045
}
Status Payload (Example)
{
"type": "status",
"message": "Motion Detected",
"code": "MOTION_START"
}
Concurrency and Performance
To maintain real-time performance, the server combines asynchronous programming (asyncio) with a thread pool.
- Async/Await: Used for handling WebSocket I/O (receiving frames, sending responses) without blocking the event loop.
- Thread Pool (run_in_executor): CPU-intensive tasks like image decoding (cv2.imdecode), preprocessing, and model inference are offloaded to a thread pool. This prevents the main event loop from freezing, ensuring the server remains responsive to ping/pong frames and other connections.
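The offloading pattern can be illustrated with the standard library alone. In this sketch, `heavy_work` is a hypothetical placeholder for the decoding, preprocessing, and inference steps, and the pool size of 4 is an arbitrary assumption rather than the server's configuration.

```python
import asyncio
import hashlib
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)  # pool size is an assumption


def heavy_work(frame: bytes) -> str:
    """Placeholder for CPU-bound steps such as decoding and inference."""
    return hashlib.sha256(frame).hexdigest()


async def process_frame(frame: bytes) -> str:
    loop = asyncio.get_running_loop()
    # The coroutine suspends here, so the event loop can keep serving
    # other connections while a worker thread runs heavy_work.
    return await loop.run_in_executor(executor, heavy_work, frame)
```

Threads help here mainly when the heavy calls release the GIL, as native libraries generally do; pure-Python CPU work would still contend with the event loop thread.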