Architecture Overview
This document provides a high-level overview of the Arabic Sign Language Recognition system architecture, component interactions, and data flow.
System Architecture
```mermaid
graph TB
    subgraph "Frontend (Browser)"
        A[User Camera] --> B[HTML5 Canvas]
        B --> C[WebSocket Client]
        C --> D[live-signs.js]
    end
    subgraph "Backend (FastAPI)"
        E[WebSocket Handler] --> F[Frame Buffer]
        F --> G[Motion Detector]
        G --> H[MediaPipe Processor]
        H --> I[Keypoint Extractor]
        I --> J[ONNX Inference]
        J --> K[Sign Classifier]
    end
    subgraph "Models & Data"
        L[ONNX Model]
        M[MediaPipe Models]
        N[Sign Labels]
    end
    C <-->|Binary Frames| E
    J --> L
    H --> M
    K --> N
    K -->|JSON Response| C
    style A fill:#e1f5ff
    style L fill:#ffe1e1
    style M fill:#ffe1e1
    style N fill:#ffe1e1
```
Component Overview
1. Frontend Layer
Technology: HTML5, CSS3, JavaScript (Vanilla)
Components:
- Camera Handler: Captures video frames from webcam
- WebSocket Client: Establishes real-time connection to backend
- UI Controller: Displays recognized signs, confidence scores, and skeletal visualizations
- Frame Encoder: Converts canvas frames to JPEG for transmission, including optimization flags
- Visualization Engine: Renders body-region-specific landmarks and connections
Key Files:
- live-signs.js - Main client logic
- index.html - UI structure
- styles.css - Styling
See Web Interface Design for details.
2. API Layer
Technology: FastAPI, Uvicorn, WebSockets
Components:
- FastAPI Application: HTTP server and routing
- WebSocket Handler: Manages real-time frame processing
- CORS Middleware: Handles cross-origin requests
- Lifespan Manager: Model loading and cleanup
Key Files:
- main.py - Application setup and routes
- websocket.py - WebSocket handler
- run.py - Entry point
Functions:
- lifespan() - Loads ONNX model on startup
- ws_live_signs() - Main WebSocket handler
- live_signs_ui() - Serves frontend HTML
See FastAPI Application for details.
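The startup/shutdown responsibility of `lifespan()` can be sketched with the standard-library `asynccontextmanager` alone (the real version is passed to FastAPI's `lifespan` parameter and calls `load_onnx_model()`; the `state` dict and the string stand-in for the loaded session are illustrative assumptions):

```python
import asyncio
from contextlib import asynccontextmanager

state = {}  # hypothetical app-level store for the loaded model

@asynccontextmanager
async def lifespan(app):
    # Startup: stand-in for load_onnx_model(); runs once before serving
    state["model"] = "onnx-session"
    yield
    # Shutdown: release resources
    state.clear()

async def demo():
    async with lifespan(None):
        loaded = "model" in state   # model available while the app runs
    return loaded, len(state)       # state is cleared after shutdown
```

Keeping model loading in the lifespan context (rather than at import time) ensures the model is released cleanly when the server stops.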
3. Processing Pipeline
Technology: OpenCV, MediaPipe, NumPy
Components:
Frame Buffer
Asynchronous queue for managing incoming frames between the producer and consumer tasks.
Key Class: asyncio.Queue (integrated in live_processing.py)
Methods:
- add_frame() - Adds frame to buffer
- get_frame() - Retrieves frame by index
- clear() - Resets buffer
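The producer/consumer hand-off around `asyncio.Queue` can be sketched as follows (the sentinel-based shutdown and the `maxsize=50` bound mirror the queue initialization described later in the real-time flow; function names here are illustrative, not the ones in live_processing.py):

```python
import asyncio

async def producer(queue, frames):
    # Push decoded frames; put() blocks once maxsize is reached,
    # which naturally throttles a fast producer.
    for frame in frames:
        await queue.put(frame)
    await queue.put(None)  # sentinel: signal end of stream

async def consumer(queue):
    # Pull frames until the sentinel arrives.
    results = []
    while (frame := await queue.get()) is not None:
        results.append(frame)
    return results

async def main():
    queue = asyncio.Queue(maxsize=50)  # bounded, as in the WebSocket handler
    _, consumed = await asyncio.gather(
        producer(queue, [1, 2, 3]),
        consumer(queue),
    )
    return consumed
```

The bounded queue is what prevents memory exhaustion when frames arrive faster than they can be processed.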
Motion Detection
Detects movement to trigger sign recognition.
Key Class: MotionDetector in cv2_utils.py
Methods:
- detect() - Compares consecutive frames
- convert_small_gray() - Preprocesses frames
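A minimal NumPy sketch of the frame-differencing idea behind `MotionDetector` (the real class uses OpenCV for grayscale conversion and resizing; the 32×32 target size and the threshold value are assumptions for illustration):

```python
import numpy as np

def to_small_gray(frame, size=(32, 32)):
    # Crude grayscale + downsample; the cv2 version would use
    # cvtColor() and resize() instead.
    gray = frame.mean(axis=2)
    h, w = gray.shape
    ys = np.linspace(0, h - 1, size[0]).astype(int)
    xs = np.linspace(0, w - 1, size[1]).astype(int)
    return gray[np.ix_(ys, xs)]

def detect_motion(prev_frame, curr_frame, threshold=10.0):
    # Mean absolute difference of downsampled grayscale frames;
    # above the threshold counts as motion.
    diff = np.abs(to_small_gray(curr_frame) - to_small_gray(prev_frame))
    return diff.mean() > threshold
```

Downsampling before differencing keeps the idle-detection check cheap relative to full keypoint extraction.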
Keypoint Extraction
Extracts pose, face, and hand landmarks using MediaPipe.
Key Class: LandmarkerProcessor in mediapipe_utils.py
Methods:
- extract_frame_keypoints() - Extracts all landmarks
- init_mediapipe_landmarkers() - Initializes MediaPipe models
See MediaPipe Integration for details.
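The per-frame output of keypoint extraction can be sketched as a concatenation of per-body-part landmark arrays into a single (FEAT_NUM, FEAT_DIM) feature matrix, zero-filling parts MediaPipe did not detect. The 33/109/21/21 split below is an assumption chosen only so the totals match FEAT_NUM = 184; the actual landmark subset is defined by the real pipeline:

```python
import numpy as np

# Assumed per-part landmark counts (pose, face subset, left hand, right hand);
# 33 + 109 + 21 + 21 = 184 = FEAT_NUM.
PART_SIZES = (33, 109, 21, 21)
FEAT_DIM = 4  # (x, y, z, visibility)

def flatten_keypoints(pose, face, left_hand, right_hand):
    # Each argument is an (n_landmarks, FEAT_DIM) array, or None when
    # that body part was not detected in the frame.
    parts = []
    for arr, n in zip((pose, face, left_hand, right_hand), PART_SIZES):
        parts.append(arr if arr is not None else np.zeros((n, FEAT_DIM)))
    return np.concatenate(parts, axis=0)  # (184, FEAT_DIM)
```

Zero-filling missing parts keeps every frame the same shape, which is what lets frames stack into the fixed-size (seq_len, features) sequences the model expects.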
4. Model Layer
Technology: PyTorch, ONNX Runtime
Components:
Model Architecture
Spatial-Temporal Transformer (ST-Transformer) for sequence classification.
Key Classes in model.py:
- STTransformer - Main model architecture
- GroupTokenEmbedding - Body part tokenization layer
- STTransformerBlock - Spatial-Temporal dual attention block
- AttentionPooling - Attention-based temporal aggregation
Model Pipeline:
- Input: Keypoint sequences (batch, seq_len, features)
- Embedding: Group token embedding (4 tokens: pose, face, left hand, right hand)
- Positional Encoding: Sinusoidal positional encoding
- Transformer: N consecutive Spatial-Temporal attention blocks
- Pooling: Attention-based temporal pooling
- Output: Class logits (502 classes)
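The sinusoidal positional-encoding step in the pipeline above can be written in a few lines of NumPy (the standard sin/cos formulation; `d_model` is whatever embedding width the model uses, 64 below is just an example):

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    # Standard Transformer positional encoding:
    # even dims get sin(pos / 10000^(2i/d)), odd dims get cos of the same angle.
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]           # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))    # (seq_len, d_model/2)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc
```

Because the encoding is deterministic, it adds temporal order information to the token sequence without introducing learned parameters.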
See Model Architecture for details.
Inference Engine
ONNX Runtime for optimized CPU inference.
Key Functions in model.py:
- load_onnx_model() - Loads ONNX model
- onnx_inference() - Runs inference
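After `onnx_inference()` returns raw logits, they still have to be turned into a prediction. A minimal NumPy sketch of that decoding step (softmax plus top-1 lookup; the function name and the label list are illustrative, not from model.py):

```python
import numpy as np

def decode_logits(logits, labels):
    # Numerically stable softmax over the 502-class logits,
    # then return the top-1 label and its probability.
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    idx = int(probs.argmax())
    return labels[idx], float(probs[idx])
```

This is also where a confidence threshold would typically be applied before sending a prediction back to the browser.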
5. Data Layer
Technology: PyTorch, NumPy, Pandas
Components:
Dataset Loaders
- LazyDataset: On-demand loading from NPZ files
- MmapDataset: Memory-mapped dataset for efficient access
Key Files:
Data Preparation
- Video preprocessing
- Keypoint extraction from videos
- Dataset splitting (train/val/test)
Key Files:
See Data Preparation Pipeline for details.
Data Flow
Real-Time Recognition Flow
```mermaid
sequenceDiagram
    participant User
    participant Browser
    participant WebSocket as WS Router
    participant Producer as Producer Handler
    participant Queue as asyncio.Queue
    participant Consumer as Consumer Handler
    participant MediaPipe
    participant ONNX
    User->>Browser: Perform sign
    Browser->>WebSocket: Connect WS
    WebSocket->>Queue: Initialize (max_size=50)
    WebSocket->>Producer: Spawn Process
    WebSocket->>Consumer: Spawn Process
    loop Stream
        Browser->>Producer: Send Frame (JPEG)
        Producer->>Queue: Put (Decoded Frame)
        Queue->>Consumer: Get Frame
        Consumer->>Consumer: Motion Detection
        alt Motion Detected
            Consumer->>MediaPipe: Extract Keypoints
            Consumer->>Consumer: Buffer Keypoints
            alt Keypoints >= 15
                Consumer->>ONNX: Run Inference
                ONNX-->>Consumer: Logits
                Consumer->>Browser: Send Prediction (JSON)
            end
        else No Motion
            Consumer->>Browser: Send Idle Status
        end
    end
```
Training Flow
```mermaid
graph LR
    A[Raw Videos] --> B[Video Preprocessing]
    B --> C[MediaPipe Extraction]
    C --> D[NPZ Keypoints]
    D --> E[Dataset Loader]
    E --> F[DataLoader]
    F --> G[Model Training]
    G --> H[PyTorch Checkpoint]
    H --> I[ONNX Export]
    I --> J[ONNX Model]
    J --> K[Production Inference]
    style A fill:#e1f5ff
    style J fill:#e1ffe1
    style K fill:#ffe1e1
```
Configuration Management
Environment Variables
Managed through .env file:
ONNX_CHECKPOINT_FILENAME # Model filename
DOMAIN_NAME # CORS allowed origin
LOCAL_DEV # Local vs Kaggle paths
USE_CPU # Force CPU execution

See Environment Configuration for all options.
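A minimal sketch of how these variables might be read into typed settings at startup (the default values and the dict-returning shape are assumptions; the real application may structure this differently, e.g. via a settings module):

```python
import os

def load_config(env=os.environ):
    # Read the documented .env variables, with illustrative defaults.
    # Boolean flags are parsed from "true"/"false" strings.
    return {
        "onnx_checkpoint": env.get("ONNX_CHECKPOINT_FILENAME", "model.onnx"),
        "domain_name": env.get("DOMAIN_NAME", "http://localhost:8000"),
        "local_dev": env.get("LOCAL_DEV", "false").lower() == "true",
        "use_cpu": env.get("USE_CPU", "true").lower() == "true",
    }
```

Passing `env` as a parameter (defaulting to `os.environ`) keeps the loader testable without mutating process state.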
Constants
Defined in constants.py:
SEQ_LEN = 50 # Sequence length
FEAT_NUM = 184 # Number of features
FEAT_DIM = 4 # Feature dimensions (x, y, z, v)
DEVICE = "cpu" | "cuda" # Execution device

Deployment Architecture
Docker Deployment
```mermaid
graph TB
    subgraph "Docker Container"
        A[Uvicorn Server]
        B[FastAPI App]
        C[ONNX Runtime]
        D[MediaPipe]
        E[Static Files]
    end
    F[Host Port 8000] --> A
    A --> B
    B --> C
    B --> D
    B --> E
    G[Volume: ./] --> B
    H[Volume: ./models] --> C
    I[Volume: ./landmarkers] --> D
    style A fill:#e1f5ff
    style G fill:#ffe1e1
    style H fill:#ffe1e1
    style I fill:#ffe1e1
```
Features:
- Hot reload enabled for development
- Volume mounts for code and models
- Automatic dependency installation
- Consistent environment across platforms
See Docker Setup for configuration.
Performance Considerations
Optimizations
- ONNX Runtime: Inference engine for CPU-bound environments.
- CPU Execution: Configured for hardware without GPU acceleration.
- Frame Buffering: Asynchronous queue management to prevent memory exhaustion.
- Motion Detection: Frame differencing to reduce processing load during idle periods.
- Async Processing: Non-blocking concurrency for client-server communication.
- Thread Pool: Parallel execution for compute-intensive keypoint extraction.
Bottlenecks
- MediaPipe Processing: ~20-30ms per frame
- ONNX Inference: ~10-20ms per sequence
- Network Latency: WebSocket frame transmission
Security Considerations
- CORS: Configured allowed origins
- WebSocket: No authentication (add for production)
- Input Validation: Frame size and format checks
- Resource Limits: Frame buffer size limits
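The input-validation point can be sketched as a cheap pre-decode check on incoming WebSocket payloads (the 512 KiB cap is a hypothetical limit, and the JPEG magic-byte check is one simple format test; the real handler's checks may differ):

```python
MAX_FRAME_BYTES = 512 * 1024   # hypothetical size cap per frame
JPEG_MAGIC = b"\xff\xd8\xff"   # JPEG files start with these bytes

def validate_frame(data: bytes) -> bool:
    # Reject empty, oversized, or non-JPEG payloads before spending
    # CPU time on image decoding.
    return 0 < len(data) <= MAX_FRAME_BYTES and data.startswith(JPEG_MAGIC)
```

Rejecting bad payloads before decoding limits both memory pressure and the attack surface exposed by the image decoder.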
Scalability
Current Limitations
- Single-threaded WebSocket handler
- In-memory frame buffer
- No load balancing
Future Improvements
- Multi-worker deployment
- Redis for session management
- Load balancer for multiple instances
- GPU acceleration for inference
Related Documentation
Next Steps:
- Explore Live Processing Pipeline
- Learn about Training Process
- Review WebSocket Implementation