# Architecture Overview
This document provides a high-level overview of the Arabic Sign Language Recognition system architecture, component interactions, and data flow.
## System Architecture

```mermaid
graph TB
    subgraph "Frontend (Browser)"
        A[User Camera] --> B[HTML5 Canvas]
        B --> C[WebSocket Client]
        C --> D[live-signs.js]
    end
    subgraph "Backend (FastAPI)"
        E[WebSocket Handler] --> F[Frame Buffer]
        F --> G[Motion Detector]
        G --> H[MediaPipe Processor]
        H --> I[Keypoint Extractor]
        I --> J[ONNX Inference]
        J --> K[Sign Classifier]
    end
    subgraph "Models & Data"
        L[ONNX Model]
        M[MediaPipe Models]
        N[Sign Labels]
    end
    C <-->|Binary Frames| E
    J --> L
    H --> M
    K --> N
    K -->|JSON Response| C
    style A fill:#e1f5ff
    style L fill:#ffe1e1
    style M fill:#ffe1e1
    style N fill:#ffe1e1
```
## Component Overview

### 1. Frontend Layer
Technology: HTML5, CSS3, JavaScript (Vanilla)
Components:
- Camera Handler: Captures video frames from webcam
- WebSocket Client: Establishes real-time connection to backend
- UI Controller: Displays recognized signs and confidence scores
- Frame Encoder: Converts canvas frames to JPEG for transmission
Key Files:
- `live-signs.js` - Main client logic
- `index.html` - UI structure
- `styles.css` - Styling
See Web Interface Design for details.
### 2. API Layer
Technology: FastAPI, Uvicorn, WebSockets
Components:
- FastAPI Application: HTTP server and routing
- WebSocket Handler: Manages real-time frame processing
- CORS Middleware: Handles cross-origin requests
- Lifespan Manager: Model loading and cleanup
Key Files:
- `main.py` - Application setup and routes
- `websocket.py` - WebSocket handler
- `run.py` - Entry point
Functions:
- `lifespan()` - Loads ONNX model on startup
- `ws_live_signs()` - Main WebSocket handler
- `live_signs_ui()` - Serves frontend HTML
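A minimal sketch of how these functions fit together, assuming a `/ws/live-signs` route and placeholder helpers; the actual code in `main.py` and `websocket.py` is the reference:

```python
# Minimal sketch of lifespan-managed model loading plus a WebSocket
# frame loop; route path and helper bodies are illustrative assumptions.
from contextlib import asynccontextmanager

from fastapi import FastAPI, WebSocket, WebSocketDisconnect


def load_onnx_model():
    ...  # placeholder for the real ONNX loading helper


async def process_frame(frame_bytes: bytes):
    ...  # placeholder for buffering, motion detection, and inference


@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.model = load_onnx_model()  # load once at startup
    yield
    app.state.model = None               # release on shutdown


app = FastAPI(lifespan=lifespan)


@app.websocket("/ws/live-signs")  # assumed route name
async def ws_live_signs(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            frame_bytes = await ws.receive_bytes()   # binary JPEG frame
            prediction = await process_frame(frame_bytes)
            if prediction is not None:
                await ws.send_json(prediction)       # JSON response
    except WebSocketDisconnect:
        pass
```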
See FastAPI Application for details.
### 3. Processing Pipeline
Technology: OpenCV, MediaPipe, NumPy
Components:
#### Frame Buffer
Circular buffer for managing incoming frames during inference.
Key Class: `FrameBuffer` in `live_processing.py`
Methods:
- `add_frame()` - Adds frame to buffer
- `get_frame()` - Retrieves frame by index
- `clear()` - Resets buffer
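A minimal sketch of such a circular buffer, built on `collections.deque`; the capacity default is an assumption:

```python
# Illustrative circular frame buffer; capacity is an assumed default.
from collections import deque


class FrameBuffer:
    def __init__(self, capacity: int = 50):
        # maxlen makes the deque drop the oldest frame once full,
        # bounding memory during long-running sessions.
        self._frames = deque(maxlen=capacity)

    def add_frame(self, frame) -> None:
        self._frames.append(frame)

    def get_frame(self, index: int):
        return self._frames[index]

    def clear(self) -> None:
        self._frames.clear()
```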
#### Motion Detection
Detects movement to trigger sign recognition.
Key Class: `MotionDetector` in `cv2_utils.py`
Methods:
- `detect()` - Compares consecutive frames
- `convert_small_gray()` - Preprocesses frames
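A sketch of frame-differencing motion detection with an assumed threshold and downscale size; the real `MotionDetector` may differ in detail:

```python
# Illustrative frame-differencing motion detector.
import cv2
import numpy as np


class MotionDetector:
    def __init__(self, threshold: float = 8.0, size=(160, 120)):
        self.threshold = threshold  # assumed mean-difference threshold
        self.size = size            # assumed downscale resolution
        self._prev = None

    def convert_small_gray(self, frame: np.ndarray) -> np.ndarray:
        # Downscale and convert to grayscale so the diff is cheap
        # and less sensitive to sensor noise.
        small = cv2.resize(frame, self.size)
        return cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)

    def detect(self, frame: np.ndarray) -> bool:
        gray = self.convert_small_gray(frame)
        if self._prev is None:
            self._prev = gray
            return False
        # Mean absolute pixel difference between consecutive frames.
        diff = cv2.absdiff(gray, self._prev)
        self._prev = gray
        return float(diff.mean()) > self.threshold
```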
#### Keypoint Extraction
Extracts pose, face, and hand landmarks using MediaPipe.
Key Class: `LandmarkerProcessor` in `mediapipe_utils.py`
Methods:
- `extract_frame_keypoints()` - Extracts all landmarks
- `init_mediapipe_landmarkers()` - Initializes MediaPipe models
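For illustration only, the same idea expressed with MediaPipe's `Holistic` solution; the project's `LandmarkerProcessor` loads dedicated landmarker models instead, so treat the API choice and feature layout here as assumptions:

```python
# Illustrative keypoint extraction; stand-in for LandmarkerProcessor.
import mediapipe as mp
import numpy as np

holistic = mp.solutions.holistic.Holistic(static_image_mode=False)


def extract_frame_keypoints(frame_rgb: np.ndarray) -> np.ndarray:
    results = holistic.process(frame_rgb)
    keypoints = []
    for landmarks in (results.pose_landmarks,
                      results.face_landmarks,
                      results.left_hand_landmarks,
                      results.right_hand_landmarks):
        if landmarks is None:
            continue  # body part not visible in this frame
        for lm in landmarks.landmark:
            # Normalized x, y, z plus visibility (v) per landmark,
            # matching the (x, y, z, v) layout in constants.py.
            keypoints.append((lm.x, lm.y, lm.z,
                              getattr(lm, "visibility", 0.0)))
    return np.asarray(keypoints, dtype=np.float32)
```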
See MediaPipe Integration for details.
### 4. Model Layer
Technology: PyTorch, ONNX Runtime
Components:
#### Model Architecture
Attention-based Bidirectional LSTM for sequence classification.
Key Classes in `model.py`:
- `AttentionBiLSTM` - Main model architecture
- `SpatialGroupEmbedding` - Feature embedding layer
- `ResidualBiLSTMBlock` - BiLSTM building block
- `AttentionPooling` - Attention-based pooling
Model Pipeline:
- Input: Keypoint sequences (batch, seq_len, features)
- Embedding: Spatial group embedding
- BiLSTM: 4 residual BiLSTM layers
- Attention: Multi-head self-attention
- Pooling: Attention-based temporal pooling
- Output: Class logits (502 classes)
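A structural sketch of these stages in PyTorch, with hidden sizes and head counts assumed for illustration; `model.py` holds the authoritative `AttentionBiLSTM` definition:

```python
# Structural sketch of the pipeline above (shapes from constants.py:
# seq_len=50, 184 features x 4 dims, 502 classes); hidden sizes assumed.
import torch
import torch.nn as nn


class AttentionBiLSTMSketch(nn.Module):
    def __init__(self, feat_dim=184 * 4, hidden=256, num_classes=502):
        super().__init__()
        self.embed = nn.Linear(feat_dim, hidden)      # spatial embedding
        self.bilstm = nn.LSTM(hidden, hidden // 2, num_layers=4,
                              bidirectional=True, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4,
                                          batch_first=True)
        self.pool_score = nn.Linear(hidden, 1)        # attention pooling
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):                             # (B, T, F)
        h = self.embed(x)
        h, _ = self.bilstm(h)
        h, _ = self.attn(h, h, h)                     # self-attention
        w = torch.softmax(self.pool_score(h), dim=1)  # (B, T, 1)
        pooled = (w * h).sum(dim=1)                   # temporal pooling
        return self.head(pooled)                      # (B, 502) logits
```

Note that each bidirectional LSTM direction gets half the hidden size, so the concatenated output stays at the embedding width.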
See Model Architecture for details.
#### Inference Engine
ONNX Runtime for optimized CPU inference.
Key Functions in `model.py`:
- `load_onnx_model()` - Loads ONNX model
- `onnx_inference()` - Runs inference
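A minimal sketch of these two helpers with ONNX Runtime; session options beyond the CPU provider are omitted:

```python
# Illustrative ONNX Runtime loading and inference helpers.
import numpy as np
import onnxruntime as ort


def load_onnx_model(path: str) -> ort.InferenceSession:
    # Restrict execution to the CPU provider, as in production.
    return ort.InferenceSession(path, providers=["CPUExecutionProvider"])


def onnx_inference(session: ort.InferenceSession,
                   sequence: np.ndarray) -> np.ndarray:
    # sequence: (1, seq_len, features) float32 keypoint batch.
    input_name = session.get_inputs()[0].name
    (logits,) = session.run(None, {input_name: sequence.astype(np.float32)})
    return logits
```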
### 5. Data Layer
Technology: PyTorch, NumPy, Pandas
Components:
#### Dataset Loaders
- `LazyDataset`: On-demand loading from NPZ files
- `MmapDataset`: Memory-mapped dataset for efficient access
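A sketch of the lazy-loading idea, assuming one NPZ file per sample with a `keypoints` array; the actual key names and storage layout may differ:

```python
# Illustrative on-demand NPZ dataset; array name is an assumption.
import numpy as np
import torch
from torch.utils.data import Dataset


class LazyDataset(Dataset):
    def __init__(self, npz_paths, labels):
        self.npz_paths = list(npz_paths)
        self.labels = list(labels)

    def __len__(self) -> int:
        return len(self.npz_paths)

    def __getitem__(self, idx):
        # Load keypoints only when the sample is requested, so the
        # full dataset never has to fit in memory at once.
        with np.load(self.npz_paths[idx]) as data:
            keypoints = data["keypoints"]  # assumed array name
        x = torch.from_numpy(keypoints).float()
        y = torch.tensor(self.labels[idx], dtype=torch.long)
        return x, y
```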
Key Files:
#### Data Preparation
- Video preprocessing
- Keypoint extraction from videos
- Dataset splitting (train/val/test)
Key Files:
See Data Preparation Pipeline for details.
## Data Flow

### Real-Time Recognition Flow
```mermaid
sequenceDiagram
    participant User
    participant Browser
    participant WebSocket
    participant FrameBuffer
    participant MotionDetector
    participant MediaPipe
    participant ONNX
    participant Classifier

    User->>Browser: Perform sign
    Browser->>WebSocket: Send frame (JPEG)
    WebSocket->>FrameBuffer: Add frame
    loop Processing Loop
        FrameBuffer->>MotionDetector: Get latest frame
        MotionDetector->>MotionDetector: Detect motion
        alt Motion Detected
            MotionDetector->>MediaPipe: Extract keypoints
            MediaPipe->>FrameBuffer: Store keypoints
            alt Buffer >= MIN_FRAMES
                FrameBuffer->>ONNX: Keypoint sequence
                ONNX->>Classifier: Raw logits
                Classifier->>Classifier: Apply softmax
                Classifier->>Classifier: Check confidence
                alt Confidence > Threshold
                    Classifier->>WebSocket: Send prediction
                    WebSocket->>Browser: Display sign
                    Browser->>User: Show result
                end
            end
        else No Motion
            MotionDetector->>WebSocket: Send idle status
        end
    end
```
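The softmax and confidence gate in the diagram reduce to a few lines; the threshold value below is an assumption:

```python
# Sketch of the "Apply softmax" / "Check confidence" steps above.
import numpy as np


def classify(logits: np.ndarray, labels, threshold: float = 0.7):
    logits = np.asarray(logits).reshape(-1)
    # Numerically stable softmax over the class axis.
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    best = int(probs.argmax())
    if probs[best] < threshold:
        return None  # below threshold: nothing is sent to the client
    return {"sign": labels[best], "confidence": float(probs[best])}
```

Returning `None` below the threshold matches the `Confidence > Threshold` branch: low-confidence guesses are never sent to the browser.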
### Training Flow
```mermaid
graph LR
    A[Raw Videos] --> B[Video Preprocessing]
    B --> C[MediaPipe Extraction]
    C --> D[NPZ Keypoints]
    D --> E[Dataset Loader]
    E --> F[DataLoader]
    F --> G[Model Training]
    G --> H[PyTorch Checkpoint]
    H --> I[ONNX Export]
    I --> J[ONNX Model]
    J --> K[Production Inference]
    style A fill:#e1f5ff
    style J fill:#e1ffe1
    style K fill:#ffe1e1
```
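The checkpoint-to-ONNX step at the end of this flow looks roughly like the following; the stand-in module, file name, and dynamic axes are illustrative assumptions:

```python
# Sketch of the PyTorch -> ONNX export step; stand-in model for brevity.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(50 * 184 * 4, 502))  # stand-in
model.eval()
dummy = torch.randn(1, 50, 184 * 4)  # (batch, seq_len, features)

torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["keypoints"],
    output_names=["logits"],
    dynamic_axes={"keypoints": {0: "batch"}, "logits": {0: "batch"}},
)
```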
## Configuration Management

### Environment Variables

Managed through the `.env` file:
```bash
ONNX_CHECKPOINT_FILENAME  # Model filename
DOMAIN_NAME               # CORS allowed origin
LOCAL_DEV                 # Local vs Kaggle paths
USE_CPU                   # Force CPU execution
```

See Environment Configuration for all options.
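A sketch of reading these variables at startup, assuming `python-dotenv` is used; the defaults shown are illustrative:

```python
# Illustrative .env loading; defaults are assumptions.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory

ONNX_CHECKPOINT_FILENAME = os.getenv("ONNX_CHECKPOINT_FILENAME", "model.onnx")
DOMAIN_NAME = os.getenv("DOMAIN_NAME", "http://localhost:8000")
LOCAL_DEV = os.getenv("LOCAL_DEV", "1") == "1"
USE_CPU = os.getenv("USE_CPU", "1") == "1"
```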
### Constants

Defined in `constants.py`:
```python
SEQ_LEN = 50             # Sequence length
FEAT_NUM = 184           # Number of features
FEAT_DIM = 4             # Feature dimensions (x, y, z, v)
DEVICE = "cpu" | "cuda"  # Execution device
```

## Deployment Architecture

### Docker Deployment
graph TB subgraph "Docker Container" A[Uvicorn Server] B[FastAPI App] C[ONNX Runtime] D[MediaPipe] E[Static Files] end F[Host Port 8000] --> A A --> B B --> C B --> D B --> E G[Volume: ./] --> B H[Volume: ./models] --> C I[Volume: ./landmarkers] --> D style A fill:#e1f5ff style G fill:#ffe1e1 style H fill:#ffe1e1 style I fill:#ffe1e1
Features:
- Hot reload enabled for development
- Volume mounts for code and models
- Automatic dependency installation
- Consistent environment across platforms
See Docker Setup for configuration.
## Performance Considerations

### Optimization Strategies
- ONNX Runtime: Optimized inference engine
- CPU Execution: Tuned for CPU performance
- Frame Buffering: Circular buffer prevents memory overflow
- Motion Detection: Reduces unnecessary processing
- Async Processing: Non-blocking WebSocket communication
- Thread Pool: Parallel keypoint extraction (see the sketch after this list)
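A sketch of the thread-pool offload, with illustrative names; the point is that blocking MediaPipe calls never run on the event loop:

```python
# Illustrative offload of CPU-bound keypoint extraction to a thread
# pool so the asyncio event loop keeps serving WebSocket traffic.
import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)


def extract_frame_keypoints(frame):
    ...  # CPU-bound MediaPipe call (see the keypoint extraction sketch)


async def extract_keypoints_async(frame):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, extract_frame_keypoints, frame)
```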
### Bottlenecks
- MediaPipe Processing: ~20-30ms per frame
- ONNX Inference: ~10-20ms per sequence
- Network Latency: WebSocket frame transmission
## Security Considerations
- CORS: Configured allowed origins
- WebSocket: No authentication (add for production)
- Input Validation: Frame size and format checks (see the sketch after this list)
- Resource Limits: Frame buffer size limits
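A sketch of the size and format checks, with an assumed size limit:

```python
# Illustrative validation of an incoming binary frame.
import cv2
import numpy as np

MAX_FRAME_BYTES = 512 * 1024  # assumed limit: reject payloads over 512 KiB


def decode_frame(frame_bytes: bytes):
    if len(frame_bytes) > MAX_FRAME_BYTES:
        return None  # oversized payload
    buf = np.frombuffer(frame_bytes, dtype=np.uint8)
    frame = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    return frame  # None if the payload was not a valid image
```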
## Scalability

### Current Limitations
- Single-threaded WebSocket handler
- In-memory frame buffer
- No load balancing
### Future Improvements
- Multi-worker deployment
- Redis for session management
- Load balancer for multiple instances
- GPU acceleration for inference
## Related Documentation
Next Steps:
- Explore Live Processing Pipeline
- Learn about Training Process
- Review WebSocket Implementation