Architecture Overview

Tags: architecture, system-design, components

This document provides a high-level overview of the Arabic Sign Language Recognition system architecture, component interactions, and data flow.

System Architecture

graph TB
    subgraph "Frontend (Browser)"
        A[User Camera] --> B[HTML5 Canvas]
        B --> C[WebSocket Client]
        C --> D[live-signs.js]
    end
    
    subgraph "Backend (FastAPI)"
        E[WebSocket Handler] --> F[Frame Buffer]
        F --> G[Motion Detector]
        G --> H[MediaPipe Processor]
        H --> I[Keypoint Extractor]
        I --> J[ONNX Inference]
        J --> K[Sign Classifier]
    end
    
    subgraph "Models & Data"
        L[ONNX Model]
        M[MediaPipe Models]
        N[Sign Labels]
    end
    
    C <-->|Binary Frames| E
    J --> L
    H --> M
    K --> N
    K -->|JSON Response| C
    
    style A fill:#e1f5ff
    style L fill:#ffe1e1
    style M fill:#ffe1e1
    style N fill:#ffe1e1

Component Overview

1. Frontend Layer

Technology: HTML5, CSS3, JavaScript (Vanilla)

Components:

  • Camera Handler: Captures video frames from webcam
  • WebSocket Client: Establishes real-time connection to backend
  • UI Controller: Displays recognized signs, confidence scores, and skeletal visualizations
  • Frame Encoder: Encodes canvas frames as JPEG (with optimization flags) for transmission
  • Visualization Engine: Renders body-region-specific landmarks and connections

See Web Interface Design for details.

2. API Layer

Technology: FastAPI, Uvicorn, WebSockets

Components:

  • FastAPI Application: HTTP server and routing
  • WebSocket Handler: Manages real-time frame processing
  • CORS Middleware: Handles cross-origin requests
  • Lifespan Manager: Model loading and cleanup

Functions:

  • lifespan() - Loads ONNX model on startup
  • ws_live_signs() - Main WebSocket handler
  • live_signs_ui() - Serves frontend HTML

See FastAPI Application for details.
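
The lifespan() hook follows the standard async-context-manager pattern. Below is a stdlib-only sketch of that pattern; load_onnx_model is a stub standing in for the real loader, and in the actual app the context manager is handed to FastAPI(lifespan=...).

```python
from contextlib import asynccontextmanager
import asyncio

# Stdlib-only sketch of the lifespan pattern: load the ONNX session once at
# startup, release it at shutdown. load_onnx_model is a stub here; the real
# app passes this context manager to FastAPI(lifespan=...).

STATE = {}

def load_onnx_model():
    return "onnx-session"  # placeholder for an onnxruntime InferenceSession

@asynccontextmanager
async def lifespan(app=None):
    STATE["model"] = load_onnx_model()  # startup: load once, reuse everywhere
    try:
        yield
    finally:
        STATE.clear()  # shutdown: release the session

async def demo():
    async with lifespan():
        return STATE["model"]

print(asyncio.run(demo()))  # onnx-session
```

Keeping the session in shared state means every WebSocket connection reuses the same loaded model instead of reloading it per request.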

3. Processing Pipeline

Technology: OpenCV, MediaPipe, NumPy

Components:

Frame Buffer

Asynchronous queue for managing incoming frames between the producer and consumer tasks.

Key Class: asyncio.Queue (used in live_processing.py)

Methods:

  • put_nowait() - Adds a decoded frame, raising QueueFull when the buffer is at capacity
  • get() - Awaits and removes the next frame in FIFO order
  • qsize() / empty() - Report how many frames are waiting
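
The frame buffer's behaviour can be sketched with a bounded asyncio.Queue. The drop-oldest policy shown here is an assumption about how a full buffer is handled; the point is that a fixed maxsize keeps memory bounded while the consumer lags.

```python
import asyncio

# Sketch of the producer/consumer hand-off: a bounded asyncio.Queue
# decouples frame ingestion from processing. When the queue is full, the
# oldest frame is dropped (an assumed policy) so the stream stays live.

async def put_latest(queue: asyncio.Queue, frame) -> None:
    """Enqueue a frame, discarding the oldest one if the buffer is full."""
    if queue.full():
        queue.get_nowait()  # drop the stalest frame
    queue.put_nowait(frame)

async def demo() -> list:
    queue: asyncio.Queue = asyncio.Queue(maxsize=3)
    for i in range(5):           # producer: 5 frames into a 3-slot buffer
        await put_latest(queue, i)
    consumed = []
    while not queue.empty():     # consumer: drain what survived
        consumed.append(await queue.get())
        queue.task_done()
    return consumed

print(asyncio.run(demo()))  # [2, 3, 4] - frames 0 and 1 were dropped
```

With maxsize=3, pushing five frames leaves only the three newest in the buffer, so a slow consumer always sees recent video rather than a growing backlog.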

Motion Detection

Detects movement to trigger sign recognition.

Key Class: MotionDetector in cv2_utils.py

Methods:

  • detect() - Compares consecutive frames
  • convert_small_gray() - Preprocesses frames
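
The detect()/convert_small_gray() pair can be approximated without OpenCV. This NumPy sketch uses an assumed threshold and downscale factor, not the values in cv2_utils.py.

```python
import numpy as np

# Illustrative frame-differencing motion detector. The real MotionDetector
# in cv2_utils.py uses OpenCV; the threshold and downscale factor here are
# assumptions for demonstration only.

def to_small_gray(frame: np.ndarray, factor: int = 4) -> np.ndarray:
    """Downscale by striding and average RGB channels to grayscale."""
    small = frame[::factor, ::factor].astype(np.float32)
    return small.mean(axis=2)

def motion_detected(prev: np.ndarray, curr: np.ndarray, threshold: float = 5.0) -> bool:
    """Mean absolute pixel difference above the threshold counts as motion."""
    return float(np.abs(curr - prev).mean()) > threshold

rng = np.random.default_rng(0)
frame_a = rng.integers(0, 255, (480, 640, 3), dtype=np.uint8)
frame_b = frame_a.copy()
frame_b[100:300, 100:300] = 255          # simulate a moving hand
a, b = to_small_gray(frame_a), to_small_gray(frame_b)
print(motion_detected(a, a), motion_detected(a, b))  # False True
```

Downscaling first makes the comparison cheap enough to run on every incoming frame, which is what lets the pipeline skip MediaPipe entirely during idle periods.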

Keypoint Extraction

Extracts pose, face, and hand landmarks using MediaPipe.

Key Class: LandmarkerProcessor in mediapipe_utils.py

Methods:

  • extract_frame_keypoints() - Extracts all landmarks
  • init_mediapipe_landmarkers() - Initializes MediaPipe models

See MediaPipe Integration for details.
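
extract_frame_keypoints() ultimately has to produce a fixed-size array even when MediaPipe misses a region. A hedged sketch of that flattening step follows; the per-region landmark counts (33 pose, 109 face, 21 per hand) are guesses that happen to total FEAT_NUM = 184, so check mediapipe_utils.py for the real layout.

```python
import numpy as np

# Hedged sketch of flattening per-frame landmarks into the fixed
# (FEAT_NUM, FEAT_DIM) array the model expects. The per-region counts
# below are assumptions; see mediapipe_utils.py for the actual layout.

FEAT_NUM, FEAT_DIM = 184, 4  # from constants.py: features x (x, y, z, v)

def flatten_keypoints(regions: dict) -> np.ndarray:
    """Concatenate landmark regions, zero-filling any region MediaPipe missed."""
    counts = {"pose": 33, "face": 109, "left_hand": 21, "right_hand": 21}  # assumed split
    parts = []
    for name, n in counts.items():
        lms = regions.get(name)
        if lms is None:
            parts.append(np.zeros((n, FEAT_DIM), dtype=np.float32))  # region not detected
        else:
            parts.append(np.asarray(lms, dtype=np.float32).reshape(n, FEAT_DIM))
    return np.concatenate(parts, axis=0)

frame = flatten_keypoints({"pose": np.ones((33, 4)), "left_hand": np.ones((21, 4))})
print(frame.shape)  # (184, 4); face and right hand are zero-filled
```

Zero-filling keeps every frame the same shape, so downstream sequence batching never has to handle missing regions specially.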

4. Model Layer

Technology: PyTorch, ONNX Runtime

Components:

Model Architecture

Spatial-Temporal Transformer (ST-Transformer) for sequence classification.

Key Classes in model.py:

  • STTransformer - Main model architecture
  • GroupTokenEmbedding - Body part tokenization layer
  • STTransformerBlock - Spatial-Temporal dual attention block
  • AttentionPooling - Attention-based temporal aggregation

Model Pipeline:

  1. Input: Keypoint sequences (batch, seq_len, features)
  2. Embedding: Group token embedding (4 tokens: Pose, Face, Left Hand, Right Hand)
  3. Positioning: Sinusoidal positional encoding
  4. Transformer: N consecutive Spatial-Temporal attention blocks
  5. Pooling: Attention-based temporal pooling
  6. Output: Class logits (502 classes)

See Model Architecture for details.
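
Step 3 of the pipeline (sinusoidal positional encoding) is standard enough to sketch directly; d_model = 64 here is an illustrative embedding width, not necessarily the one STTransformer uses.

```python
import numpy as np

# Classic fixed sinusoidal positional encoding: sin on even dimensions,
# cos on odd dimensions. The embedding width is illustrative.

def sinusoidal_positions(seq_len: int, d_model: int) -> np.ndarray:
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dim = np.arange(0, d_model, 2)[None, :]      # (1, d_model/2)
    angle = pos / np.power(10000.0, dim / d_model)
    enc = np.zeros((seq_len, d_model), dtype=np.float32)
    enc[:, 0::2] = np.sin(angle)
    enc[:, 1::2] = np.cos(angle)
    return enc

pe = sinusoidal_positions(seq_len=50, d_model=64)  # SEQ_LEN = 50 from constants.py
print(pe.shape)  # (50, 64)
```

Because the encoding is deterministic, it adds no learned parameters: the same position matrix is added to every sequence before the transformer blocks.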

Inference Engine

ONNX Runtime for optimized CPU inference.

Key Functions in model.py:

  • load_onnx_model() - Loads ONNX model
  • onnx_inference() - Runs inference
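
onnx_inference() returns raw logits (502 per the pipeline above), which still need softmax and a confidence check before anything is sent to the client. A minimal sketch, with an assumed 0.5 threshold and a toy three-class label set:

```python
import math

# Hedged post-processing sketch: turn raw logits into a (label, confidence)
# pair. The 0.5 confidence threshold is an assumed value.

def softmax(logits):
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, labels, min_confidence=0.5):
    """Return (label, confidence), or (None, confidence) below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = labels[best] if probs[best] >= min_confidence else None
    return label, probs[best]

print(classify([0.1, 4.0, 0.3], ["hello", "thanks", "yes"]))
```

Suppressing low-confidence predictions keeps the UI from flickering between unrelated signs while the user is mid-gesture.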

5. Data Layer

Technology: PyTorch, NumPy, Pandas

Components:

Dataset Loaders

  • LazyDataset: On-demand loading from NPZ files
  • MmapDataset: Memory-mapped dataset for efficient access

Data Preparation

  • Video preprocessing
  • Keypoint extraction from videos
  • Dataset splitting (train/val/test)

See Data Preparation Pipeline for details.
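
The on-demand loading that LazyDataset relies on can be demonstrated with NumPy alone: an NpzFile reads each array from disk only when it is indexed. Key names and shapes below are illustrative, not the dataset's actual schema.

```python
import os
import tempfile
import numpy as np

# Sketch of LazyDataset-style on-demand loading from an NPZ archive.
# Key names and shapes are illustrative assumptions.

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "keypoints.npz")
    np.savez(path, **{f"sample_{i}": np.random.rand(50, 184, 4).astype(np.float32)
                      for i in range(3)})   # SEQ_LEN x FEAT_NUM x FEAT_DIM

    archive = np.load(path)    # cheap: reads the zip index, not the arrays
    keys = sorted(archive.files)
    first = archive[keys[0]]   # this array is read from disk only now
    print(len(keys), first.shape)  # 3 (50, 184, 4)
    archive.close()
```

Because samples are materialized one at a time, a dataset far larger than RAM can still feed a DataLoader.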

Data Flow

Real-Time Recognition Flow

sequenceDiagram
    participant User
    participant Browser
    participant WebSocket as WS Router
    participant Producer as Producer Handler
    participant Queue as asyncio.Queue
    participant Consumer as Consumer Handler
    participant MediaPipe
    participant ONNX
    
    User->>Browser: Perform sign
    Browser->>WebSocket: Connect WS
    WebSocket->>Queue: Initialize (max_size=50)
    WebSocket->>Producer: Spawn Task
    WebSocket->>Consumer: Spawn Task
    
    loop Stream
        Browser->>Producer: Send Frame (JPEG)
        Producer->>Queue: Put (Decoded Frame)
        
        Queue->>Consumer: Get Frame
        Consumer->>Consumer: Motion Detection
        
        alt Motion Detected
            Consumer->>MediaPipe: Extract Keypoints
            Consumer->>Consumer: Buffer Keypoints
            
            alt Keypoints >= 15
                Consumer->>ONNX: Run Inference
                ONNX-->>Consumer: Logits
                Consumer->>Browser: Send Prediction (JSON)
            end
        else No Motion
            Consumer->>Browser: Send Idle Status
        end
    end
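
The consumer branch of this loop amounts to a small state machine: clear the keypoint buffer when motion stops, otherwise accumulate frames and run inference once 15 are available (the trigger shown in the diagram). A stubbed sketch, with extraction and inference replaced by stand-ins:

```python
# State-machine sketch of the consumer loop. extract and infer are stubs
# standing in for MediaPipe extraction and ONNX inference.

MIN_FRAMES = 15  # inference trigger from the diagram above

class ConsumerState:
    def __init__(self):
        self.keypoints = []

    def step(self, frame, motion: bool, extract, infer):
        """Process one frame; return a status message for the client."""
        if not motion:
            self.keypoints.clear()           # idle: discard partial sequence
            return {"status": "idle"}
        self.keypoints.append(extract(frame))
        if len(self.keypoints) < MIN_FRAMES:
            return {"status": "collecting", "frames": len(self.keypoints)}
        prediction = infer(self.keypoints)   # enough frames: classify
        self.keypoints.clear()
        return {"status": "prediction", "sign": prediction}

state = ConsumerState()
results = [state.step(f, motion=True, extract=lambda f: f, infer=lambda kp: "hello")
           for f in range(16)]
print(results[13]["status"], results[14]["status"], results[15]["status"])
```

Resetting the buffer after each prediction (and on idle) keeps sequences from bleeding into one another between signs.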

Training Flow

graph LR
    A[Raw Videos] --> B[Video Preprocessing]
    B --> C[MediaPipe Extraction]
    C --> D[NPZ Keypoints]
    D --> E[Dataset Loader]
    E --> F[DataLoader]
    F --> G[Model Training]
    G --> H[PyTorch Checkpoint]
    H --> I[ONNX Export]
    I --> J[ONNX Model]
    J --> K[Production Inference]
    
    style A fill:#e1f5ff
    style J fill:#e1ffe1
    style K fill:#ffe1e1

Configuration Management

Environment Variables

Managed through .env file:

ONNX_CHECKPOINT_FILENAME  # Model filename
DOMAIN_NAME               # CORS allowed origin
LOCAL_DEV                 # Local vs Kaggle paths
USE_CPU                   # Force CPU execution

See Environment Configuration for all options.
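
Reading these variables at startup might look like the following; the default values are illustrative assumptions, not the application's real fallbacks.

```python
import os

# Hedged sketch of loading the .env-driven settings listed above.
# Defaults are assumptions for illustration.

def load_settings() -> dict:
    return {
        "onnx_checkpoint": os.getenv("ONNX_CHECKPOINT_FILENAME", "model.onnx"),
        "allowed_origin": os.getenv("DOMAIN_NAME", "http://localhost:8000"),
        "local_dev": os.getenv("LOCAL_DEV", "1") == "1",
        "use_cpu": os.getenv("USE_CPU", "1") == "1",
    }

os.environ["USE_CPU"] = "0"   # e.g. set in .env or the shell
settings = load_settings()
print(settings["use_cpu"])    # False
```

Centralizing the reads in one function keeps path switching (local vs Kaggle) and device selection out of the request-handling code.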

Constants

Defined in constants.py:

SEQ_LEN = 50              # Sequence length
FEAT_NUM = 184            # Number of features
FEAT_DIM = 4              # Feature dimensions (x, y, z, v)
DEVICE = "cpu" | "cuda"   # Execution device
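
Together these constants pin down the model's input shape. The pipeline above describes the input as (batch, seq_len, features), so one sequence carries FEAT_NUM × FEAT_DIM values per frame (the flattening of the last two axes is an assumption consistent with that shape):

```python
# How the constants above determine the inference input shape; whether the
# last two axes are flattened is an assumption based on the pipeline's
# (batch, seq_len, features) description.

SEQ_LEN, FEAT_NUM, FEAT_DIM = 50, 184, 4

batch = 1
input_shape = (batch, SEQ_LEN, FEAT_NUM * FEAT_DIM)
print(input_shape)  # (1, 50, 736)
```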

Deployment Architecture

Docker Deployment

graph TB
    subgraph "Docker Container"
        A[Uvicorn Server]
        B[FastAPI App]
        C[ONNX Runtime]
        D[MediaPipe]
        E[Static Files]
    end
    
    F[Host Port 8000] --> A
    A --> B
    B --> C
    B --> D
    B --> E
    
    G[Volume: ./] --> B
    H[Volume: ./models] --> C
    I[Volume: ./landmarkers] --> D
    
    style A fill:#e1f5ff
    style G fill:#ffe1e1
    style H fill:#ffe1e1
    style I fill:#ffe1e1

Features:

  • Hot reload enabled for development
  • Volume mounts for code and models
  • Automatic dependency installation
  • Consistent environment across platforms

See Docker Setup for configuration.

Performance Considerations

  1. ONNX Runtime: Inference engine for CPU-bound environments.
  2. CPU Execution: Configured for hardware without GPU acceleration.
  3. Frame Buffering: Asynchronous queue management to prevent memory exhaustion.
  4. Motion Detection: Frame differencing to reduce processing load during idle periods.
  5. Async Processing: Non-blocking concurrency for client-server communication.
  6. Thread Pool: Parallel execution for compute-intensive keypoint extraction.
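
Item 6 can be sketched with asyncio's executor support: blocking, CPU-heavy work is pushed onto a thread pool so the event loop stays free to serve WebSocket traffic. extract_keypoints below is a stand-in stub for the real MediaPipe call.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Offload blocking work to a thread pool via run_in_executor so the event
# loop is never blocked. extract_keypoints is a stub for MediaPipe work.

def extract_keypoints(frame):
    return sum(range(1000))  # placeholder for blocking, CPU-bound extraction

async def process(frames):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=2) as pool:
        tasks = [loop.run_in_executor(pool, extract_keypoints, f) for f in frames]
        return await asyncio.gather(*tasks)

results = asyncio.run(process(range(4)))
print(len(results))  # 4
```

Since MediaPipe releases the GIL during its native processing, threads (rather than processes) are usually enough to overlap extraction with I/O.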

Bottlenecks

  • MediaPipe Processing: ~20-30ms per frame
  • ONNX Inference: ~10-20ms per sequence
  • Network Latency: WebSocket frame transmission

Security Considerations

  • CORS: Configured allowed origins
  • WebSocket: No authentication (add for production)
  • Input Validation: Frame size and format checks
  • Resource Limits: Frame buffer size limits

Scalability

Current Limitations

  • Single-threaded WebSocket handler
  • In-memory frame buffer
  • No load balancing

Future Improvements

  • Multi-worker deployment
  • Redis for session management
  • Load balancer for multiple instances
  • GPU acceleration for inference
