mediapipe_utils.py

File Path: src/core/mediapipe_utils.py

Purpose: Wrappers for MediaPipe solutions to extract Pose, Face, and Hand landmarks.

Overview

This module initializes MediaPipe’s PoseLandmarker, FaceLandmarker, and HandLandmarker and runs them concurrently. It exposes a single extraction path for both input types (still images and video frames).

Setup Logic

graph TD
    A[Init] --> B{Inference Mode?}
    B -->|True| C[RunningMode.VIDEO]
    B -->|False| D[RunningMode.IMAGE]
    C --> E[Load Task Files]
    D --> E
    E --> F[Create Landmarkers]
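
The branch in the flowchart can be sketched as below. This is a minimal illustration, not the module's actual code: the `RunningMode` enum is a stand-in so the snippet runs without MediaPipe installed, and `select_running_mode`, `task_file`, and the `<name>_landmarker.task` naming scheme are hypothetical.

```python
from enum import Enum
from pathlib import Path


class RunningMode(Enum):
    """Stand-in for MediaPipe's RunningMode so the sketch has no dependencies."""
    IMAGE = "image"
    VIDEO = "video"


def select_running_mode(inference_mode: bool) -> RunningMode:
    """Follow the flowchart: True selects VIDEO, False selects IMAGE."""
    return RunningMode.VIDEO if inference_mode else RunningMode.IMAGE


def task_file(landmarkers_dir: str, name: str) -> Path:
    """Hypothetical naming scheme; the real task file names are not given here."""
    return Path(landmarkers_dir) / f"{name}_landmarker.task"


mode = select_running_mode(inference_mode=True)   # RunningMode.VIDEO
pose_task = task_file("landmarkers", "pose")      # landmarkers/pose_landmarker.task
```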

Key Features

Concurrent Extraction: Uses ThreadPoolExecutor to run landmarkers in parallel.
Normalization: Maps each body part to a fixed index range in the flattened feature vector.
Async Creation: Supports asynchronous initialization for non-blocking startup.
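
A rough sketch of the concurrent extraction pattern, using only the standard library. The helper `run_in_parallel` and the lambda detectors are placeholders invented for illustration; the real module calls the MediaPipe landmarkers instead.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(tasks):
    """Run each zero-argument callable in `tasks` on a worker thread.

    `tasks` maps a part name to a callable; the result maps the same
    names to whatever each callable returned.
    """
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        # result() blocks until the task finishes and re-raises its exception, if any.
        return {name: future.result() for name, future in futures.items()}


# Placeholder detectors standing in for the real landmarker calls.
results = run_in_parallel({
    "pose": lambda: [(0.5, 0.5, 0.0, 1.0)] * 6,
    "face": lambda: [],
    "hands": lambda: [],
})
```

Threads are a reasonable fit here on the assumption that the landmarker calls spend most of their time in native code; that assumption is not stated in this page.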

Constants

Keypoint Counts

POSE_NUM: 6 (Shoulders, Elbows, Wrists)
FACE_NUM: 468 (Full FaceMesh) + Iris
HAND_NUM: 21 (Standard Hand Model)

Slicing

KP2SLICE dictionary maps body parts to indices in the flattened feature vector:

pose: [0:POSE_NUM]
face: [POSE_NUM:POSE_NUM+FACE_NUM]
rh: [...:...+HAND_NUM]
lh: [...:...+HAND_NUM]
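
One way such a mapping could be built is sketched below. Two things here are assumptions, not facts from the source: that the parts are laid out contiguously in the order listed (pose, face, rh, lh), and that FACE_NUM is exactly 468. The helper `build_slices` is hypothetical.

```python
POSE_NUM = 6
FACE_NUM = 468  # as listed above; adjust if iris points are counted in FACE_NUM
HAND_NUM = 21


def build_slices(counts):
    """Turn an ordered {part: count} mapping into contiguous {part: slice}."""
    slices, start = {}, 0
    for part, count in counts.items():
        slices[part] = slice(start, start + count)
        start += count
    return slices


# Assumption: parts are contiguous and appear in this order.
KP2SLICE = build_slices(
    {"pose": POSE_NUM, "face": FACE_NUM, "rh": HAND_NUM, "lh": HAND_NUM}
)

# Example: pick the face rows out of a per-frame keypoint list.
frame = [(0.0, 0.0, 0.0, 0.0)] * (POSE_NUM + FACE_NUM + 2 * HAND_NUM)
face_rows = frame[KP2SLICE["face"]]
```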

Classes

`LandmarkerProcessor`

Purpose: Singleton-like manager for MediaPipe instances.

`__init__`

Initializes logger and definition references.

`create(landmarkers, inference_mode)`

Parameters:

landmarkers: List of strings (e.g., ["pose", "face"]).
inference_mode: True for Video (stateful), False for Image (stateless).

Returns: Initialized instance.
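
The sketch below shows one plausible shape for a non-blocking factory, assuming `create` is an async class method; the source does not show its signature beyond the two parameters. `ProcessorSketch` and `load_landmarker` are hypothetical stand-ins, and the loader returns a plain dict instead of a real MediaPipe object.

```python
import asyncio


def load_landmarker(name, inference_mode):
    """Hypothetical blocking loader; a real one would read a .task file."""
    return {"name": name, "mode": "VIDEO" if inference_mode else "IMAGE"}


class ProcessorSketch:
    def __init__(self):
        self.landmarkers = {}

    @classmethod
    async def create(cls, landmarkers, inference_mode):
        """Build an instance, loading each requested landmarker off the event loop."""
        self = cls()
        loaded = await asyncio.gather(
            *(asyncio.to_thread(load_landmarker, name, inference_mode)
              for name in landmarkers)
        )
        self.landmarkers = dict(zip(landmarkers, loaded))
        return self


processor = asyncio.run(ProcessorSketch.create(["pose", "face"], inference_mode=True))
```

`asyncio.to_thread` requires Python 3.9 or later.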

`extract_frame_keypoints(frame_rgb, timestamp_ms=-1, adjusted=False)`

Core Logic:

Defines nested functions (get_pose, get_face, get_hands) to capture local state.
Submits tasks to ThreadPoolExecutor(max_workers=3).
Waits for all results.
Aggregates the results into a single NumPy array of shape (N, 4), one row per keypoint with columns [x, y, z, visibility].
Adjusted Mode: If True, normalizes points relative to a reference (e.g., Nose) and scales by a body-part metric (e.g., Shoulder width).
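
The adjusted mode can be illustrated with a dependency-free sketch. It assumes the adjustment is a translation to the reference point followed by division by the distance between two scale points; the exact reference and metric used per body part are not specified here. Plain tuples stand in for NumPy rows, and the function name `adjust` is hypothetical.

```python
import math


def adjust(points, reference, scale_a, scale_b):
    """Translate points so `reference` is the origin, then divide x, y, z by
    the distance between `scale_a` and `scale_b`. Visibility is unchanged.

    Every point is an (x, y, z, visibility) tuple.
    """
    scale = math.dist(scale_a[:3], scale_b[:3])
    if scale == 0:
        return list(points)  # degenerate frame: leave the points untouched
    rx, ry, rz = reference[:3]
    return [((x - rx) / scale, (y - ry) / scale, (z - rz) / scale, v)
            for x, y, z, v in points]


nose = (0.5, 0.4, 0.0, 1.0)
left_shoulder = (0.3, 0.6, 0.0, 1.0)
right_shoulder = (0.7, 0.6, 0.0, 1.0)

# Nose as the reference, shoulder width as the scale.
adjusted = adjust([nose, left_shoulder, right_shoulder],
                  nose, left_shoulder, right_shoulder)
```

After this step the reference point sits at the origin and the two shoulders are one unit apart, which makes frames comparable across subjects at different distances from the camera.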

Related Documentation

Depends On:

constants.py - LANDMARKERS_DIR.

Used By:

prepare_npz_kps.py - Data ingestion.
live_processing.py - Real-time inference.
