Data Preparation Pipeline

The data preparation pipeline transforms raw video data into a format suitable for training deep learning models. It involves landmark extraction, normalization, augmentation, and storage optimization.

Flowchart

graph LR
    A[Raw Video] -->|MediaPipe| B[Raw Keypoints .npz]
    B -->|Normalize| C[Normalized Keypoints]
    C -->|TSN Sampling| D[Fixed Sequence]
    D -->|Augment| E[Training Batch]
    C -->|Memory Map| F[mmap Dataset]
    F -->|Load| E

Pipeline Steps

1. Landmark Extraction

We use MediaPipe to extract landmarks from every frame of every video in the dataset.

Script: prepare_npz_kps.py
Output: .npz files containing raw landmark coordinates.
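
As an illustration of the output format, the sketch below writes and reads one video's landmarks with NumPy's `.npz` support. It is not the content of prepare_npz_kps.py: the function names, the array keys (`pose`, `left_hand`, `right_hand`), and the `(frames, landmarks, 3)` layout are assumptions made for this example.

```python
import numpy as np


def save_keypoints(path, pose, left_hand, right_hand):
    """Store per-frame landmark arrays for one video in a compressed .npz file.

    The key names and the (T, K, 3) layout are hypothetical, chosen only
    for this sketch.
    """
    np.savez_compressed(
        path,
        pose=np.asarray(pose, dtype=np.float32),
        left_hand=np.asarray(left_hand, dtype=np.float32),
        right_hand=np.asarray(right_hand, dtype=np.float32),
    )


def load_keypoints(path):
    """Read every array in the file back into memory as a plain dict."""
    with np.load(path) as data:
        return {name: data[name] for name in data.files}
```

Keeping one file per video means a corrupted or re-extracted clip can be replaced without touching the rest of the dataset.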

2. Normalization

Raw landmarks are normalized to be invariant to scale and translation.

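Center: Subtract the reference landmark's coordinates from every landmark in the same group. The reference is the nose for pose and face landmarks and the wrist for hand landmarks.
Scale: Divide by a reference distance (e.g., shoulder width) so that signers at different distances from the camera produce comparable coordinates.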
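
The two steps above can be sketched as follows. This is a minimal example, not the project's code: the function name, the `(T, K, D)` array layout, and the `eps` guard are assumptions. The default indices (0 for the nose, 11 and 12 for the shoulders) follow the MediaPipe Pose landmark numbering and only apply if the array keeps that ordering.

```python
import numpy as np


def normalize_landmarks(frames, center_idx=0, ref_a=11, ref_b=12, eps=1e-6):
    """Center and scale a (T, K, D) landmark sequence frame by frame.

    center_idx: landmark subtracted from all others (assumed: nose).
    ref_a, ref_b: landmarks whose distance sets the scale
                  (assumed: left and right shoulder).
    """
    frames = np.asarray(frames, dtype=np.float32)

    # Translation invariance: move the reference landmark to the origin.
    center = frames[:, center_idx:center_idx + 1, :]        # (T, 1, D)
    centered = frames - center

    # Scale invariance: divide by the per-frame reference distance.
    ref_dist = np.linalg.norm(frames[:, ref_a, :] - frames[:, ref_b, :], axis=-1)
    scale = np.maximum(ref_dist, eps)[:, None, None]        # avoid divide-by-zero
    return centered / scale
```

After this transform the reference landmark sits at the origin in every frame and the reference distance equals 1, so shifting or uniformly resizing the input leaves the output unchanged.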

3. Sampling (TSN)

Video sequences vary in length. We use Temporal Segment Networks (TSN) sampling to produce a fixed-length sequence (e.g., SEQ_LEN=30): the video is split into SEQ_LEN segments of equal duration and one frame is taken from each.

Training: Randomly sample one frame from each segment, so the same video yields slightly different timing on each pass (temporal jitter).
Testing: Sample the center frame of each segment for deterministic evaluation.
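
The training and testing modes can be sketched as a single index-selection function. This is an illustrative version under stated assumptions: the name `tsn_indices`, its parameters, and the handling of videos shorter than `seq_len` (frames are repeated) are not taken from the project.

```python
import numpy as np


def tsn_indices(num_frames, seq_len=30, training=False, rng=None):
    """Return seq_len frame indices, one per equal-width temporal segment."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")

    # Segment boundaries as real numbers, so short videos still work
    # (segments then overlap on the same frame and indices repeat).
    edges = np.linspace(0, num_frames, seq_len + 1)
    starts, ends = edges[:-1], edges[1:]

    if training:
        if rng is None:
            rng = np.random.default_rng()
        # Random position inside each segment (temporal jitter).
        positions = starts + rng.random(seq_len) * (ends - starts)
    else:
        # Center of each segment, identical on every call.
        positions = (starts + ends) / 2

    idx = np.floor(positions).astype(np.int64)
    return np.clip(idx, 0, num_frames - 1)
```

For a 90-frame video and `seq_len=30`, each segment spans three frames and test mode picks the middle one: indices 1, 4, 7, and so on up to 88. A sequence is then obtained with `keypoints[tsn_indices(len(keypoints))]`.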

4. Augmentation

To improve model generalization, we apply valid geometric transformations during training:

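Horizontal Flip: Mirror the X coordinates and swap the left/right landmark indices, so that a mirrored left hand is stored in the right-hand slots and vice versa.
Rotation: Random rotation within ±15 degrees.
Scaling: Random scaling within ±15%.
Translation: Random shift in X and Y.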
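
A minimal sketch of these four transformations on normalized keypoints is shown below. Several details are assumptions rather than facts from the pipeline: the function name, the flip probability, the shift range (`max_shift`), the choice to transform only the X/Y coordinates, and the `swap_perm` index array, which must map each landmark to its left/right counterpart for whatever landmark layout is in use.

```python
import numpy as np


def augment(frames, swap_perm, rng, max_deg=15.0, max_scale=0.15,
            max_shift=0.1, flip_prob=0.5):
    """Apply flip, rotation, scaling, and translation to (T, K, D>=2) keypoints.

    One set of random parameters is drawn per sequence, so every frame
    of a clip receives the same transform.
    """
    out = np.array(frames, dtype=np.float32, copy=True)

    # Horizontal flip: mirror about x = 0 (coordinates are assumed centered),
    # then swap left/right landmark slots so labels stay consistent.
    if rng.random() < flip_prob:
        out[..., 0] = -out[..., 0]
        out = out[:, swap_perm, :]

    # Rotation in the image plane by a random angle in [-max_deg, max_deg].
    theta = np.deg2rad(rng.uniform(-max_deg, max_deg))
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]], dtype=np.float32)
    out[..., :2] = out[..., :2] @ rot.T

    # Uniform scaling by a random factor in [1 - max_scale, 1 + max_scale].
    out[..., :2] *= float(rng.uniform(1.0 - max_scale, 1.0 + max_scale))

    # Translation by a random offset in X and Y.
    shift = rng.uniform(-max_shift, max_shift, size=2).astype(np.float32)
    out[..., :2] += shift
    return out
```

Usage, assuming a toy layout where landmarks 0/1 and 2/3 are left/right pairs: `augment(seq, swap_perm=np.array([1, 0, 3, 2]), rng=np.random.default_rng())`.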
