Data Preparation Pipeline
The data preparation pipeline transforms raw video data into a format suitable for training deep learning models. It involves landmark extraction, normalization, augmentation, and storage optimization.
Flowchart
graph LR
    A[Raw Video] -->|MediaPipe| B[Raw Keypoints .npz]
    B -->|Normalize| C[Normalized Keypoints]
    C -->|TSN Sampling| D[Fixed Sequence]
    D -->|Augment| E[Training Batch]
    C -->|Memory Map| F[mmap Dataset]
    F -->|Load| E
Pipeline Steps
1. Landmark Extraction
We use MediaPipe to extract landmarks from every frame of every video in the dataset.
- Script: prepare_npz_kps.py
- Output: .npz files containing raw landmark coordinates.
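As a rough illustration of the storage format only, the sketch below writes and reads back a keypoint array with NumPy. It does not reproduce prepare_npz_kps.py: the array layout (frames, landmarks, xyz), the key name "keypoints", and the file name are assumptions made for this example, and the zero array stands in for real MediaPipe output (33 is the landmark count of MediaPipe Pose).

```python
import os
import tempfile

import numpy as np

# Placeholder for extractor output, assumed shape (frames, landmarks, xyz).
keypoints = np.zeros((4, 33, 3), dtype=np.float32)

# Hypothetical file name; one .npz file per video is an assumption.
path = os.path.join(tempfile.mkdtemp(), "sample_video.npz")
np.savez_compressed(path, keypoints=keypoints)  # "keypoints" is a hypothetical key

# Read the array back; np.load on an .npz returns a dict-like archive.
with np.load(path) as data:
    loaded = data["keypoints"]

print(loaded.shape)  # (4, 33, 3)
```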
2. Normalization
Raw landmarks are normalized to be invariant to scale and translation.
- Center: Subtract the coordinates of an anchor landmark: the nose for pose and face landmarks, the wrist for hand landmarks.
- Scale: Divide by a reference distance (e.g., shoulder width).
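The two normalization steps can be sketched as follows. This is a minimal example under stated assumptions, not the project's implementation: the function name normalize_keypoints, the (T, N, C) array layout, the landmark indices, and the eps guard are all hypothetical.

```python
import numpy as np

def normalize_keypoints(kps, center_idx, ref_a, ref_b, eps=1e-6):
    """Center and scale a (T, N, C) keypoint array frame by frame.

    center_idx: index of the anchor landmark (e.g. nose or wrist).
    ref_a, ref_b: indices of the two landmarks that define the reference
                  distance (e.g. left and right shoulder).
    """
    kps = np.asarray(kps, dtype=np.float32)
    # Translation invariance: subtract the anchor landmark in every frame.
    center = kps[:, center_idx:center_idx + 1, :]
    centered = kps - center
    # Scale invariance: divide by the per-frame reference distance.
    ref = np.linalg.norm(kps[:, ref_a, :] - kps[:, ref_b, :], axis=-1)
    ref = np.maximum(ref, eps)  # guard against frames with zero distance
    return centered / ref[:, None, None]

# Toy frame with three 2D landmarks: nose, left shoulder, right shoulder.
frames = np.array([[[1.0, 1.0], [0.0, 2.0], [2.0, 2.0]]])
normalized = normalize_keypoints(frames, center_idx=0, ref_a=1, ref_b=2)
```

After normalization the anchor sits at the origin and the reference distance equals 1, so shifting or uniformly rescaling the input leaves the result unchanged.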
3. Sampling (TSN)
Video sequences vary in length. We use Temporal Segment Networks (TSN) sampling to produce a fixed-length sequence (e.g., SEQ_LEN=30): each video is split into SEQ_LEN equal segments and one frame is taken from each segment.
- Training: Randomly sample one frame from each segment (jittering) to augment temporal variance.
- Testing: Sample the center frame of each segment for deterministic evaluation.
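The training and testing modes above can be sketched with a small index sampler. The function name tsn_indices and its parameters are hypothetical, and the handling of videos shorter than SEQ_LEN (frames are repeated) is an assumption rather than documented behavior.

```python
import random

def tsn_indices(num_frames, seq_len=30, training=False, rng=None):
    """Return seq_len frame indices, one per equal-length segment."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    rng = rng or random
    indices = []
    for k in range(seq_len):
        # Segment k covers the real-valued interval [start, end].
        start = k * num_frames / seq_len
        end = (k + 1) * num_frames / seq_len
        if training:
            pos = rng.uniform(start, end)  # random position inside the segment
        else:
            pos = (start + end) / 2        # center of the segment
        # Clamp so the index is always valid, including for short videos.
        indices.append(min(int(pos), num_frames - 1))
    return indices

print(tsn_indices(12, seq_len=4))  # [1, 4, 7, 10]
train_idx = tsn_indices(90, seq_len=30, training=True, rng=random.Random(0))
```

Passing a seeded random.Random instance makes the training-mode sampling reproducible, which can help when debugging a data loader.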
4. Augmentation
To improve model generalization, we apply the following geometric transformations during training:
- Horizontal Flip: Mirror the X coordinates and swap the left/right landmark indices so that each landmark slot keeps its meaning.
- Rotation: Random rotation within ±15 degrees.
- Scaling: Random scaling within ±15%.
- Translation: Random shift in X/Y.
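A minimal sketch of these four transformations on 2D keypoints is shown below. The function names, the (T, N, 2) layout, the 0.5 flip probability, the max_shift value, and the swap_perm index array are assumptions made for illustration; mirroring around X = 0 also assumes the keypoints are already centered.

```python
import numpy as np

def hflip(kps, swap_perm):
    """Mirror X and swap left/right landmark slots of (T, N, 2) keypoints."""
    flipped = np.array(kps, dtype=np.float32)  # copy, leave the input intact
    flipped[..., 0] *= -1.0                    # mirror around X = 0
    return flipped[:, swap_perm, :]            # slot i takes landmark swap_perm[i]

def augment(kps, swap_perm, rng, max_deg=15.0, max_scale=0.15, max_shift=0.1):
    """Apply random flip, rotation, scaling, and translation."""
    kps = np.array(kps, dtype=np.float32)
    if rng.random() < 0.5:                     # flip probability is an assumption
        kps = hflip(kps, swap_perm)
    theta = np.deg2rad(rng.uniform(-max_deg, max_deg))
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]], dtype=np.float32)
    kps = kps @ rot.T                                       # rotate about the origin
    kps = kps * rng.uniform(1 - max_scale, 1 + max_scale)   # uniform scaling
    kps = kps + rng.uniform(-max_shift, max_shift, size=2)  # shift in X/Y
    return kps.astype(np.float32)

# Toy pose: slot 0 = left point, slot 1 = right point, slot 2 = nose.
pose = np.array([[[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]], dtype=np.float32)
perm = np.array([1, 0, 2])  # swap slots 0 and 1, keep the nose
rng = np.random.default_rng(0)
augmented = augment(pose, perm, rng)
```

Applying the same random parameters to every frame of a sequence, as done here, keeps the motion consistent over time; sampling new parameters per frame would distort it.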