lazy_dataset.py

File Path: src/data/lazy_dataset.py

Purpose: A PyTorch Dataset implementation that loads individual .npz files on demand (Lazy Loading).

Overview

Unlike the memory-mapped dataset, which maps one large monolithic file into virtual memory instead of reading it all at once, this class keeps the data as thousands of small individual files. This keeps memory usage low but incurs higher I/O overhead, because every sample requires its own open() call.

Class `LazyKArSLDataset`

Inherits: torch.utils.data.Dataset

`__init__`

Parameters: split, signers, signs, transforms.

Logic:

Initializes TSNSampler and DataAugmentor.
Iterates over all requested signs and signers.
Checks for existence of .npz files in NPZ_KPS_DIR.
Builds a list of metadata tuples: self.samples = [(signer, vid_id, label), ...].
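A minimal sketch of the scanning step above. It assumes a hypothetical directory layout `<root>/<signer>/<split>/<sign>/<vid_id>.npz` and uses each sign's position in `signs` as its class label; the real path convention and label mapping are defined in the source and may differ. `build_samples` is an illustrative name, not part of the class.

```python
from pathlib import Path
from typing import List, Sequence, Tuple

Sample = Tuple[str, str, int]  # (signer, vid_id, label)


def build_samples(
    root: Path,
    split: str,
    signers: Sequence[str],
    signs: Sequence[int],
) -> List[Sample]:
    """Scan root for existing .npz files and return (signer, vid_id, label) tuples.

    Assumes the hypothetical layout root/<signer>/<split>/<sign>/<vid_id>.npz
    and uses each sign's index in `signs` as its class label.
    """
    samples: List[Sample] = []
    for label, sign in enumerate(signs):
        for signer in signers:
            sign_dir = root / signer / split / str(sign)
            if not sign_dir.is_dir():
                continue  # this signer has no recordings of this sign in this split
            # sorted() keeps the sample order deterministic across runs
            for npz_path in sorted(sign_dir.glob("*.npz")):
                samples.append((signer, npz_path.stem, label))
    return samples
```

Only small tuples are stored, so memory stays low; the directory scan over every signer and sign is what makes startup slower than the MMap dataset's offset calculation.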

`_load_file(path)`

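Decorator: @lru_cache(maxsize=1024)

Purpose: Caches the contents of up to 1024 recently accessed files to reduce disk I/O for frequently accessed samples. In practice, hits are rare during training: a shuffled epoch reads each sample only once, so the cache mainly pays off when the dataset is not much larger than 1024 samples.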
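A sketch of a cached loader, assuming each .npz stores its array under a hypothetical key `keypoints`. It is written as a module-level function because `lru_cache` on a method includes `self` in the cache key and keeps the instance alive. With `DataLoader(num_workers > 0)`, each worker process holds its own independent cache. The returned array is marked read-only because the same object is handed to every caller, so code that augments in place should copy it first.

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=1024)
def load_keypoints(path: str) -> np.ndarray:
    """Load the keypoint array stored in one .npz file, caching recent results.

    "keypoints" is a hypothetical array name; check what the preprocessing
    step actually writes into each file.
    """
    with np.load(path) as data:  # NpzFile closes its file handle on exit
        arr = data["keypoints"]  # reading a key materialises the array in RAM
    arr.setflags(write=False)  # cached arrays are shared, so forbid in-place edits
    return arr
```

`load_keypoints.cache_info()` reports hits and misses, which is a quick way to check whether the cache helps for a given dataset size.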

`__getitem__(index)`

Logic:

Retrieves metadata (signer, vid, label).
Constructs file path.
Loads raw keypoints via _load_file.
Sampling: Applies TSNSampler to resample the sequence to a fixed length of SEQ_LEN frames.
Transform: Applies spatial augmentation.
Return: (FloatTensor, LongTensor).
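The sampling and return steps can be sketched with TSN-style segment sampling: split the T frames into SEQ_LEN equal segments and take one frame per segment, at a random position during training and at the segment centre during evaluation. This is the generic Temporal Segment Networks idea; the project's TSNSampler may differ in detail, and the random scaling below is only a hypothetical stand-in for DataAugmentor. `tsn_indices` and `prepare_sample` are illustrative names. The outputs are NumPy float32/int64 values; `torch.from_numpy(clip)` and `torch.tensor(label)` convert them into the FloatTensor and LongTensor that `__getitem__` returns.

```python
from typing import Tuple

import numpy as np


def tsn_indices(num_frames: int, seq_len: int, train: bool,
                rng: np.random.Generator) -> np.ndarray:
    """Pick one frame index from each of seq_len equal-width segments."""
    edges = np.linspace(0, num_frames, seq_len + 1)  # segment boundaries over [0, T]
    starts, widths = edges[:-1], np.diff(edges)
    if train:
        offsets = rng.random(seq_len) * widths  # random position inside each segment
    else:
        offsets = widths / 2  # deterministic segment centre for evaluation
    idx = np.floor(starts + offsets).astype(np.int64)
    return np.clip(idx, 0, num_frames - 1)  # guard against float rounding at the end


def prepare_sample(keypoints: np.ndarray, label: int, seq_len: int, train: bool,
                   rng: np.random.Generator) -> Tuple[np.ndarray, np.int64]:
    """Resample a (T, ...) keypoint array to seq_len frames and cast the dtypes."""
    idx = tsn_indices(keypoints.shape[0], seq_len, train, rng)
    # Fancy indexing returns a copy, so a cached (read-only) input stays untouched
    clip = keypoints[idx].astype(np.float32)
    if train:
        # Hypothetical stand-in for DataAugmentor: one random global scale factor
        clip *= rng.uniform(0.9, 1.1)
    return clip, np.int64(label)
```

When T is shorter than SEQ_LEN the segments are narrower than one frame and frames repeat, so every clip still has exactly SEQ_LEN frames.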

Comparison

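| Feature | Lazy Dataset | MMap Dataset |
|---|---|---|
| Startup time | Slow (scans the file system) | Fast (computes offsets) |
| Memory usage | Low | Low (virtual memory) |
| I/O pattern | Random small reads, one file open per sample | Random seeks/reads within one file |
| Flexibility | High (add or remove files freely) | Low (the memmap file must be rebuilt) |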
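For contrast, a sketch of how an MMap-style dataset can locate a sample, assuming (hypothetically) that all videos are concatenated along the frame axis in a single .npy file and that per-video frame counts are known. Startup reduces to a prefix sum over the lengths, and each read is a slice of the mapped array. The actual MMap dataset's layout is not documented on this page, and both function names are illustrative.

```python
import numpy as np


def build_offsets(lengths) -> np.ndarray:
    """Prefix-sum per-video frame counts into start offsets (the startup "offset calc")."""
    offsets = np.zeros(len(lengths) + 1, dtype=np.int64)
    np.cumsum(lengths, out=offsets[1:])
    return offsets


def read_video(frames: np.ndarray, offsets: np.ndarray, index: int) -> np.ndarray:
    """Copy one video's frames out of the (memory-mapped) monolithic array.

    Slicing a memmap is cheap; np.array() then reads only the region backing
    this slice (plus any OS read-ahead) from disk into RAM.
    """
    return np.array(frames[offsets[index]:offsets[index + 1]])
```

With `frames = np.load(path, mmap_mode="r")`, only the small offsets array lives in RAM up front, which is why startup is fast; the trade-off is that adding or removing a video means rewriting the monolithic file.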

Depends On:

data_preparation.py - TSNSampler, DataAugmentor
constants.py - NPZ_KPS_DIR

Used By:

dataloader.py

Arabic Sign Language
