Datasets

YouTube-ASL

A large-scale, open-domain ASL-English parallel corpus with 11,000+ YouTube videos and 73,000+ segments (Uthus et al., 2023).

Default pose job: dataset.download → dataset.manifest → processing.video2pose → post_processing.normalize → output.webdataset

python -m signdata run configs/jobs/youtube_asl/mediapipe.yaml

Default video job: dataset.download → dataset.manifest → processing.video2crop → output.webdataset

python -m signdata run configs/jobs/youtube_asl/video.yaml

Requires dataset.source.video_ids_file pointing to the video ID list (included at assets/youtube-asl_youtube_asl_video_ids.txt). The dataset download stage fetches videos via yt-dlp and transcripts via youtube-transcript-api. If transcript requests start failing with RequestBlocked or IpBlocked, configure dataset.source.transcript_proxy_http / dataset.source.transcript_proxy_https or retry from a non-blocked residential IP.

How2Sign

80+ hours of instructional "how-to" videos with continuous ASL, recorded in a controlled environment with professional signers (Duarte et al., CVPR 2021).

Default pose job: dataset.download (validation only) → dataset.manifest → processing.video2pose → post_processing.normalize → output.webdataset

python -m signdata run configs/jobs/how2sign/mediapipe.yaml

Default video job: dataset.download (validation only) → dataset.manifest → processing.video2crop → output.webdataset

python -m signdata run configs/jobs/how2sign/video.yaml

Setup:

Download the dataset from how2sign.github.io
Place videos in the videos path (default: dataset/how2sign/videos/)
Place the alignment CSV (e.g. how2sign_realigned_val.csv) at paths.manifest or dataset.source.manifest_csv

The How2Sign dataset adapter uses dataset.download as a validation step for local files; it does not fetch remote data.

Adding a New Dataset

All datasets must use the package layout src/signdata/datasets/<dataset_name>/ with adapter.py, source.py, and manifest.py as the default entry files.

See CONTRIBUTING.md for the required structure, responsibilities, and code template.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Datasets

YouTube-ASL

How2Sign

Adding a New Dataset

See Also

FilesExpand file tree

datasets.md

Latest commit

History

datasets.md

File metadata and controls

Datasets

YouTube-ASL

How2Sign

Adding a New Dataset

See Also