
feat: add person localization and video cropping to video pipeline#1

Open
brian10420 wants to merge 2 commits into main from feature/person-localize-crop-video

Conversation

@brian10420 (Collaborator) commented:

Summary

Adds two new pipeline processors to reduce background noise in video-mode datasets by detecting and cropping to the signer before downstream processing.

New Processors

person_localize

  • Uses YOLOv8-nano to detect the signer's bounding box across sampled frames
  • Writes BBOX_X1, BBOX_Y1, BBOX_X2, BBOX_Y2, PERSON_DETECTED columns to the manifest
  • Two configurable sampling strategies:
    • skip_frame (default): mirrors the extract processor's frame sampling logic
    • uniform: evenly samples N frames across the segment
  • Gracefully falls back to full-frame bbox when no person is detected
  • Resume-safe: skips already-processed rows on re-run
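The two sampling strategies can be sketched roughly as follows. This is a hypothetical `sample_indices` helper, not the PR's actual code; in particular it assumes `frame_skip` is the stride between sampled frames, which may differ from the real semantics.

```python
def sample_indices(total_frames: int, strategy: str = "skip_frame",
                   frame_skip: int = 2, sample_frames: int = 5) -> list:
    """Pick the frame indices to run person detection on."""
    if strategy == "uniform":
        # Exactly `sample_frames` indices, evenly spread over the segment.
        if total_frames <= sample_frames:
            return list(range(total_frames))
        step = (total_frames - 1) / (sample_frames - 1)
        return [round(i * step) for i in range(sample_frames)]
    # skip_frame: every `frame_skip`-th frame, capped at `sample_frames`.
    return list(range(0, total_frames, frame_skip))[:sample_frames]
```

Note how `uniform` always yields a fixed count while `skip_frame` only imposes an upper bound; this distinction matters again in the review below.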

crop_video

  • Reads bbox columns from manifest and crops each clip using ffmpeg
  • Configurable padding (default 25%) clamped to frame boundaries
  • Ensures even pixel dimensions for libx264 compatibility
  • Stream-copy (no re-encode) for fallback segments
  • Outputs to paths.cropped_clips (separate from paths.clips)
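The padding, clamping, and even-dimension rules compose like the hypothetical helper below (a sketch, assuming the 25% padding is split evenly across both sides of the bbox):

```python
def crop_rect(x1, y1, x2, y2, frame_w, frame_h, padding=0.25):
    """Expand a bbox by `padding`, clamp to the frame, force even sizes."""
    w, h = x2 - x1, y2 - y1
    px, py = w * padding / 2, h * padding / 2  # half the padding per side
    cx1 = max(0, int(x1 - px))
    cy1 = max(0, int(y1 - py))
    cx2 = min(frame_w, int(x2 + px))
    cy2 = min(frame_h, int(y2 + py))
    # libx264 requires even width/height; shrink by one pixel if odd.
    cw = (cx2 - cx1) & ~1
    ch = (cy2 - cy1) & ~1
    return cx1, cy1, cw, ch  # x, y, width, height for ffmpeg's crop filter
```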

Pipeline Integration

```yaml
# configs/how2sign/video.yaml
pipeline:
  steps: [person_localize, clip_video, crop_video, webdataset]
```

webdataset automatically uses cropped_clips when available.

Bug Fixes

  • Windows ffmpeg path: clip_video and crop_video now use shutil.which("ffmpeg") to resolve the binary path in worker processes, fixing [WinError 2] on Windows
  • Windows webdataset path: replaced wds.ShardWriter with a tarfile-based _ShardWriter to bypass gopen() misinterpreting Windows drive letters (D:/) as URL schemes
  • Windows POSIX paths: resolve_paths() now correctly treats /abs/path as absolute on Windows
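The ffmpeg fix reduces to resolving the binary to an absolute path before any worker process spawns it; a minimal sketch (the function name is hypothetical):

```python
import shutil

def resolve_binary(name: str = "ffmpeg") -> str:
    """Resolve a binary to its absolute path up front. On Windows, passing
    a bare "ffmpeg" string to a subprocess in a worker can fail with
    [WinError 2] even when the binary is on PATH."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name} not found on PATH")
    return path
```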

Validation

Tested on How2Sign val set (132 videos, 1741 segments):

```
person_localize : 1739/1741 detected, 2 fallback, 0 errors
clip_video      : 1741/1741 success
crop_video      : 1739 cropped + 2 stream-copied, 0 errors
webdataset      : 1741 samples → shard-000000.tar
```

Tests

164 tests passing. New test classes:

  • TestSampleStrategy — skip_frame / uniform sampling logic
  • TestPersonLocalizeManifest — manifest column writing and resume safety
  • TestCropGeometry — bbox padding, clamp, even dimensions
  • TestSchemaDefaults — new config fields and validation

@balaboom123 self-requested a review February 26, 2026 09:02

@balaboom123 (Owner) left a comment:

# PR Review — Confirmed + Additional Issues Found

---

## [Issue 1] Redundant and Duplicated Config Parameters

### 1a — frame_skip is declared twice with the same default and same semantic meaning.

The existing ProcessingConfig already has:

```python
# schema.py
class ProcessingConfig(BaseModel):
    frame_skip: int = 2   # ← already exists
```

The PR adds a second frame_skip in PersonLocalizeConfig:

```python
class PersonLocalizeConfig(BaseModel):
    frame_skip: int = 2   # ← duplicate
```

And both appear in _base/video.yaml:

```yaml
processing:
  frame_skip: 2          # ← existing

person_localize:
  frame_skip: 2          # ← new duplicate
```

These two fields can silently diverge. A user who sets processing.frame_skip: 4 to speed up pose extraction will reasonably expect localization to match — but it won't, because person_localize has its own independent value. The person_localize step should read from cfg.processing.frame_skip instead of maintaining its own copy.
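One way to de-duplicate, sketched here with plain dataclasses rather than the project's actual pydantic models (the `localize_stride` helper is hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class ProcessingConfig:
    frame_skip: int = 2  # single source of truth

@dataclass
class PersonLocalizeConfig:
    sample_strategy: str = "skip_frame"
    # no frame_skip here; the step reads cfg.processing.frame_skip

@dataclass
class Config:
    processing: ProcessingConfig = field(default_factory=ProcessingConfig)
    person_localize: PersonLocalizeConfig = field(default_factory=PersonLocalizeConfig)

def localize_stride(cfg: Config) -> int:
    """The localization step pulls its stride from the shared value."""
    return cfg.processing.frame_skip
```

With this shape, setting processing.frame_skip: 4 changes both pose extraction and localization together.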


### 1b — Commented-out override blocks in dataset configs repeat the exact defaults verbatim.

In configs/how2sign/video.yaml and configs/youtube_asl/video.yaml:

```yaml
# Uncomment to override:
# person_localize:
#   model: "yolov8n.pt"          ← same as _base default
#   confidence_threshold: 0.5    ← same as _base default
#   sample_frames: 5             ← same as _base default
#   device: "cuda:0"             ← same as _base default
#   min_bbox_area: 0.05          ← same as _base default
```

Overrides should only show the fields that are actually meaningful to change per-dataset, with example non-default values. As written, a user who uncomments the block changes nothing — it gives a false impression that something has been configured.


## [Issue 2] Hardcoded YOLO Backend — No Dispatch Logic for Future Model Support

In person_localize.py, the model is always loaded through ultralytics.YOLO, regardless of what string is in config.person_localize.model:

```python
# person_localize.py
model = YOLO(loc_cfg.model)   # always ultralytics — no dispatch
model.to(loc_cfg.device)
```

The model field in PersonLocalizeConfig is typed as a bare str, which suggests it could point to any model file, but the implementation has no branching to support anything other than ultralytics-format weights. Compare this to how the existing extractor pattern works — ExtractorConfig has a name field and the runner dispatches based on it to mediapipe or mmpose backends via the registry.

The fix should follow the same pattern: add a backend: Literal["ultralytics"] = "ultralytics" field (or similar), and wrap the model loading in a dispatch block so that adding a new detector backend in the future (e.g., MMDetection, Torchvision) only requires adding a new branch, not rewriting the processor:

```python
# Suggested direction
if loc_cfg.backend == "ultralytics":
    from ultralytics import YOLO
    model = YOLO(loc_cfg.model)
elif loc_cfg.backend == "mmdet":
    ...  # future support
else:
    raise ValueError(f"Unknown detector backend: {loc_cfg.backend}")
```

## [Issue 3] smoke_test_localize.py Added Without Integration into scripts/run.py

scripts/run.py is the project's documented single entry point:

```python
# scripts/run.py — unchanged by this PR
from sign_prep.__main__ import main

if __name__ == "__main__":
    main()
```

The PR adds scripts/smoke_test_localize.py as a completely standalone script with its own argument parser, its own Config construction, and its own processor invocations — none of which flow through run.py or the CLI. This creates two problems:

  1. Discoverability — users reading scripts/run.py have no way to know the smoke test exists.
  2. Maintenance drift — the smoke test builds a Config object manually with hardcoded fields that can go stale as the schema evolves, whereas run.py goes through load_config() which handles validation and defaults centrally.

At minimum, the smoke test should be referenced in a comment in run.py or the README. Better yet, consider exposing it as a subcommand:

```
python scripts/run.py --smoke-test --video path/to/video.mp4
```

## [Issue 4 — New] ultralytics Not Added to requirements.txt

The PR introduces a hard dependency on ultralytics (YOLOv8) but does not add it to requirements.txt:

```
# requirements.txt — ultralytics is absent
torch==2.0.1
opencv-python==4.11.0.86
mediapipe==0.10.21
...
# ultralytics — MISSING
```

The code tries to handle this gracefully with a module-level try/except:

```python
try:
    from ultralytics import YOLO
except ImportError:
    YOLO = None
```

But this only delays the error to runtime. A fresh pip install -r requirements.txt will produce a broken environment where person_localize silently sets YOLO = None and then raises ImportError only when the step actually runs. The package should be added to requirements.txt — optionally with a comment marking it as required only for the person_localize step.


## [Issue 5 — New] Resume Safety is Overstated — Manifest Written Only Once at the End

The PR description claims resume safety as a feature:

> Resume-safe: skips already-processed rows on re-run

However, looking at the implementation:

```python
# person_localize.py — manifest is saved ONCE at the very end
for idx, row in tqdm(pending.iterrows(), ...):
    ...  # process all rows
    results_map[idx] = {...}

# Written only after all rows are done
data.to_csv(manifest_path, sep="\t", index=False)
```

If the process crashes (OOM, power loss, keyboard interrupt) after processing 1,700 of 1,741 segments, results_map is lost entirely and the next run starts from zero. True resume safety requires writing to the manifest periodically (e.g., every N rows or every video), not only on successful completion.
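A minimal sketch of the periodic-flush pattern (the function and the `FLUSH_EVERY` cadence are hypothetical, not the PR's code):

```python
FLUSH_EVERY = 50  # assumption: checkpoint cadence, tune per workload

def localize_with_checkpoints(pending, process_row, save_manifest):
    """Process rows but flush results every FLUSH_EVERY rows, so a crash
    loses at most one batch rather than the whole run."""
    results = {}
    for i, row in enumerate(pending, start=1):
        results[i] = process_row(row)
        if i % FLUSH_EVERY == 0:
            save_manifest(results)  # periodic checkpoint
    save_manifest(results)          # final write
    return results
```

Combined with the existing skip-already-processed logic on startup, this makes the resume claim actually hold.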


## [Issue 6 — New] _ShardWriter Double _next_shard() Possible Bug

In the new _ShardWriter, both rollover conditions are checked independently in the same write() call:

```python
def write(self, sample: dict):
    if self._count >= self.max_count:
        self._next_shard()          # opens shard N+1
    if self.max_size and self._size >= self.max_size:
        self._next_shard()          # opens shard N+2 — skips N+1 entirely!
    ...
```

If both conditions are true simultaneously (a large final sample that exactly fills both count and size), _next_shard() is called twice. The first call closes shard N and opens shard N+1 (empty). The second call immediately closes the empty shard N+1 and opens shard N+2. The result is a zero-byte shard file in the output directory which may confuse WebDataset readers. Use elif or merge the two checks into a single condition.
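The merged-condition fix can be demonstrated on a toy model of the rollover logic (this `ShardRoller` class is illustrative only, not the real _ShardWriter):

```python
class ShardRoller:
    """Toy model of shard rollover; tracks only counts, not tar output."""
    def __init__(self, max_count: int, max_size: int):
        self.max_count, self.max_size = max_count, max_size
        self._count = self._size = 0
        self.shards_opened = 1  # first shard is open from the start

    def write(self, sample_size: int):
        # Single merged check: at most one rollover per write(), even when
        # the count and size limits trip on the same call.
        if self._count >= self.max_count or (
            self.max_size and self._size >= self.max_size
        ):
            self.shards_opened += 1
            self._count = self._size = 0
        self._count += 1
        self._size += sample_size
```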


## [Issue 7 — New] device: "cuda:0" Hardcoded in Base Config — No CPU Fallback

```yaml
# configs/_base/video.yaml
person_localize:
  device: "cuda:0"   # hardcoded — fails on CPU-only machines and macOS MPS
```

Every other config in this project that takes a device argument either defaults to "cpu" or uses a softer default. A user running the default config on a CPU-only machine (CI, cloud VM, macOS without CUDA) will get a CUDA device error immediately on step start. The base config should default to "cpu" for safety, with a comment suggesting "cuda:0" or "mps" for GPU environments.
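An alternative to a plain "cpu" default is an "auto" value resolved at step start; a hedged sketch (the `pick_device` helper is hypothetical):

```python
def pick_device(requested: str = "auto") -> str:
    """Resolve a safe torch device string. "auto" prefers CUDA, then Apple
    MPS, and falls back to CPU so CPU-only machines still work."""
    if requested != "auto":
        return requested  # explicit settings pass through untouched
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda:0"
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"
    except ImportError:
        pass  # torch not installed: plain CPU pipeline
    return "cpu"
```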


## [Issue 8 — New] sample_frames Has Ambiguous Dual Meaning

```python
class PersonLocalizeConfig(BaseModel):
    sample_frames: int = 5   # uniform: exact count; skip_frame: max frame count
```

The same field name controls two fundamentally different behaviours depending on sample_strategy:

  • In uniform mode: exactly 5 frames are sampled, always.
  • In skip_frame mode: up to 5 frames are sampled, but the actual count depends on segment duration and frame_skip.

This makes sample_frames difficult to reason about. Tuning one strategy silently changes the cap for the other. Consider splitting into uniform_frames: int = 5 and max_frames: int = 5, or at minimum add a @field_validator that emits a warning when the field is set without specifying sample_strategy.
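The split-field option could look like this, sketched with a dataclass rather than the project's pydantic schema (`frames_for_segment` is a hypothetical helper showing how the two fields would diverge):

```python
from dataclasses import dataclass

@dataclass
class PersonLocalizeConfig:
    """Hypothetical split of the ambiguous sample_frames field."""
    sample_strategy: str = "skip_frame"
    uniform_frames: int = 5   # uniform mode: exact number of frames
    max_frames: int = 5       # skip_frame mode: upper bound only

    def frames_for_segment(self, available: int, frame_skip: int) -> int:
        if self.sample_strategy == "uniform":
            return min(self.uniform_frames, available)
        # skip_frame: actual count depends on duration and stride
        return min(self.max_frames, -(-available // frame_skip))  # ceil div
```

Tuning one strategy then leaves the other's cap untouched, which resolves the ambiguity without a validator.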


## Summary Table

| # | Source | Severity | Issue |
|---|--------|----------|-------|
| 1a | User | Medium | frame_skip duplicated in ProcessingConfig and PersonLocalizeConfig |
| 1b | User | Low | Commented-out override blocks repeat base defaults verbatim |
| 2 | User | Medium | No backend dispatch — YOLO always used with no future extensibility |
| 3 | User | Low | smoke_test_localize.py not discoverable from scripts/run.py |
| 4 | New | High | ultralytics missing from requirements.txt |
| 5 | New | High | Manifest written once at end — crash loses all progress, resume safety overstated |
| 6 | New | Medium | _ShardWriter double _next_shard() when count and size limits fire together |
| 7 | New | Medium | device: "cuda:0" default breaks CPU-only environments |
| 8 | New | Low | sample_frames has ambiguous dual meaning across strategies |
