Add async dataset deduplication support

@edwinpav released this 06 May 20:09
3321deb

Summary

This release adds async deduplication support for Nucleus datasets.

Users can now start a background deduplication job with Dataset.deduplicate() or Dataset.deduplicate_by_ids(), then collect structured results via DeduplicationJob.result(). The result includes kept dataset item IDs, kept reference IDs, and deduplication stats such as threshold, original item count, and deduplicated item count.
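The call flow described above can be sketched with in-memory stand-ins. This is not the real SDK: `StubDataset`, `StubDeduplicationJob`, `DedupResult`, `DedupStats`, the `threshold` parameter, and every attribute name are assumptions made for illustration, and the sketch assumes the deduplicated item count is the number of items remaining after deduplication.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DedupStats:
    # Hypothetical field names for the stats the release notes mention.
    threshold: int
    original_count: int
    deduplicated_count: int


@dataclass
class DedupResult:
    # Hypothetical container for the structured result.
    kept_dataset_item_ids: List[str]
    kept_reference_ids: List[str]
    stats: DedupStats


class StubDeduplicationJob:
    """Stand-in for DeduplicationJob: returns a canned result, no network."""

    def __init__(self, result: DedupResult) -> None:
        self._result = result

    def result(self) -> DedupResult:
        return self._result


class StubDataset:
    """Stand-in for Dataset: deduplicate() only hands back a job handle."""

    def deduplicate(self, threshold: int) -> StubDeduplicationJob:
        canned = DedupResult(
            kept_dataset_item_ids=["di_1", "di_3"],
            kept_reference_ids=["img_1", "img_3"],
            stats=DedupStats(
                threshold=threshold, original_count=3, deduplicated_count=2
            ),
        )
        return StubDeduplicationJob(canned)


# Start the job, then collect the structured result.
job = StubDataset().deduplicate(threshold=10)
result = job.result()
removed = result.stats.original_count - result.stats.deduplicated_count
```

With the real SDK, the same two steps would be a call to Dataset.deduplicate() or Dataset.deduplicate_by_ids() followed by DeduplicationJob.result(); check the SDK reference for the actual parameters and result attribute names.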

The SDK now also validates completed deduplication job responses and fails clearly if the server returns a malformed result payload, avoiding misleading fallback stats.
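One way to picture this kind of validation is the minimal sketch below. It is not the SDK's implementation: the payload keys (`unique_item_ids`, `unique_reference_ids`, `stats`, and the stat names), the function name, and the exception type are all hypothetical. The point it illustrates is raising on a malformed payload instead of substituting default stats.

```python
from typing import Any, Dict


class MalformedDedupResultError(ValueError):
    """Hypothetical error for a completed job with an unusable payload."""


# Hypothetical stat keys; the real payload schema may differ.
REQUIRED_STATS = ("threshold", "original_count", "deduplicated_count")


def validate_dedup_payload(payload: Any) -> Dict[str, Any]:
    """Return the payload unchanged if well formed, otherwise raise."""
    if not isinstance(payload, dict):
        raise MalformedDedupResultError(
            f"expected an object, got {type(payload).__name__}"
        )
    for key in ("unique_item_ids", "unique_reference_ids"):
        if not isinstance(payload.get(key), list):
            raise MalformedDedupResultError(f"missing or non-list field: {key}")
    stats = payload.get("stats")
    if not isinstance(stats, dict):
        raise MalformedDedupResultError("missing or non-object field: stats")
    bad = [k for k in REQUIRED_STATS if not isinstance(stats.get(k), int)]
    if bad:
        # Fail loudly rather than filling in zeros that look like real stats.
        raise MalformedDedupResultError(f"missing or non-integer stats: {bad}")
    return payload


good = {
    "unique_item_ids": ["di_1"],
    "unique_reference_ids": ["img_1"],
    "stats": {"threshold": 10, "original_count": 2, "deduplicated_count": 1},
}
validated = validate_dedup_payload(good)
```

Raising here means a caller never sees, for example, an original item count of 0 that was only a default for a missing field.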