Release: Add async dataset deduplication support · scaleapi/nucleus-python-client

Summary

This release adds async deduplication support for Nucleus datasets.

Users can now start a background deduplication job with Dataset.deduplicate() or Dataset.deduplicate_by_ids(), then collect structured results via DeduplicationJob.result(). The result includes kept dataset item IDs, kept reference IDs, and deduplication stats such as threshold, original item count, and deduplicated item count.

The SDK now also validates the response of a completed deduplication job and fails with a clear error if the server returns a malformed result payload, rather than falling back to misleading stats.
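To illustrate the kind of strict validation described above, here is a minimal, self-contained sketch. It does not use the SDK and is not its implementation. The payload keys (`unique_item_ids`, `unique_reference_ids`, `stats`, `threshold`, `original_count`, `deduplicated_count`) and the names `DedupResult`, `DedupStats`, `MalformedDeduplicationResult`, and `parse_dedup_result` are all assumptions chosen for illustration; check the SDK source for the actual field and class names.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping


class MalformedDeduplicationResult(ValueError):
    """Hypothetical error for a completed job whose payload is incomplete or mistyped."""


@dataclass(frozen=True)
class DedupStats:
    threshold: int
    original_count: int
    deduplicated_count: int


@dataclass(frozen=True)
class DedupResult:
    kept_item_ids: List[str]
    kept_reference_ids: List[str]
    stats: DedupStats


def _require(payload: Mapping[str, Any], key: str, expected_type: type) -> Any:
    """Return payload[key], raising if it is absent or has the wrong type."""
    if key not in payload:
        raise MalformedDeduplicationResult(f"missing field: {key!r}")
    value = payload[key]
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, expected_type):
        raise MalformedDeduplicationResult(
            f"field {key!r} should be {expected_type.__name__}, "
            f"got {type(value).__name__}"
        )
    return value


def _require_str_list(payload: Mapping[str, Any], key: str) -> List[str]:
    """Return payload[key] as a list of strings, raising on any other shape."""
    values = _require(payload, key, list)
    if not all(isinstance(v, str) for v in values):
        raise MalformedDeduplicationResult(f"field {key!r} must contain only strings")
    return list(values)


def parse_dedup_result(payload: Any) -> DedupResult:
    """Parse a payload strictly: no defaults are substituted for missing fields."""
    if not isinstance(payload, Mapping):
        raise MalformedDeduplicationResult("result payload must be a JSON object")
    stats_raw = _require(payload, "stats", Mapping)
    stats = DedupStats(
        threshold=_require(stats_raw, "threshold", int),
        original_count=_require(stats_raw, "original_count", int),
        deduplicated_count=_require(stats_raw, "deduplicated_count", int),
    )
    return DedupResult(
        kept_item_ids=_require_str_list(payload, "unique_item_ids"),
        kept_reference_ids=_require_str_list(payload, "unique_reference_ids"),
        stats=stats,
    )
```

The design point is that a missing `stats` block raises instead of being replaced with zeros or other defaults, so a caller never sees counts that merely look plausible.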
