Releases: scaleapi/nucleus-python-client
Add async dataset deduplication support
Summary
This release adds async deduplication support for Nucleus datasets.
Users can now start a background deduplication job with Dataset.deduplicate() or Dataset.deduplicate_by_ids(), then collect structured results via DeduplicationJob.result(). The result includes kept dataset item IDs, kept reference IDs, and deduplication stats such as threshold, original item count, and deduplicated item count.
The SDK now also validates completed deduplication job responses and fails clearly if the server returns a malformed result payload, avoiding misleading fallback stats.
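A minimal sketch of the new flow, assuming deduplicate() now returns a DeduplicationJob whose result() blocks until completion, and that the stats field names mirror the synchronous example in v0.17.12 below (the kept-ID attribute names are assumptions):
dataset = client.get_dataset("ds_...")
# Start a background deduplication job
job = dataset.deduplicate(threshold=10)
# Wait for completion and fetch the structured result
result = job.result()
# Stats and kept IDs (attribute names assumed)
print(result.stats.threshold)
print(result.stats.original_count, result.stats.deduplicated_count)
print(result.kept_reference_ids)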
Enforce mutually exclusive auth connections
Changed
api_key and limited_access_key are now mutually exclusive in NucleusClient. Passing both (or setting NUCLEUS_API_KEY while also passing limited_access_key) raises a ValueError.
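For illustration, a hedged sketch of the new behavior:
import nucleus
# Passing both credentials now fails fast instead of silently preferring one
try:
    client = nucleus.NucleusClient(
        api_key="<API_KEY>", limited_access_key="<LIMITED_ACCESS_KEY>"
    )
except ValueError as err:
    print(err)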
Fixed
- Docstring improvements across NucleusClient: fixed copy-paste errors (get_job, get_slice, delete_slice), removed the phantom stats_only parameter from list_jobs, corrected the make_request parameter name, and restructured the create_launch_model / create_launch_model_from_dir docs for proper rendering.
- Suppressed Sphinx warnings from inherited pydantic BaseModel methods by removing inherited-members from the autoapi options.
Fix Sphinx builds and prune deprecated packages
Removed the deprecated pkg_resources package and replaced it with importlib-metadata. The pkg_resources package was causing two main problems:
- Preventing successful builds of the new auto-generated SDK docs
- Whenever the SDK raised an error back to the user, they also saw this confusing warning:
UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30.
Additionally, this update resolves ~79 errors/warnings in the Sphinx autodoc build.
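For context, a sketch of the kind of swap involved, assuming the SDK only used pkg_resources to look up its own distribution version (the distribution name is illustrative):
# Before (deprecated, triggered the UserWarning above):
# import pkg_resources
# __version__ = pkg_resources.get_distribution("scale-nucleus").version

# After:
from importlib.metadata import version
__version__ = version("scale-nucleus")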
v0.17.12
Add dataset-scoped image deduplication support.
Supports deduplication for image and video datasets, run against an entire dataset, selected reference IDs, or selected dataset item IDs.
Example usage:
dataset = client.get_dataset("ds_...")
# Deduplicate entire dataset
result = dataset.deduplicate(threshold=10)
# Deduplicate specific items by reference IDs
result = dataset.deduplicate(threshold=10, reference_ids=["ref_1", "ref_2", "ref_3"])
# Deduplicate by internal item IDs (more efficient if you have them)
result = dataset.deduplicate_by_ids(threshold=10, dataset_item_ids=["item_1", "item_2"])
# Access results
print(f"Threshold: {result.stats.threshold}")
print(f"Original: {result.stats.original_count}, Unique: {result.stats.deduplicated_count}")
print(result.unique_reference_ids)
v0.17.11
Added support for limited access keys (to be used alongside, or in place of, api_keys for NucleusClient auth)
Example usage:
c = nucleus.NucleusClient(limited_access_key="<LIMITED_ACCESS_KEY>")
c = nucleus.NucleusClient(api_key="<API_KEY>", limited_access_key="<LIMITED_ACCESS_KEY>")
c = nucleus.NucleusClient(api_key="<API_KEY>")
v0.17.9
v0.17.5
Added
- Method for uploading lidar semantic segmentation predictions, via dataset.upload_lidar_semseg_predictions
Example usage:
dataset = client.get_dataset("ds_...")
model = client.get_model("prj_...")
pointcloud_ref_id = 'pc_ref_1'
predictions_s3 = "s3://temp/predictions.json"
dataset.upload_lidar_semseg_predictions(model, pointcloud_ref_id, predictions_s3)
v0.17.3
Added
- Added the environment variable S3_ENDPOINT to accommodate nonstandard S3 endpoint URLs when requesting presigned URLs (see the sketch below)
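A minimal sketch of how this might be used; the endpoint URL is hypothetical, and the variable should be set before presigned URLs are requested:
import os
# Point presigned-URL generation at an S3-compatible service with a nonstandard endpoint
os.environ["S3_ENDPOINT"] = "https://s3.internal.example.com"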
v0.17.2
Modified
In Dataset.create_slice, the reference_ids parameter is now optional. If left unspecified, an empty slice is created.
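A short sketch of the two call patterns, assuming the existing name / reference_ids parameters:
# Create an empty slice and fill it later
empty_slice = dataset.create_slice(name="my-empty-slice")
# Previous behavior still works: seed the slice with reference IDs
seeded_slice = dataset.create_slice(name="my-seeded-slice", reference_ids=["ref_1", "ref_2"])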
v0.17.0
Added
- Added dataset.add_items_from_dir (see the sketch after this list)
- Added pytest-xdist for test parallelization
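A minimal sketch of the new helper; the parameter name and directory layout are assumptions:
dataset = client.get_dataset("ds_...")
# Upload every supported image found under a local directory
dataset.add_items_from_dir(dirname="./images")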
Fixes
- Fixed test test_models.test_remove_invalid_tag_from_model